Overview
MedInjection-FR is a large-scale French biomedical instruction dataset designed to study how the provenance of supervision (native, synthetic, translated) affects instruction-tuning of LLMs. The corpus supports multiple-choice QA (single and multi-answer) and open-ended QA, and is released together with a family of fine-tuned baseline models.
- Native: 77,247
- Synthetic: 76,506
- Translated: 417,674
- Total: 571,436
Composition & Tasks
Task types
- MCQU (single-answer)
- MCQ (multiple-answer)
- OEQ (open-ended QA)
Counts (all components): OEQ 57,509, MCQ 59,592, MCQU 454,335.
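To put the task counts in proportion, the short sketch below computes each task type's share of the corpus. It uses only the three counts stated above; the variable names are illustrative.

```python
# Task-type counts as stated in this card.
task_counts = {"OEQ": 57_509, "MCQ": 59_592, "MCQU": 454_335}

# Total number of examples across the three task types.
total = sum(task_counts.values())

# Percentage share of each task type, rounded to two decimals.
task_shares = {task: round(100 * n / total, 2) for task, n in task_counts.items()}

print(f"total: {total}")
for task, share in task_shares.items():
    print(f"{task}: {share}%")
```

Single-answer questions (MCQU) make up roughly four fifths of the corpus, which is worth keeping in mind when sampling balanced training mixtures.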
Splits
| Component | Train | Validation | Test | Total |
|---|---|---|---|---|
| Native | 57,563 | 5,055 | 14,629 | 77,247 |
| Synthetic | 76,506 | — | — | 76,506 |
| Translated | 366,370 | 38,011 | 13,293 | 417,674 |
| Total | 500,439 | 43,066 | 27,931 | 571,436 |
Translation quality (WMT’24 biomedical parallel data)
| Model | BLEU | COMET |
|---|---|---|
| GPT-4o-mini | 51.01 | 0.8751 |
| Gemini 2.0 Flash | 53.72 | 0.8783 |
| WMT’24 best (ref.) | 53.54 | 0.8760 |
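Higher is better for both metrics. Gemini 2.0 Flash slightly exceeds the WMT’24 best reference on BLEU and COMET, and GPT-4o-mini trails it by about 2.5 BLEU points with a near-identical COMET score, which indicates strong translation fidelity for the translated subset.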
Download
Each component is published separately. Use the links below or load via the 🤗 Datasets library.
Native
French medical exams, resources, curated QA.
MedInjection-FR/Native
Synthetic
Generated with GPT-4o from clinical cases and abstracts.
MedInjection-FR/Synthetic
Translated
French translations of established English biomedical datasets.
MedInjection-FR/Translated
Python (🤗 Datasets)
from datasets import load_dataset
ds = load_dataset("MedInjection-FR/Native")  # or "MedInjection-FR/Synthetic", "MedInjection-FR/Translated"
print(ds)
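Since the Splits table shows that the synthetic component has training data only, it can help to validate the requested split before downloading anything. The following is a minimal sketch, not part of the release: `load_component` is a hypothetical helper, and the split names `train`, `validation`, and `test` are assumed to follow the usual Hub convention.

```python
# Repository ids as listed in this card.
REPOS = {
    "Native": "MedInjection-FR/Native",
    "Synthetic": "MedInjection-FR/Synthetic",
    "Translated": "MedInjection-FR/Translated",
}

# ASSUMPTION: split names follow the usual Hub convention.
# Per the Splits table, Synthetic has training data only.
SPLITS = {
    "Native": ("train", "validation", "test"),
    "Synthetic": ("train",),
    "Translated": ("train", "validation", "test"),
}


def load_component(name, split="train"):
    """Load one split of one component (hypothetical helper)."""
    if name not in REPOS:
        raise ValueError(f"unknown component {name!r}; expected one of {sorted(REPOS)}")
    if split not in SPLITS[name]:
        raise ValueError(f"component {name!r} has no {split!r} split")
    # Imported lazily so the checks above work without the library installed.
    from datasets import load_dataset

    return load_dataset(REPOS[name], split=split)
```

For example, `load_component("Native", "test")` would fetch the native test split, while `load_component("Synthetic", "test")` fails fast with a `ValueError` instead of a download error.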
Fine-tuned Models
We release seven instruction-tuned baselines (Qwen-4B-Instruct backbone, DoRA adapters), trained on 30k samples per configuration:
- QWEN-4B-NAT
- QWEN-4B-TRAD
- QWEN-4B-SYN
- QWEN-4B-NAT-TRAD
- QWEN-4B-NAT-SYN
- QWEN-4B-TRAD-SYN
- QWEN-4B-ALL
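The seven names correspond to every non-empty combination of the three data sources. The sketch below enumerates them, reading NAT, TRAD, and SYN as the native, translated, and synthetic components; `config_names` is an illustrative helper, not part of the release.

```python
from itertools import combinations

# Source tags as they appear in the released model names.
SOURCES = ["NAT", "TRAD", "SYN"]


def config_names(sources=SOURCES, prefix="QWEN-4B"):
    """List one model name per non-empty combination of sources."""
    names = []
    for size in range(1, len(sources) + 1):
        for combo in combinations(sources, size):
            # The configuration that uses every source is published as "ALL".
            suffix = "ALL" if size == len(sources) else "-".join(combo)
            names.append(f"{prefix}-{suffix}")
    return names


names = config_names()
print(len(names))
print(names)
```

With three sources there are 2^3 - 1 = 7 non-empty subsets, which matches the number of released baselines.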
Quick inference (🤗 Transformers)
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "MedInjection-FR/QWEN-4B-NAT-TRAD"
# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
# prepare the model input
prompt = """Un professionnel de santé de 54 ans consulte un spécialiste des maladies infectieuses pour un suivi concernant un diagnostic récent d'hépatite C chronique.
Il s'est initialement présenté avec des symptômes tels que fatigue, malaise et enzymes hépatiques élevées et soupçonne d'avoir contracté l'infection à la suite
d'une piqûre d'aiguille il y a des années. Malgré le début du traitement, son titre viral reste élevé, ce qui incite le médecin à ajouter un nouveau médicament
qui inhibe la maturation virale en bloquant la synthèse des protéines. Quel est l'effet indésirable le plus probable de ce médicament ?
Choix de réponses :
(A) Uropathie cristalline obstructive
(B) Suppression de la moelle osseuse
(C) Insomnie et irritabilité
(D) Céphalées et photosensibilité
(E) Rêves lucides
(F) Hyperbilirubinémie
(G) Pancréatite
(H) Neuropathie périphérique
(I) Augmentation de la créatine kinase
(J) Alopécie"""
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# conduct text completion
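generated_ids = model.generate(
    **model_inputs,
    # One new token is enough only when the expected answer is a single option letter;
    # raise this limit for multiple-answer or open-ended questions.
    max_new_tokens=1
)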
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
content = tokenizer.decode(output_ids, skip_special_tokens=True)
print("content:", content)
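To query the model with other questions, the prompt can be assembled programmatically in the same layout as the example above: the question, a "Choix de réponses :" line, then one lettered option per line. This is a sketch under the assumption that the models expect this layout; `build_mcq_prompt` is a hypothetical helper, not part of the release.

```python
import string


def build_mcq_prompt(question, options):
    """Format a question and its options like the example prompt in this card."""
    if len(options) > len(string.ascii_uppercase):
        raise ValueError("at most 26 options can be lettered A-Z")
    lines = [question, "Choix de réponses :"]
    # Label options (A), (B), (C), ... in the order given.
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"({letter}) {option}")
    return "\n".join(lines)


example = build_mcq_prompt(
    "Quel est l'effet indésirable le plus probable de ce médicament ?",
    ["Uropathie cristalline obstructive", "Suppression de la moelle osseuse"],
)
print(example)
```

The returned string can be passed as the `content` of the user message before calling `apply_chat_template`.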
Evaluation at a glance
- MCQ and MCQU are scored with Exact-Match; MCQ (multiple-answer) is additionally scored with the Hamming score.
- OEQ uses BLEU/ROUGE/METEOR/BERTScore and an LLM-as-a-judge calibrated on human annotations (100 samples).
- Mixed training (especially NAT-TRAD) provides complementary gains over single-source setups.
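The two multiple-choice metrics can be sketched for answers represented as sets of option letters. The card does not give the exact formulas, so this example assumes a common multi-label definition of the Hamming score (size of the intersection divided by size of the union); the function names are illustrative.

```python
def exact_match(predicted, gold):
    """1.0 if the predicted option set equals the gold set, else 0.0."""
    return float(set(predicted) == set(gold))


def hamming_score(predicted, gold):
    """Overlap between predicted and gold option sets.

    ASSUMPTION: defined as |predicted & gold| / |predicted | gold|;
    the dataset card does not state the exact formula.
    """
    predicted, gold = set(predicted), set(gold)
    union = predicted | gold
    if not union:
        # Both sets empty: treat as a perfect match.
        return 1.0
    return len(predicted & gold) / len(union)


gold = ["A", "C"]
predicted = ["A", "C", "D"]
print(exact_match(predicted, gold))
print(hamming_score(predicted, gold))
```

Under this definition a partially correct multiple-answer prediction receives partial credit from the Hamming score while Exact-Match gives zero; for single-answer questions the two measures coincide.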
Ethics, Intended Use & License
This dataset and the released models are for research use only. They are not a substitute for professional medical advice, diagnosis, or treatment.
- No protected health information (PHI) is included; sources were compiled from public datasets and teaching material.
- Evaluation includes human expert checks for a small sample; outputs may still contain errors.
- Please review the LICENSE before use. If unsure, contact the maintainers.
Citation
If you use MedInjection-FR or the models, please cite:
Contact
Questions, feedback, or requests: open an issue on the repo or email you@example.com.