LoRA fine-tuning of NLLB-200 (1.3B) for Chinese-to-English translation on WMT19, reaching 24.9 BLEU.
Updated May 31, 2026 - Python
NLP coursework: PPO/RLHF demos (CartPole + TRL) and EN–VI machine translation with GPT/Transformer (from scratch) vs pretrained (GPT-2, MarianMT).
Companion repo for "A 2×2 Controlled Ablation of Data Quality and Capacity in Transformer MT". Contains the QE-filtered fine-tuning pipeline (Section 7), CometKiwi-22 scoring/filtering scripts, and the full paper LaTeX source. Pretraining lives in Machine_translation.
A machine translation evaluation pipeline prioritizing Meta's NLLB-200 model and the FLORES-200 dataset, featuring Kaggle-optimized data loading and UMAP embedding visualization.