Computer Science and AI
Original research
Attention sparsity for low-resource neural machine translation
Kyoto University
Hiroshi Tanaka
Abstract
Low-resource translation suffers from overfitting in standard transformer architectures. We introduce a structured attention-sparsity regularizer that constrains the effective context window during training and relaxes it at inference. Across eight low-resource language pairs, the method improves BLEU by an average of 2.4 points and reduces hallucination rates measured by a fact-consistency metric. Ablations isolate the contribution of head-level sparsity.
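The abstract does not specify the form of the regularizer, so the following is only an illustrative sketch of one way a head-level, window-based sparsity penalty could be computed. The function names, the `window` parameter, the `training` flag, and the choice of "attention mass outside a local window" as the penalized quantity are all assumptions made for this example, not the authors' formulation.

```python
from typing import List

Matrix = List[List[float]]  # one attention head: rows = queries, cols = keys


def out_of_window_mass(attn: Matrix, window: int) -> float:
    """Mean attention mass a head places on keys farther than `window` from the query."""
    total = 0.0
    for i, row in enumerate(attn):
        total += sum(w for j, w in enumerate(row) if abs(i - j) > window)
    return total / len(attn)


def sparsity_penalty(heads: List[Matrix], window: int, training: bool = True) -> float:
    """Hypothetical head-level regularizer: average out-of-window mass over heads.

    Returns 0.0 when `training` is False, mirroring the idea that the
    constraint is applied during training and relaxed at inference.
    """
    if not training:
        return 0.0
    return sum(out_of_window_mass(h, window) for h in heads) / len(heads)


# Two toy heads over a 3-token sequence (each row sums to 1).
local_head = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uniform_head = [[1 / 3, 1 / 3, 1 / 3] for _ in range(3)]

# Only the uniform head attends outside the window, so only it is penalized.
penalty = sparsity_penalty([local_head, uniform_head], window=1)
```

In a sketch like this, the penalty would be added to the translation loss with some weight during training; a head that already attends locally contributes nothing, which is what makes the term head-level rather than global.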
Keywords
machine translation, transformers, attention, low-resource, NLP
Use of AI in preparing the manuscript
Transformer models are the object of study. No generative AI was used in writing the manuscript.
