Computer Science and AI | Original research

Attention sparsity for low-resource neural machine translation

By Sara Nakamura (verified author, ORCID 0000-0003-2210-8841)

Kyoto University

Hiroshi Tanaka

Abstract

Low-resource translation suffers from overfitting in standard transformer architectures. We introduce a structured attention-sparsity regularizer that constrains the effective context window during training and relaxes it at inference. Across eight low-resource language pairs, the method improves BLEU by an average of 2.4 points and reduces hallucination rates, as measured by a fact-consistency metric. Ablations isolate the contribution of head-level sparsity.

Keywords

machine translation, transformers, attention, low-resource, NLP

Use of AI in preparing the manuscript

Transformer models are the object of study. No generative AI was used in writing the manuscript.