Translation apps are hugely popular, but they could be more accurate. Swiss researchers have found an approach that reduces errors.
Their project, called MODERN: Modeling discourse entities and relations for coherent machine translation, aims to help the machines gain more context from the text as a whole rather than just piecing together individual sentences.
“State-of-the-art machine translation systems, especially statistical but also rule-based ones, operate in a sentence-by-sentence mode, and do not propagate information through the series of sentences that constitute texts,” explain researchers from Martigny’s Idiap Research Institute in canton Valais on their website.
Here’s an example of why that can be problematic.
“My aunt has bought a great sedan. But she is not so beautiful,” reads a sample from Google Translate – an interpretation that could cause a family feud. In the original German, the pronoun refers to the sedan, a noun that happens to be grammatically feminine. And because the word “beautiful” is more likely to describe humans, the automated service produces “she” rather than “it”.
“Machine translation tools don’t really understand the meaning of the texts they process,” says MODERN project leader Andrei Popescu-Belis, head of the Natural Language Processing Group in Martigny at the Idiap Research Institute. For language pairs such as French-English or Spanish-English, pronouns mislead the machine translation tools in about half of all cases.
MODERN focuses on the interplay between expressions and the relationships between sentences, “which are often conveyed by explicit connectives that are notoriously difficult to translate”. The project involves language specialists from the universities of Geneva and Utrecht (Netherlands) as well as researchers from the University of Zurich’s Institute of Computational Linguistics.
The results are promising so far, says Popescu-Belis. “By forcing the tool to take information from the previously translated sentence into account, we can reduce the error rate to 30%.”
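To see why the previous sentence matters, consider the toy Python sketch below. It is only an illustration of the idea, not the MODERN system: the German sentence is an assumed reconstruction of the sedan example (the article does not quote the original), and the tiny lexicon, the function names and the "most recent noun of matching gender" rule are all hypothetical simplifications.

```python
# Toy illustration only: NOT the MODERN system's method.
# The lexicon and the antecedent heuristic below are hypothetical.

# German nouns with grammatical gender and whether they denote a person.
NOUNS = {
    "Tante": {"gender": "f", "person": True},       # aunt
    "Limousine": {"gender": "f", "person": False},  # sedan
}

GENDER_OF_PRONOUN = {"sie": "f", "er": "m", "es": "n"}
DEFAULT_ENGLISH = {"sie": "she", "er": "he", "es": "it"}
PERSON_ENGLISH = {"f": "she", "m": "he", "n": "it"}


def translate_pronoun_isolated(pronoun):
    """Sentence-by-sentence mode: no antecedent is visible, so use a default."""
    return DEFAULT_ENGLISH[pronoun]


def translate_pronoun_with_context(pronoun, previous_sentence):
    """Look back into the previous sentence for the most recent noun whose
    grammatical gender matches the pronoun (a crude stand-in for real
    antecedent resolution), then choose the English pronoun accordingly."""
    gender = GENDER_OF_PRONOUN[pronoun]
    tokens = previous_sentence.rstrip(".").split()
    for token in reversed(tokens):
        entry = NOUNS.get(token)
        if entry and entry["gender"] == gender:
            # Objects become "it" in English, whatever their German gender.
            return PERSON_ENGLISH[gender] if entry["person"] else "it"
    return DEFAULT_ENGLISH[pronoun]


# Assumed German source for the sedan example (not quoted in the article).
previous = "Meine Tante hat eine tolle Limousine gekauft."

print(translate_pronoun_isolated("sie"))
print(translate_pronoun_with_context("sie", previous))
```

In isolation the pronoun comes out as "she"; with the previous sentence available, the sketch finds the sedan and chooses "it". Real systems cannot rely on such a simple rule, which is part of why the problem is hard.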
The MODERN project focuses on four languages – English, French, German and Dutch – and there are two case study domains: texts from multilingual Alpine Club yearbooks, and texts on European environmental legislation and debates.
On Monday, Popescu-Belis and his colleagues were in Valencia, Spain, to present their study at a conference organised by the Association for Computational Linguistics. MODERN is supported by the Swiss National Science Foundation.