COVID Stories: COVID-19 and the Limits of Machine Translation

By: Sheekha Sanghvi

By: Sara Parker-Toulson, RFP Coordinator

Public displays of information on COVID-19 seem to be everywhere now, reminding us to wash our hands, keep six feet between one another, and wear our masks. Information about how quickly COVID-19 spreads, about the signs and symptoms we need to watch out for, and the latest data on infection rates and mortality is now widely available in a variety of formats. But what if you don’t speak one of the commonly translated languages in which this information is conveyed? There are more than 7,000 languages on the planet. How do we make sure every one of us has the information we need to stay safe in this pandemic?

As a recent article in Wired points out, COVID-19 poses the largest translation challenge the world has ever faced. Billions of people need free access to critical public health information that may not be available in the language they speak. Many organizations – including MCIS – have made crucial information about mitigating the risk of the pandemic available in many languages. The Endangered Languages Project is another example: it has so far coordinated a multilingual COVID-19 response in over 500 languages, including over 400 videos in more than 150 languages.

Still, there are thousands of languages in the world that are at risk of being left out of the response. How do we win the race against pandemic time?

One of the greatest tools we have at our disposal in order to connect people and COVID-19 information is machine translation. Machine translation is a sub-field of computational linguistics in which software, rather than actual people, is used to translate one language into another.

Although machine translation seems very new, it is decades old. In 1954, a Georgetown University and IBM research team demonstrated a computer system that translated Russian sentences into English using a vocabulary of 250 words. In the 1970s, the Traduction automatique à l’Université de Montréal (TAUM) research group developed one of the first functioning machine translation systems. Called METEO, the system translated weather information from English into French and was capable of translating 80,000 words. It was in use from 1981 to 2000.

When machine translation was first devised, it was rule-based: traditional machine translation methods generally involved rules for transforming text from the source language into the target language. The rules, developed by linguists, operate at the lexical, syntactic, or semantic level. This focus on rules gives the area of study its name: Rule-based Machine Translation, or RBMT.

According to MT specialists:

“Rule-based machine translation relies on countless built-in linguistic rules and millions of bilingual dictionaries for each language pair. The software parses text and creates a transitional representation from which the text in the target language is generated. This process requires extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules. The software uses these complex rule sets and then transfers the grammatical structure of the source language into the target language.”

There are limits to this model of machine translation, though. For one, strict rules do not take into account that words can have multiple meanings, or that some rules have exceptions that RBMT cannot properly handle.
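To make the multiple-meanings problem concrete, here is a minimal sketch of a purely lexical rule: one fixed translation per word. The tiny English-to-Spanish lexicon and the function name are invented for illustration and are not taken from any real RBMT system, which would use far richer rules.

```python
# Toy English-to-Spanish lexicon (illustrative only): one fixed translation per word.
LEXICON = {
    "the": "el",
    "bank": "banco",   # the financial sense; a riverbank would be "orilla"
    "river": "río",
    "money": "dinero",
    "in": "en",
    "by": "junto a",
}


def translate_word_for_word(sentence: str) -> str:
    """Look up each word in the lexicon; leave unknown words unchanged."""
    words = sentence.lower().split()
    return " ".join(LEXICON.get(word, word) for word in words)


financial = translate_word_for_word("money in the bank")
riverside = translate_word_for_word("the bank by the river")

print(financial)  # the financial sense happens to come out right
print(riverside)  # "bank" is still rendered as "banco", the wrong sense here
```

Because the rule has no notion of context, both sentences get the same translation of "bank". Adding a second rule for riverbanks would help with this one case, but every such exception has to be written by hand.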

In part, these limits have been addressed through new developments in the machine translation field. After RBMT came statistical machine translation (SMT), which used vast amounts of language data to determine the most likely translation for a given segment. More recently, neural machine translation (NMT) – an MT system that learns and develops on its own, using sample language data – has been making rapid progress and taking over the field. NMT, along with hybrid forms of the three methods, has largely replaced classical rule-based and statistical models with models that learn from examples.
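The statistical idea of choosing the most likely translation for a segment can be sketched in a few lines. This is a deliberately simplified illustration, not how any production SMT system works: the aligned segment pairs are made up, the function names are hypothetical, and real systems learn from millions of sentence pairs and combine several statistical models.

```python
from collections import Counter, defaultdict

# Hypothetical aligned (source, target) segments standing in for a large parallel corpus.
ALIGNED_SEGMENTS = [
    ("wash your hands", "lávese las manos"),
    ("wash your hands", "lávese las manos"),
    ("wash your hands", "lave sus manos"),
    ("wear a mask", "use una mascarilla"),
    ("wear a mask", "use una mascarilla"),
    ("wear a mask", "lleve un cubrebocas"),
]


def build_table(pairs):
    """Count how often each target segment was seen for each source segment."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_likely_translation(table, source):
    """Return the most frequent target and its relative frequency, or (None, 0.0)."""
    counts = table.get(source)
    if not counts:
        return None, 0.0
    target, count = counts.most_common(1)[0]
    return target, count / sum(counts.values())


table = build_table(ALIGNED_SEGMENTS)
best, probability = most_likely_translation(table, "wash your hands")
missing, missing_probability = most_likely_translation(table, "stay home")

print(best, round(probability, 2))
print(missing, missing_probability)
```

The second lookup returns nothing because the segment never appears in the data. The same constraint applies at scale: a statistical system can only be as good as the translated text available for a given language.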

These advances in machine translation have led to hundreds of new tools for automated translation, many of which are available for free.

One of the most popular tools for machine translation available to anyone with an internet connection is Google Translate. By simply inputting your source text in an online text box, you can immediately generate a target text in a number of popular languages. The tool can also translate entire webpages instantly.

Microsoft Translator is another free machine translation tool that is available in many languages. It is unique in that it translates across platforms and programs, and can be used to translate one-on-one chat conversations, as well as larger blocks of text.

Amazon Translate is a neural machine translation service that claims to be able to quickly translate large volumes of text from one language to another. It isn’t free, however!

Hundreds of other machine translation tools exist all over the world. Most recently, for example, new developments in camera technology allow us to use our phones to translate visual text. In the context of COVID-19, this can mean that desperately needed medical and health information can be quickly and easily translated from one language to another with the snap of a photo.

Although machine translation is evolving rapidly and creating exciting new opportunities to connect people through languages, it also has limitations – some of which may compromise its ability to fight COVID-19.

For one, machine translation is limited in its ability to translate literary or nuanced texts. Google and Microsoft tools may understand the linguistic rules and relationships between individual words and even sentences, but they can’t understand subtle nuances between synonyms, or how a metaphor is constructed, or how humor operates. This means that while simple instructional or technical texts may be easily translated with Microsoft or Google, humorous blog posts or irony-laden texts have little chance of surviving a contemporary translation machine.

Sometimes attempting to translate even the simplest of texts using machine translation can result in some very unusual target text and some unintended consequences. As proof, countless websites and blogs collect machine translation “fails”.

Other times, machine translation can cause more serious problems. In one case, a Mexican man named Omar Cruz-Zamora was pulled over by police in the State of Kansas. Upon inspection of his vehicle, he was found to be carrying a large stash of illegal narcotics – meth and cocaine – and was subsequently arrested. However, Cruz-Zamora did not speak English well, and Google Translate was used to communicate with him. Under the Fourth Amendment in the United States Constitution’s Bill of Rights, police who have neither a warrant nor probable cause must obtain consent to search a vehicle. However, when Google Translate rendered the officer’s question in Spanish as “¿Puedo buscar el auto?”, the result was a “literal but nonsensical” translation that reads as “can I find the car?” rather than “can I search the car?” When the case arrived in court, it was found that the machine translation was not enough to establish legal consent between the two parties. Consent was invalidated, the evidence from the search suppressed, and Cruz-Zamora was let off without charge.

Machines are unable to interpret texts. They cannot derive meaning. This means that poorly written text has little chance of being translated correctly. If a given text is of poor quality, if it is ambiguous in any way, or even if it suffers from spelling mistakes, a machine translation engine will not be able to translate it well. This is why even machine translations require a human counterpart who can edit the target text into something useful and readable. Many translation organizations need to combine a fleet of human translators with machine translation tools in order to achieve high quality translations.

As cognitive science and comparative literature professor Douglas Hofstadter puts it:

“The practical utility of Google Translate and similar technologies is undeniable, and probably a good thing overall, but there is still something deeply lacking in the approach, which is conveyed by a single word: understanding. Machine translation has never focused on understanding language. Instead, the field has always tried to “decode”—to get away with not worrying about what understanding and meaning are…It’s all about ultra-rapid processing of pieces of text, not about thinking or imagining or remembering or understanding. It doesn’t even know that words stand for things. Let me hasten to say that a computer program certainly could, in principle, know what language is for, and could have ideas and memories and experiences, and could put them to use, but that’s not what Google Translate was designed to do. Such an ambition wasn’t even on its designers’ radar screens.”

Although machine translation can do well in technical contexts, it has limits here as well. When the text being translated is too technical, or when the material requires extensive subject-matter expertise to make sense of it, the translation machine cannot do much about it. Similarly, when legal texts are translated, the machine cannot account for the differences in legal systems between countries or the differences in legal terms from nation to nation.

So what does this mean in health-related contexts? One example of the limits of machine translation comes from that recent article in Wired magazine. In it, the author chronicles the story of the Wuqu’Kawoq Maya Health Alliance, a non-profit group that has been helping provide medical support to people who speak indigenous Mayan languages. One of the group’s clients was diagnosed with diabetes, but there was no equivalent term for the disease in the client’s language. In consultation with medical professionals, a new term was developed to describe the concept of the illness: kab’kïk’el, literally “sweet blood.” With the new term, the client was able to grasp the nature of her condition and the basic lifestyle changes she needed to implement to address her illness. Even the most complex and contemporary machine translation engine would never have been able to devise this solution.

Arguably the most important limit of contemporary machine translation is the set of languages in which it currently works. Unfortunately, machine translation software is available only in more commonly spoken languages. Microsoft Translator supports only 70 languages right now. Amazon Translate supports a mere 55.

Even more ubiquitous tools are limited. Google Translate, for example, is available in only 108 languages. Although Google is working on adding more, adding new languages to the tool is a long and arduous process. In the last four years, Google has added only five new languages to its repertoire.

According to Google, the ability to add new languages to its roster is limited by the availability of online text in less common languages:

“Google Translate learns from existing translations found on the web, and when languages don’t have an abundance of web content, it’s been difficult for our system to support them effectively,” a company spokesperson said in a statement. “However, due to recent advances in our machine learning technology, and active involvement from our Google Translate Community members, we’ve been able to add support for [more] languages.”

Limited language support for uncommon languages poses a dire threat to those left out of the information circle. According to Translators Without Borders, a lack of language rights not only threatens people during a global pandemic, “[i]t also contributes to a global communication power imbalance. Governments, humanitarian organizations, and other information gatekeepers typically do not engage in a global dialogue, they only share information from the top down. Sometimes they do not have the tools to publish and distribute information in marginalized languages. And people who speak these languages are not able to proactively share their needs, concerns, or ideas.”

Translators Without Borders is actively trying to close the gap between machine translation and the people it could help. Their initiative first focuses on gathering text and speech data that can make it easier to automate translation for marginalized languages. According to their website, “using this data, we can build advanced technology-driven solutions for both text- and voice-based communication”. Their “cross-industry effort brings together technologists, native speaker communities, humanitarian organizations, content creators, and donors to fund investment in language data and technology, making it useful and accessible for all”. Over the next five years, the organization’s goal is to bring 10 marginalized languages online.

Machine translation offers real solutions that can diminish borders between people and the COVID-19-related information and services they need. It has the potential to connect millions who would otherwise be left out of the COVID-19 information bubble. However, these solutions also come with limits that illustrate the necessity of human interpretation, of the sense and intelligence that machines can only dream of.

To read more in our COVID Stories Series, please visit: https://www.mcislanguages.com/mcis-blog/

