Over the past couple of years, you couldn’t attend a conference about translation services without getting an earful about artificial intelligence and machine translation, and how this technology is on the verge of overturning traditional human-led language services. At events I have attended recently, the term “human-assisted translation (HAT)” was thrown around in place of the established industry term “computer-assisted translation (CAT)”, suggesting that humans will take on a more assistive role and machines will take the lead in future translation work. Interestingly enough, technological disruption does not seem to be as much of a concern in interpretation services. There are reasons behind this, and this article will explore technological disruption in both translation and interpretation.
Machine translation, like Bitcoin, only became popular about two years ago, but its history can be traced all the way back to 1933, when a Soviet scientist, using nothing but cards, a typewriter, and a film camera, essentially created a crude automated dictionary. The “arms race” for machine translation did not start until 1954, early in the Cold War, when IBM used a computer to translate 60 carefully curated Russian sentences into English. Driven by the Cold War and the two sides’ desire to spy on one another, scientists experimented with machine translation, mainly between English and Russian. Rule-based machine translation, which uses grammatical rules and dictionary definitions to produce translations, was pursued over the next 30 years. The concept was good, but it proved nearly impossible to program every single nuanced rule of the languages involved, not to mention the non-logical factors that influence how languages evolve, such as culture, history, superstition, outside influences, and sometimes plain absurdity. Starting in the 1980s, Japan’s work on example-based machine translation (EBMT) changed the game. It turned out that you didn’t need to feed computers all the rules of a language, just sample translations, and let the machine develop associations on its own. After a large amount of translated data was analyzed, the most frequently used translation was deemed best. EBMT evolved into statistical machine translation (SMT), and computers were able to translate languages without learning any of the particular linguistic rules. Up until 2016, SMT dominated the machine translation industry. However, many factors threw off its accuracy. One example is the translation of popular film and song titles, which are often localized into titles that mean something completely different in the target language.
But the next time the words of the title came up in a different context, that inaccurate translation might be applied. For example, the Chinese title for the “Mission Impossible” film series is “Spies among Spies”. So under SMT, the next time a string of source text contained “an impossible mission”, the translation “spies among spies” could theoretically be applied, throwing off the whole text. This is why services like Google Translate, up until 2016, would at times yield surprisingly accurate and usable results, and at other times utter gibberish.
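The “most frequent translation wins” idea, and the way a film title can hijack it, can be sketched in a few lines of Python. This is a deliberately simplified toy, not how any real SMT engine is built: the observation counts are invented, the English glosses stand in for the Chinese text, and the names `build_phrase_table` and `most_frequent_translation` are hypothetical. Real systems combine many more signals than a raw count.

```python
from collections import Counter, defaultdict

# Toy "parallel corpus": (source phrase, observed translation) pairs.
# The counts are invented for illustration only.
observations = [
    ("mission impossible", "spies among spies"),           # film-title rendering
    ("mission impossible", "spies among spies"),
    ("mission impossible", "spies among spies"),
    ("mission impossible", "a task that cannot be done"),  # literal sense
]

def build_phrase_table(pairs):
    """Count how often each translation was seen for each source phrase."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table

def most_frequent_translation(table, source):
    """Return the most frequently observed translation, or None if unseen."""
    if source not in table:
        return None
    return table[source].most_common(1)[0][0]

table = build_phrase_table(observations)

# The film title dominates the data, so it wins even in a non-film context.
print(most_frequent_translation(table, "mission impossible"))
```

Because the film title appears three times and the literal sense only once, the sketch picks “spies among spies” regardless of context, which is the failure described above.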
In 2016, Google changed the game by announcing neural machine translation (NMT). A New York Times article from two years back about Google’s use of artificial intelligence to transform the Google Translate engine was dramatically titled “The Great A.I. Awakening”. The general principle behind NMT is that the meaning carried by the source text is converted into a set of specifications. For example, the word “king” would carry specifications such as male, royalty, dominance, etc. These specifications are then converted into a target text that features the same specifications. Some interesting things were discovered along the way: the specifications of the word “king”, minus the specifications of the word “male”, plus the specifications of the word “female”, yield a result extremely similar to the specifications of the word “queen”. Of course, meaning is not limited to single words; the relationships between adjacent words also play an important role. Without getting into the technical details, let’s just say that deep learning helped to resolve this issue. Because it works by analyzing associations of meaning, NMT doesn’t handle single words and short phrases well, and Google, along with other engines, still defers to SMT when translating single words or short phrases.
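The “king minus male plus female” observation is easier to see with numbers. The sketch below is only an illustration under heavy assumptions: the three-dimensional vectors (royalty, maleness, femaleness) are made up by hand, whereas real engines learn vectors with hundreds of dimensions from data, and the `analogy` helper is a hypothetical name, not part of any translation engine.

```python
import math

# Hand-made "specifications" per word: (royalty, maleness, femaleness).
# These values are invented for illustration; real vectors are learned.
embeddings = {
    "king":     [1.0, 1.0, 0.0],
    "queen":    [1.0, 0.0, 1.0],
    "male":     [0.0, 1.0, 0.0],
    "female":   [0.0, 0.0, 1.0],
    "prince":   [0.7, 1.0, 0.0],
    "princess": [0.7, 0.0, 1.0],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def analogy(base, minus, plus):
    """Compute base - minus + plus and return the closest remaining word."""
    target = [
        b - m + p
        for b, m, p in zip(embeddings[base], embeddings[minus], embeddings[plus])
    ]
    # Exclude the input words themselves, as is customary for this kind of query.
    candidates = {w: v for w, v in embeddings.items() if w not in (base, minus, plus)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

print(analogy("king", "male", "female"))  # queen
```

In this toy setup the arithmetic lands exactly on the vector for “queen”; with learned vectors the result is only approximately close, which is why the nearest word is looked up by similarity rather than by exact match.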
So why is so much research funding being injected into machine translation? Aside from military applications such as intelligence gathering and spying, the obvious answer is the huge potential B2C market. A translation engine that is practical without human intervention can be marketed to every person in the world who travels across cultural and language borders for business or pleasure. Language learning would be unnecessary. Native tongues could flourish. A “universal translator” app would be on our mobile phones, and everyone would use it as frequently as their calendars. On top of that, a gargantuan amount of consumer data would be collected from the texts people enter for translation, all of which can be distilled into marketing information. Tech giants are investing heavily in NMT, and they’re also hungry for data. Current deep learning processes still require parallel bilingual text to learn from. This is why companies charge a measly $20 to $30 per million words for NMT: they want to gather as much data as possible.
It is obvious that machine translation technology is not being developed to help translators do their work. In fact, the undertone of discussions about A.I. and NMT among translators is always one of threat and disruption. Yet some also see opportunity: the chance to be more productive with the aid of machine translation, and to change their service model from translation to post-editing. Language service provider (LSP) businesses can cut costs by offering post-editing instead of translation-editing-proofreading (TEP) services, which normally require two human translators. The American Translators Association recently released a draft position paper on post-editing, stating that machine translation is suitable for specialized and reused texts, and that a human editor performing post-editing is still necessary to produce a usable end product. Indeed, NMT is not perfect, and its results are at times comical and at other times catastrophic. The counterargument, however, is that humans can also make catastrophic mistakes. NMT error levels are approaching those of humans, and this is why NMT results are considered “usable”. Highly controlled source text with limits placed on vocabulary and grammar, such as technical content, can produce usable machine-translated results. Otherwise, a post-editor is needed to turn the machine-translated text into a coherent target text. Indeed, some believe that a perfect machine translation engine will never materialize unless machines become truly intelligent A.I.s, and by that time we’d have bigger problems to worry about (maybe Skynet). Until then, professional human translators will always be needed, even if only for “human-assisted translation”.
Let’s move to interpretation. There isn’t as much technological disruption in the field of interpretation because, well, it is not as simple as adding a text-to-speech engine that reads out the machine translation. I’m not sure how many people are aware of this, but at the 2018 session of the Boao Forum, Asia’s equivalent of Davos, an ambitious Chinese company named Tencent attempted to provide A.I.-powered real-time translation instead of having the work done by simultaneous interpreters. When certain speakers broke the grammatical rules of their native languages (this happens quite often, especially in Mandarin) or misused certain terminology, it yielded some pretty hilarious results, such as translating “Road and Belt” (which should have been “Belt and Road”) as “a path and a conveyor belt”, and stating that “tigers will be an important technology of the future”. Tencent demonstrated the limits of current A.I. technology. It cannot cope with human errors, or with speakers who do not articulate their words with perfect grammar and vocabulary. A layer of human processing is still required to extract accurate meaning from the source speech. However, as a conference interpreter myself, I find this technology very exciting. When I have to perform simultaneous interpretation from my A language into my weaker B language, I usually have a machine translation engine open in front of me, and I use it to look up difficult-to-interpret terminology in real time. This still requires me to type the text into the engine while I am interpreting. If software can parse the speaker’s words into text and machine-translate the detected text, I can use the results, however inaccurate, as a quick reference for interpreting particular words and expressions, making a human judgement on the accuracy of the translations and deciding whether to apply or dismiss what is displayed.
This works particularly well for technical language, such as chemical names or medical terms. Again, I wouldn’t rely on the text to render my interpretation, only to help with certain words. Ideally, I envision a heads-up display on a pair of eyeglasses that I can wear when I’m working as an interpreter, while the audio feed from the conference goes into my ear and, concurrently, into the speech-to-text engine used by my eyeglasses.
There are many language professionals who are averse to technology, and they claim that both workflows, machine translation followed by post-editing and speech-to-text followed by machine translation, only distract them from doing their work. Indeed, everyone’s comfort level with technology is different, and these methods aren’t for everyone. One thing is for sure, though: our industry is being disrupted by technology, and artificial intelligence, neural machine translation, and deep learning are all here to stay. An equilibrium will eventually be established, whether it’s more CAT or more HAT, but technology is and will continue to be an integral and deeply entrenched component of how language solutions are offered. The future is here, and MCIS is ready to embrace it!