Decoding the Buzzwords of AI Linguistics

Let’s get to grips with the buzzwords at the foundation of AI linguistics. These core terms and concepts are essential to understanding how machine translation works.

Artificial Intelligence (“AI”) – It’s been a buzzword for a while, and has been integrated to varying degrees across most industries. It reflects our insatiable fascination with creating machines that could potentially match our level of intelligence, a fascination often accompanied by a reasonable level of concern: the fear of being replaced by something that may surpass the processing power of our human brains.

Natural Language Processing (“NLP”) – This technology is essential for identifying language, parsing input, and enabling communication. AI relies on NLP to analyse language structures and store linguistic data in a meaningful form. With that data in place, it can automate reasoning, draw conclusions, and ultimately detect and extrapolate models and structures of its own via machine learning techniques.

Rule-Based Machine Translation (“RbMT”) – Automated and machine-driven translation systems have a long history. During the early stages of their development, the focus was on building rule-based machine translation processes. An RbMT system combines a list of grammatical and structural rules for each language with vocabulary lists, dictionaries, and glossaries. For decades, global corporations spent astronomical amounts trying to translate documents this way. However, because languages are full of irregularities, and because choosing the right word so often depends on context, it was often impossible to create even vaguely comprehensible content for a target audience.
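To make the idea concrete, here is a minimal sketch of a rule-based translator, assuming a tiny hand-written English-to-Spanish dictionary and a single reordering rule. The word lists and function names are invented for illustration and do not come from any real RbMT product.

```python
# Toy rule-based translator (English -> Spanish). All data is illustrative.
LEXICON = {
    "the": "el", "red": "rojo", "car": "coche",
    "river": "río", "bank": "banco",
}
ADJECTIVES = {"red"}
NOUNS = {"car", "river", "bank"}


def reorder_adjectives(words):
    """Structural rule: English ADJ + NOUN becomes Spanish NOUN + ADJ."""
    out = list(words)
    i = 0
    while i < len(out) - 1:
        if out[i] in ADJECTIVES and out[i + 1] in NOUNS:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip past the pair we just swapped
        else:
            i += 1
    return out


def translate(sentence):
    """Apply the grammar rule, then look each word up in the dictionary."""
    words = reorder_adjectives(sentence.lower().split())
    # Words missing from the dictionary are passed through unchanged.
    return " ".join(LEXICON.get(word, word) for word in words)


print(translate("the red car"))     # el coche rojo
print(translate("the river bank"))  # el río banco
```

The first result is fine, but the second shows the weakness described above: the dictionary always maps "bank" to the financial "banco", whereas a human translator would write something like "la orilla del río". Handling that with rules alone means adding a special case for every such context.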

Statistical Machine Translation (“SMT”) – Phrase-based statistical machine translation was the next step in development. Its focus is on combinations of words that together constitute one lexical unit. A lexical unit, also called a “phraseme”, is a meaning expressed by a combination of words, each of which may individually have a different meaning that does not directly contribute to the meaning of the phrase as a whole.
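The phrase-based approach can be sketched roughly as follows. This is a simplified illustration, not how a production SMT system works: the phrase table and its probabilities are made up, and the greedy longest-match lookup stands in for the statistical decoding a real system would perform.

```python
# Toy phrase table: source phrase -> candidate translations with
# made-up probabilities. Real systems estimate these from parallel corpora.
PHRASE_TABLE = {
    ("kick", "the", "bucket"): [("estirar la pata", 0.7), ("patear el cubo", 0.3)],
    ("kick",): [("patear", 0.9), ("chutar", 0.1)],
    ("the",): [("el", 0.6), ("la", 0.4)],
    ("bucket",): [("cubo", 0.8), ("balde", 0.2)],
}


def most_likely(options):
    """Pick the candidate translation with the highest probability."""
    return max(options, key=lambda pair: pair[1])[0]


def translate(sentence, max_len=3):
    """Greedily match the longest known phrase at each position."""
    words = sentence.lower().split()
    out = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + n])
            if phrase in PHRASE_TABLE:
                out.append(most_likely(PHRASE_TABLE[phrase]))
                i += n
                break
        else:
            out.append(words[i])  # unknown word: pass through unchanged
            i += 1
    return " ".join(out)


print(translate("kick the bucket"))             # estirar la pata
print(translate("kick the bucket", max_len=1))  # patear el cubo
```

With phrases of up to three words, the idiom is treated as one lexical unit and mapped to an equivalent Spanish idiom. Restricting the lookup to single words (`max_len=1`) yields the literal, and misleading, word-by-word rendering.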

Neural Machine Translation (“NMT”) – These systems go one step further and look at complete sentences instead of chunks of words. They treat words not only as combinations of letters, but also take into account their relations to other words in the sentence and even the paragraph. All that information and context then forms so-called “word vectors”, which are stored in the NMT system, optimised, and updated based on ongoing statistical learning and feedback. NMT results in much better readability and fluency, but it tends to struggle with specific terms: where there is no perfect match, it often puts more weight on the sentence as a whole than on specific words.
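As a rough illustration of word vectors and context, the sketch below uses hand-made two-dimensional vectors and a simple average over the surrounding words. Both are assumptions made for the example: a real NMT system learns vectors with hundreds of dimensions and combines them with a neural network rather than an average.

```python
import math

# Hand-made 2-D vectors: [finance-ness, nature-ness]. Purely illustrative.
VECTORS = {
    "bank": [0.5, 0.5],   # ambiguous on its own
    "river": [0.0, 1.0],
    "money": [1.0, 0.0],
    "shore": [0.1, 0.9],  # the "riverbank" sense
    "loan": [0.9, 0.1],   # the "financial institution" sense
}


def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def context_vector(words):
    """Average the word vectors so each word is coloured by its neighbours."""
    vectors = [VECTORS[word] for word in words]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def closest_sense(words, senses=("shore", "loan")):
    """Return the sense whose vector is most similar to the context."""
    context = context_vector(words)
    return max(senses, key=lambda sense: cosine(context, VECTORS[sense]))


print(closest_sense(["river", "bank"]))  # shore
print(closest_sense(["money", "bank"]))  # loan
```

The same word "bank" ends up closer to a different sense depending on its neighbours, which is the kind of context sensitivity that gives NMT its fluency.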

AI has, without a doubt, already recorded astounding achievements: beating the world’s best chess player and, 20 years later, defeating the world champion in Go – a board game long considered impossible for any machine to master. Attempts have been made recently to beat humans in translation competitions, but in that field machines have so far failed to surpass the human mind. It seems that the hard-to-define nuances and subtleties of naturally developing languages are still too difficult for any AI system to understand and process.