NLP: Translation issues

Google makes its translation system, which is based on machine learning techniques, available to everyone free of charge for use in a web browser, and for commercial uses for a minimal fee. The free translation system is great for anyone learning a language. Google uses crowdsourcing to help improve the system: anyone who sees an incorrect translation can suggest a better one. But be aware that some phrases do not translate perfectly. Idioms may not translate well. And the letter case (upper, lower, etc.) may not match, as the translations are based on machine learning matching.

Modern automatic language translation systems, such as Google Translate, use machine learning and statistical pattern matching rather than grammar rules, spelling rules, or specific knowledge of the languages being translated.

So, for example, the Greek word for "well", as in a "hole in the ground", might go through English to get translated into Russian as "well", as in "very good".

More: Automatic language translation errors

Google Translate is very good at using statistical pattern matching between source and target texts. This approach has been shown to be clearly better than rule-based systems. That is, it does not use specific grammar rules, spelling rules, etc., just statistical pattern matching (i.e., machine learning). However, translation issues remain.
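To make "statistical pattern matching" concrete, here is a minimal sketch in Python. It is a toy and not how Google Translate is actually implemented. The aligned word pairs, their counts, and the names `ALIGNED_PAIRS`, `build_table`, and `most_likely` are all made up for illustration. The idea is only that the system picks the target word it has seen most often next to the source word, with no grammar rules involved.

```python
from collections import Counter, defaultdict

# Hypothetical word pairs taken from imaginary aligned English/Russian text.
# The counts are invented for illustration only.
ALIGNED_PAIRS = [
    ("well", "хорошо"),
    ("well", "хорошо"),
    ("well", "хорошо"),
    ("well", "колодец"),
    ("water", "вода"),
]


def build_table(pairs):
    """Count how often each source word was paired with each target word."""
    table = defaultdict(Counter)
    for source, target in pairs:
        table[source][target] += 1
    return table


def most_likely(table, word):
    """Return the most frequent target word and its relative frequency."""
    counts = table[word]
    target, count = counts.most_common(1)[0]
    return target, count / sum(counts.values())


TABLE = build_table(ALIGNED_PAIRS)
print(most_likely(TABLE, "well"))
```

With these made-up counts, "well" comes out as "хорошо" because that pairing was seen most often. The less frequent meaning, "колодец", loses out unless the surrounding context is also taken into account.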

Greek to Russian translation may not work well. Google Translate has some issues when it translates from, say, Russian to Greek and needs to go through, say, English.

Consider translating Greek to Russian.

The modern Greek word "πηγάδι" (pee-GA-thee) ≈ "well" as in a "hole in the ground from which one gets water".
The modern Greek word "καλά" (ka-LA) ≈ "well" as in "very well".
The English word "well" can mean a "hole in the ground from which one gets water" or it can mean "very good".
The Russian word "хорошо" (ha-ra-sho) ≈ "well" as in "very good".
The Russian word "колодец" (ka-la-dets) ≈ "well" as in a "hole in the ground from which one gets water".

Whenever Google does not have enough source text translated into the target language from which to get patterns to do the translation, it uses an intermediate language such as English. This is where some translation issues arise. Note: For cases such as this, one would need to recognize the problem, then do a more specific search for "well" from English into Russian, and then study the (sometimes) long list of alternatives in order to find the correct translation.
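The "well" problem above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Google's actual method. The two small dictionaries and the function `pivot_translate` are hypothetical, and the ordering of the Russian candidates (with "хорошо" first) is assumed for the sake of the example.

```python
# Toy dictionaries built from the word list above (illustration only).
GREEK_TO_ENGLISH = {
    "πηγάδι": "well",  # hole in the ground
    "καλά": "well",    # very well
}

# Candidate Russian translations for each English word.
# Assumption: the first candidate is the one the system would pick by default.
ENGLISH_TO_RUSSIAN = {
    "well": ["хорошо", "колодец"],
}


def pivot_translate(greek_word):
    """Translate Greek to Russian by going through English.

    Returns the English pivot word, the default Russian choice,
    and the full list of Russian alternatives.
    """
    english = GREEK_TO_ENGLISH[greek_word]
    candidates = ENGLISH_TO_RUSSIAN[english]
    return english, candidates[0], candidates


for word in ("πηγάδι", "καλά"):
    english, chosen, alternatives = pivot_translate(word)
    print(word, "->", english, "->", chosen, "| alternatives:", alternatives)
```

Both Greek words collapse into the same English word, so the distinction between them is lost at the pivot step. In this sketch "πηγάδι" comes out as "хорошо", and the correct "колодец" only shows up in the list of alternatives.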

This requires one to realize that the initial answer provided is not correct!

The Google person in charge of translation, at an NLP (Natural Language Processing) and CL (Computational Linguistics) conference, said that there was not enough text to do the machine translation satisfactorily.

Enough people asked him, including myself, that a few weeks later it appeared on Google Translate.


An area related to topic modeling, but requiring more natural language semantic analysis, is sentiment analysis: trying to determine whether comments are positive, negative, or neutral, or fall into some other semantic grouping. I started to look at this for 30,000+ comments (most in German) from a company with a presence in Germany, but then the company decided they wanted me to move on to something they considered more important.

Note that sentiment analysis is related to, but very different from, topic modeling. Topic modeling is an unsupervised machine learning method that groups words in documents into topics without direction and without knowing what the words or topics actually mean. Sentiment analysis is a machine learning approach whereby the computer needs to have some idea of whether a word or group of words refers to something positive or negative. There are usually only two or three possible outcomes for a sentiment analysis decision: "yes" or "no" or "maybe", "positive" or "negative" or "neutral", etc. What does "awfully good" mean?
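The "awfully good" question can be shown with a minimal sketch in Python. This is a deliberately naive word-counting approach, not a recommended method and not what I used on the German comments. The tiny lexicon and the function `naive_sentiment` are hypothetical and exist only to illustrate the problem.

```python
# Hypothetical word scores: +1 for positive words, -1 for negative words.
LEXICON = {
    "good": 1,
    "great": 1,
    "bad": -1,
    "awful": -1,
    "awfully": -1,
}


def naive_sentiment(text):
    """Sum per-word scores and map the total to one of three labels."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(LEXICON.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for comment in ("very good", "awful", "awfully good"):
    print(comment, "->", naive_sentiment(comment))
```

Scoring word by word, "awfully" and "good" cancel out and the comment is labeled "neutral", even though a person would read "awfully good" as strongly positive. The computer needs some idea of what groups of words mean together, not just the individual words.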