Marco Trombetti


You can increase your chances of success by identifying, surfing and anticipating growing macro trends. The easy way of identifying such trends is by living in the future. There is no time machine yet: to experience the future, you must live in a context that most other people would consider to be the future. A research lab, an innovative company and a bunch of friends with a strong common interest in technology are great examples.

Understanding macro trends is important, so here I will share a short personal experience on the topic.

Artificial intelligence is fascinating and scary. Human language and translation in particular are perhaps the most difficult challenges that machines face. Natural language is a very compressed channel of information that is densely packed with meaning, and it requires contextual information beyond the words themselves in order to be understood.

Language is the greatest challenge that machines face because it is the most human thing there is.

Because of this, automatic translation systems are progressing slowly; nevertheless, they are undeniably progressing.

At Translated, the translation service I co-founded, we have applied artificial intelligence over the past 17 years to help professional translators translate better and faster. We have tried to create a symbiosis between man and machine. We have done this in many ways, but one very important approach has been to provide translators with suggestions (pre-translations) for every sentence. We have developed a translation tool for professional linguists that combines all the professionally translated material available on the web with AI that can predict sentences never seen before. This is the basis for our open-source product called MateCat.

Others have tried more disruptive approaches, replacing professional translators with end-to-end translation technology. The most striking example is Google Translate.

By helping professional translators, we have been able to take advantage of a unique opportunity, namely that of measuring the progress of the AI over a period of many years.

We have measured how much professional translators correct the suggestions provided by the AI, and we have done so day by day, month by month and year by year.

Back in 2003, with the valuable financial support of the European Commission, we undertook a research project in which we translated several hundred thousand words, and we found that the overall correction rate (post-editing effort¹) for English > Italian and English > French was around 43%. In 2015, the correction rate was 27% for the same language combinations, this time measured on a sample of 50 million words translated in MateCat. Thanks to the application of both neural machine translation and MMT, a system which is capable of adapting to the user, we estimate that we will reach a correction rate of between 22% and 26% in 2018.

This improvement has been unstoppable and constant, with just a few small delays and surges due to one technology reaching its maximum potential and another being introduced. There have been two major changes: statistical translation, which entered service in 2006, and deep learning, which was introduced at the end of 2016.

If we continue at this pace, when will we get to the point at which it is no longer necessary to correct the machine translation?

If we just look at the figures, it seems like this could happen between 2030 and 2035.
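As a rough illustration of that reasoning, here is a minimal sketch that draws a straight line through the two measured figures (43% in 2003 and 27% in 2015) and asks when it reaches a given correction rate. The linear trend and the function name `year_reaching` are assumptions made for this example, not the model actually used at Translated, which presumably drew on far more data points.

```python
def year_reaching(target, first, second):
    """Year at which a straight line through two (year, rate) points hits `target`.

    Assumption: the correction rate keeps falling linearly, which is only
    a crude reading of "if we continue at this pace".
    """
    (year1, rate1), (year2, rate2) = first, second
    if rate1 == rate2:
        raise ValueError("the two points show no trend to extrapolate")
    # Rearranged so the division happens last, which keeps the arithmetic tidy.
    return year2 + (target - rate2) * (year2 - year1) / (rate2 - rate1)


measured_2003 = (2003, 43.0)  # correction rate in percent, from the text
measured_2015 = (2015, 27.0)  # correction rate in percent, from the text

# Year in which the line would reach a 0% correction rate.
print(year_reaching(0.0, measured_2003, measured_2015))
```

With only these two points the line reaches zero at around 2035, the upper end of the range quoted above. Passing a different `target` answers the same question for any other benchmark.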

However, there is another interesting fact that we often forget: humans are not perfect.

When we analysed 20 million words in word-for-word translation suggestions handled by human linguists (called 100% matches), we observed that suggestions from other humans have an average correction rate of 11% rather than 0%. This is because errare humanum est, and also because each of us has a unique style that we want to promote. When we talk about the singularity, we need to make sure we define the benchmark. Is it absolute perfection? The best translator in the world? Or just the average professional translator?

If we are satisfied with a machine that translates better than the average professional translator, 2025 may be a more plausible date for when we will reach an 11% correction rate in these language combinations. To my mind, that’s frighteningly close.

I’ve been wondering whether I should sell Translated now, since the market for professional translations will shrink significantly, or whether I should try to ride out the change in order to seize an even greater opportunity. In the end, people will probably need more translations, not fewer. I feel a bit like Kodak during the transition from film to digital.

The fact that I’m aware of it is already something, and because of that I have already decided that we’ll ride it out.

It is very likely that artificial intelligence will play a key role in every sector in the future. While language is the most difficult thing for machines to tackle, it is possible that the disruption will happen even earlier in many other areas, and this represents an excellent source of startup ideas.

¹ Post-Editing Effort: in order to measure the correction rate, we use an algorithm similar to the Fuzzy Match found elsewhere in the translation industry. It is a word-level edit distance with adjustments to take into account punctuation, case and formatting errors.
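To give an idea of what such a measure looks like, here is a minimal sketch of a word-level edit distance turned into a percentage. It is not the algorithm used in MateCat: the function names are hypothetical, and the handling of punctuation and case (simply ignoring both) is only one possible reading of the "adjustments" mentioned above.

```python
import string


def tokenize(text):
    # Assumption: differences in case and punctuation are ignored entirely.
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def word_edit_distance(source, target):
    """Levenshtein distance computed over words instead of characters."""
    previous = list(range(len(target) + 1))
    for i, source_word in enumerate(source, 1):
        current = [i]
        for j, target_word in enumerate(target, 1):
            cost = 0 if source_word == target_word else 1
            current.append(min(previous[j] + 1,          # delete a word
                               current[j - 1] + 1,       # insert a word
                               previous[j - 1] + cost))  # substitute a word
        previous = current
    return previous[-1]


def correction_rate(suggestion, final):
    """Percentage of words changed between the suggestion and the final text."""
    suggested, corrected = tokenize(suggestion), tokenize(final)
    longest = max(len(suggested), len(corrected))
    if longest == 0:
        return 0.0
    return 100.0 * word_edit_distance(suggested, corrected) / longest


print(correction_rate("The cat sat on the mat.", "The cat sat on a mat"))
```

In this example one word out of six is changed, so the correction rate is about 17%.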