Google Translate and the Great AI Awakening
Adam Rifkin stashed this in Machine Learning
Fifteen years of linguistics computer code was rewritten in six weeks, thanks to machine learning.
This is an incredibly detailed article that deserves a careful read.
Google is surprisingly open about what it's doing.
I thought this part, about two-thirds of the way through, was very interesting:
but it turns out you can represent a language pretty well in a mere thousand or so dimensions — in other words, a universe in which each word is designated by a list of a thousand numbers.
and just after that:
If you took the thousand numbers that meant “king” and literally just subtracted the thousand numbers that meant “queen,” you got the same numerical result as if you subtracted the numbers for “woman” from the numbers for “man.”
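To make that arithmetic concrete, here is a minimal sketch of the idea. It assumes hand-picked three-number vectors rather than the thousand or so dimensions the article describes. The numbers, the dimension labels, and the helper names (`subtract`, `close`) are all made up for illustration; nothing here comes from Google's actual model.

```python
# Toy word vectors. Real systems learn roughly a thousand numbers per word;
# these three-number lists are invented purely to illustrate the arithmetic.
# Hypothetical dimensions: [royalty, maleness, personhood]
embeddings = {
    "king":  [0.9, 0.8, 1.0],
    "queen": [0.9, 0.1, 1.0],
    "man":   [0.1, 0.8, 1.0],
    "woman": [0.1, 0.1, 1.0],
}


def subtract(a, b):
    """Subtract vector b from vector a, element by element."""
    return [x - y for x, y in zip(a, b)]


def close(a, b, tol=1e-9):
    """Return True if two vectors match within a small tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


# "king" minus "queen" versus "man" minus "woman"
royal_gap = subtract(embeddings["king"], embeddings["queen"])
plain_gap = subtract(embeddings["man"], embeddings["woman"])

# For these hand-picked vectors the two differences coincide.
print(close(royal_gap, plain_gap))
```

In a trained model the two differences would only be approximately equal, which is why the sketch compares them with a tolerance instead of strict equality.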
“The reason we’re seeing extremely narrow systems right now is because they’re extremely useful,” says Ilya Sutskever, cofounder and research director of OpenAI. “Good translation is extremely useful. Good cancer screening is extremely useful. So that’s what people are going after.”
But he adds that although today’s systems look narrow, we “are already beginning to see the seed of generality.” The reason is that the underlying techniques are all just mild riffs on one concept. “These ideas are so combinable, it’s like clay. You mix and match them and they can all be made to work.”
By mixing and matching the narrow systems of today, we’ll land on something bigger and broader — and more recognizable as intelligent — tomorrow.
As they reported in November, the result they got was “reasonably good quality” — not staggering in its perfection, but not bad for a newbie. But when they then fed it a small set of Portuguese-to-Spanish sentence pairs, sort of an amuse bouche of data, the system suddenly became just as good as a dedicated GNMT Portuguese-to-Spanish model. And it worked for other bundles of languages, too. As the Google authors write in the paper, this “is the first time to our knowledge that a form of true transfer learning has been shown to work for machine translation.”
It’s easy to miss what makes this so unusual. This neural net had taught itself a rudimentary new skill using indirect information. It had hardly studied Portuguese-to-Spanish translation, and yet here it was, acing the job. Somewhere in the system’s guts, the authors seemed to see signs of a shared essence of words, a gist of meaning.
Google’s Pereira explains it this way: “The model has a common layer that has to translate from anything to anything. That common layer represents a lot of the meaning of the text, independent of language,” he says. “It’s something we’ve never seen before.”
Of course, this algorithm’s reasoning power is very limited. It doesn’t know that a penguin is a bird, or that Paris is in France. But it’s a sign of what’s to come: an emerging intelligence that can make cognitive leaps based on an incomplete set of examples. If deep learning hasn’t yet defeated you at a skill you care about, just wait. It will.
Source is Sandra Upson: