Within five years, 50% of our queries will be through speech and images. Enabled by #deeplearning technology
Researchers are working on a new version of an algorithm that will power better search, autonomous cars, smarter smartphones and the Internet of Things.
Deep-learning algorithms, which are based on loose simulations of the brain, have been used to advance technologies like speech recognition, natural language processing and robotic autonomy.
Now, researchers are working on the next generation of these algorithms, which are heavily used in machine learning and artificial intelligence and may become the foundation that critical technological advances are built on.
Dan Olds, an analyst with The Gabriel Consulting Group, said if we're to see "profound" technical advances -- like cars that drive for us, leaving us to nap or read, and cars that can take off and autonomously fly us to our destinations -- we will need better deep-learning algorithms.
"These autonomous cars rely on being able to 'see' obstacles in the road and maneuver around them. The better the cars can differentiate between, say, a pedestrian and a signpost, the better they will be able to detect potential hazards," said Olds. "And it's not just about the future of our digital lives, but also our physical lives. What if we could trust systems to handle the task of flying or driving freight across the country or the world? What if we could sit in the back and sleep while being driven to work?"
Basically, even though most people have never heard of deep-learning algorithms, better ones could mean a future that includes smarter homes and robots that care for our parents and walk our dogs.
"This type of research is important in that it could yield better ways to wade through the infinitely expanding pool of data driven by the Internet of Things and mobility," said Patrick Moorhead, an analyst with Moor Insights & Strategy. "Deep learning is a critical part of the future of the digital world even though most people don't know anything about it."
Andrew Ng is an associate professor of Computer Science at Stanford University and the Chief Scientist of Baidu Inc., a Chinese Web services company that runs a major Chinese-language search engine. With scientists from Stanford and Baidu, Ng is building the next generation of deep-learning algorithms. He spoke at MIT Technology Review's EmTech conference in Cambridge this week about applying deep-learning technologies to search and other future technologies.
He has the background to get it done. For a year and a half, Ng worked at Google, founding the company's deep-learning project, Google Brain.
Google, along with companies like Microsoft, Facebook and Baidu, is working to develop better deep-learning algorithms because these companies have so much data to deal with.
The beauty of deep-learning algorithms, Ng said in an interview, is how they scale with data. When you feed increasing amounts of data into traditional algorithms, they begin to stutter, slow and eventually flatten out, gaining little from each additional example. That's not the case with deep-learning algorithms: the more data you feed them, the better they function.
The human brain works so well because it is jam-packed with a huge number of neurons that communicate through electrical impulses. Deep-learning algorithms, mimicking the brain, are based on simulated neural networks.
"As we build larger and larger simulations of the brain, these models are relatively efficient at absorbing huge amounts of data," Ng explained. "These are very high-capacity learning algorithms."
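To make "simulated neural network" concrete, here is a minimal sketch, not taken from Ng's work, of how such a network computes an output. Each simulated neuron takes a weighted sum of its inputs and passes it through a nonlinearity (a sigmoid here, chosen as an assumption for simplicity); a layer is many neurons reading the same inputs, and a "deep" network stacks several layers. The names `neuron`, `layer`, `network` and the toy weights are hypothetical, and the sketch covers only the forward pass, not how the weights are learned from data.

```python
import math

def sigmoid(x: float) -> float:
    """Squash any real number into (0, 1), a simple stand-in for a neuron's firing rate."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs: list[float], weights: list[float], bias: float) -> float:
    """One simulated neuron: weighted sum of inputs plus a bias, then a nonlinearity.

    Each weight is the strength of one 'connection' into this neuron.
    """
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)

def layer(inputs: list[float], weight_rows: list[list[float]], biases: list[float]) -> list[float]:
    """A layer is many neurons that all read the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

def network(inputs: list[float], layers) -> list[float]:
    """Feed activations forward through a stack of layers; 'deep' means several layers."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations

# Hypothetical toy weights: 3 inputs -> 2 hidden neurons -> 1 output neuron.
toy_layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
print(network([1.0, 0.5, -1.0], toy_layers))
```

In a real system the weights would be adjusted automatically during training rather than written by hand, and there would be millions or billions of them.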
Work is progressing quickly.
About four years ago, the largest neural networks had about 10 million connections. Ng noted that with the Google Brain project, which he started in early 2011, that figure jumped to 1 billion connections. Last year, he worked with a team at Stanford to build a model with about 10 billion connections.
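A "connection" is a single weight linking one neuron to another. As a rough sketch of why these counts grow so fast, the snippet below counts the connections in a fully connected network, where every neuron in one layer links to every neuron in the next. This is an illustrative assumption: the networks mentioned above were not necessarily fully connected, and the layer sizes here are hypothetical.

```python
def count_connections(layer_sizes: list[int]) -> int:
    """Count weights in a fully connected feedforward network.

    Each adjacent pair of layers contributes (size_a * size_b) connections.
    Bias terms are not counted.
    """
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# Hypothetical layer sizes, chosen only to show how the count scales.
small = [1_000, 3_000, 3_000, 1_000]
wider = [2_000, 6_000, 6_000, 2_000]  # every layer twice as wide

print(count_connections(small))  # 15000000
print(count_connections(wider))  # 60000000
```

Doubling the width of every layer quadruples the number of connections, which is one reason model sizes can climb by orders of magnitude as hardware allows.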
Part of Ng's work is to advance the algorithms, but he and his teammates are also working on using GPUs, or graphics processing units, instead of more traditional CPUs, or central processing units. The chips, designed for handling computer graphics, have turned out to be much better for building large neural networks because training and running those networks comes down largely to enormous numbers of matrix multiplications, which a GPU can spread across thousands of simple cores working in parallel.
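To see why this workload suits GPUs, note that computing one layer of a network is a matrix-vector product: each neuron's weighted sum is a separate dot product that does not depend on any other neuron's result. The sketch below, an illustration rather than anything from Ng's platform, computes the same layer serially and then by handing rows to independent workers. The function names are hypothetical, and Python threads here only demonstrate that the rows are independent; they do not run on a GPU or deliver GPU-like speedups.

```python
from concurrent.futures import ThreadPoolExecutor

def dot(row: list[float], vec: list[float]) -> float:
    """One neuron's weighted sum: multiply-and-add across its inputs."""
    return sum(r * v for r, v in zip(row, vec))

def matvec_serial(matrix: list[list[float]], vec: list[float]) -> list[float]:
    """A layer's pre-activations, one dot product per neuron, computed in order."""
    return [dot(row, vec) for row in matrix]

def matvec_parallel(matrix: list[list[float]], vec: list[float], workers: int = 4) -> list[float]:
    """Same result, but each row is handed to a worker independently.

    No row depends on another row's output; a GPU exploits this by running
    thousands of such multiply-adds at once. Executor.map keeps results in order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: dot(row, vec), matrix))

weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # 3 neurons, 2 inputs each
inputs = [1.0, 1.0]
print(matvec_serial(weights, inputs))    # [3.0, 7.0, 11.0]
print(matvec_parallel(weights, inputs))  # [3.0, 7.0, 11.0]
```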
"We're building a new deep-learning platform with GPU hardware to help us scale better," said Ng. "My collaborators and I are the first ones to do this at scale. Other companies are starting to follow, but as far as I know, Baidu was the first company to build a large-scale GPU cluster for deep learning."
Giving these algorithms even higher capacity should mean big advances in voice recognition and visual search. That's going to be critical, according to Ng.
As a growing number of people from poor, and sometimes less educated, areas come online, more users will speak their search queries instead of typing them. More are also expected to take pictures of what they're searching for instead of typing a description.
"Within five years, 50% of our queries will be through speech and images, so this is a technology we are investing heavily in," Ng said.
Improved speech recognition could mean that a driver speaks aloud while driving and his phone, sitting on the passenger seat, sends a text to his friend saying he'll be late.
"Even as the world moves to mobile, I think no one has figured out a good user interface for the mobile devices, which is why it's so slow to type on these tiny little keyboards on our smartphones," Ng said. "Speech recognition has gotten much better, but it doesn't work as well as we'd like. I'd love, when it gets better, to redesign the user interface on our cell phones around speech recognition."
Deep-learning algorithms will also be used in our smart appliances, smart cars and wearable technology -- stringing it all together in the much-championed Internet of Things.
"I think remote controls will go away in your house," said Ng. "If you are at home and want to listen to a piece of music, instead of pulling out your cell phone and unlocking it and pressing a lot of buttons, you could just lie on your sofa and tell your Baidu device to play some Justin Timberlake. I hope that in the future my grandchildren will ask me, 'Was there really a time when your home appliances didn't understand what you said?' They'll be mystified at having five remote controls in your house."
In 2009, Kerosene and a Match predicted that 20% of all online searches would be image-based, with at least a quarter of those (at least 5% of all searches) being reverse image searches. Fast forward to 2014...
I predict that at some point image searches will match textual ones, since there are just as many visual ways to describe things as there are textual ways. As everyone who has ever tried to talk to a Home Depot employee knows, it's far easier to show them what you want on your cell phone than to describe it and hope you've chosen the right name.
I think there's going to be a mix of ways to give machines instructions they can 'understand' well enough to act on. Today the problem seems to be that machines need to be taught somehow, which is the purpose supervised and semi-supervised learning techniques serve. But the problem in both speech and image recognition is that there are too many things to be found and too many ways of making the request. (Even IBM's Watson needs hooks in the target data that questions can be matched against.) This is where deep learning comes into play: it provides unsupervised ways of training machines at a scale that will make these improved interfaces take off.
On the retail front, there is some value to retailers in finding roughly what you are looking for rather than exactly what you are looking for. That gap is their opportunity to sell you something extra: something more profitable, something more expensive. So adoption of these technologies might come sooner in fields where precision matters more, such as finding passages of text in a million legal documents or an appropriate diagnosis in a stack of medical reports.
All good points, and mobile devices accelerate it all.