Current knowledge bases are full of facts, but they are surprisingly knowledge poor
The Association for Computing Machinery, a leading professional association in computer science, is this week holding its annual conference focused on what we’re now calling data science — though the ACM still clings to the label adopted when the yearly gatherings began in 1998, Knowledge Discovery and Data Mining. Of course, the field is booming, so the four-day conclave of talks, technical papers and human networking in New York has attracted an estimated 2,200 attendees, double last year’s headcount.
But in his keynote speech on Monday, Oren Etzioni, a prominent computer scientist and chief executive of the recently created Allen Institute for Artificial Intelligence, delivered a call to arms to the assembled data mavens. Don’t be overly influenced, Mr. Etzioni warned, by the “big data tidal wave,” with its emphasis on mining large data sets for correlations, inferences and predictions. The big data approach, he said during his talk and in an interview later, is brimming with short-term commercial opportunity, but scientists should set their sights further. “It might be fine if you want to target ads and generate product recommendations,” he said, “but it’s not common sense knowledge.”
In his presentation, Mr. Etzioni acknowledged the gains made possible by big data methods, which identify patterns and calculate statistical probabilities, in tasks like speech recognition and computer vision. But he then proceeded to underline the limits of the big data approach. He showed what happens when one types “apple fruit” into Google: the Knowledge Graph result, the extracted facts that Google presents as a graphic, is mainly a list of the nutritional elements of an apple. The results from other services that assemble knowledge bases, Bing and Wolfram Alpha, were similar.
But things that are readily understood by humans — that apples taste sweet and have a crunchy texture in the mouth when chewed, for example — are a challenge for the algorithms that generate digital databases.
“Current knowledge bases are full of facts,” Mr. Etzioni observed, “but they are surprisingly knowledge poor.”
The “big” in big data tends to get all the attention, Mr. Etzioni said, but thorny problems often reside in a seemingly simple sentence or two. He showed the sentence: “The large ball crashed right through the table because it was made of Styrofoam.” He asked, What was made of Styrofoam? The large ball? Or the table? The table, humans will invariably answer. But the question is a conundrum for a software program, Mr. Etzioni explained, because the correct answer involves both grammar and background knowledge. And the latter is something humans acquire through experience of the world.
Computers can’t experience the world as humans do. And Mr. Etzioni is skeptical of the progress that will be possible with “deep learning,” an artificial intelligence technique that uses the structure of the human brain as metaphorical inspiration for computer systems that process huge amounts of data.
Instead, at the Allen Institute, financed by Microsoft co-founder Paul Allen, Mr. Etzioni is leading a growing team of 30 researchers that is working on systems that move from data to knowledge to theories, and then can reason. The test, he said, is: “Does it combine things it knows to draw conclusions?” This is the step from correlation, probabilities and prediction to a computer system that can understand, in its way. That seems a steep climb of the semantic ladder of meaning. “We are trying to build these semantic models,” Mr. Etzioni noted.
Mr. Etzioni’s presentation was titled, “The Battle for the Future of Data Mining.” But other computer scientists see the big data approach and the quest for understanding described by Mr. Etzioni as less a battle than different yet complementary paths, heading in the same broad direction. The long-range promise, they say, is technology that becomes a layer of data-driven artificial intelligence that resides on top of both the physical and digital worlds, helping people to make faster and smarter decisions as a kind of clever software assistant.
Mr. Etzioni, other scientists say, makes a good point, but the current enthusiasm for big data methods is understandable. “The dramatic successes of big data have caused everyone to rush over to that side of the boat,” said Edward Lazowska, a professor at the University of Washington, who is on the board of the Allen Institute.
And the correlation and prediction of data science have certainly been good to Mr. Etzioni, whose talents include being a successful entrepreneur. He was a founder of Farecast, whose software predicted the best time to buy airline tickets. Microsoft bought Farecast in 2008. He was also a founder of Decide, a web site that sifted through historical price data and user recommendations to help consumers make buying decisions. eBay purchased Decide in 2013.
The keynote speaker on Tuesday morning, Eric Horvitz, a computer scientist at Microsoft Research, emphasized all that can be done with big data tools. His talk was titled, “Data, Predictions, and Decisions in Support of People and Society.” In his presentation, Mr. Horvitz described several projects he and his team were working on. One involves using patient, treatment and historical data to predict which hospital patients are most at risk of being readmitted within 30 days, and suggest follow-up monitoring. Studies show that 20 percent of Medicare patients return to the hospital within 30 days at an estimated cost of $17.5 billion a year, in addition to the toll in human suffering.
Later, in a hallway conversation, a university computer scientist asked Mr. Horvitz whether the software draws conclusions about the causes of hospital readmissions. You can construct plausible explanations from the data, Mr. Horvitz replied. “But we don’t care,” he added. “Of course, we care in general. But it doesn’t matter to the effectiveness of the technology.”
In an interview, Mr. Horvitz, who is an academic adviser to the Allen Institute, agreed with Mr. Etzioni that the long-range goal is computer systems that can reason rather than merely recognize patterns and correlations and make predictions. But Mr. Horvitz chose a different emphasis. “I think we can have a huge impact in so many fields, in the shorter term, along the way to reasoning systems,” Mr. Horvitz said.