Google Research Releases Wikilinks Corpus With 40M Mentions And 3M Entities | TechCrunch
Rohit Khare stashed this in Hacking
For Google, of course, disambiguation is a core feature of its Knowledge Graph project, which lets you tell Google whether you are looking for links about the planet, the car or the chemical element when you search for ‘mercury,’ for example. Making this work takes a large corpus like this one, along with the ability to understand what each web page is really about.
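To make the ‘mercury’ example concrete, here is a minimal sketch of how a mention corpus can drive disambiguation. It is not Google's method. It assumes each record pairs an anchor-text mention with the Wikipedia page it links to, plus a few nearby context words. The records, entity names and the helpers `build_model` and `disambiguate` are all hypothetical. The scoring picks the candidate whose context words best overlap the query and breaks ties by how often the anchor links to each page.

```python
from collections import Counter

# Hypothetical toy records: (anchor text, linked Wikipedia page, context words).
MENTIONS = [
    ("mercury", "Mercury_(planet)", {"orbit", "sun", "planet"}),
    ("mercury", "Mercury_(planet)", {"nasa", "orbit"}),
    ("mercury", "Mercury_(element)", {"toxic", "metal", "thermometer"}),
    ("mercury", "Mercury_(automobile)", {"ford", "car", "sedan"}),
]


def build_model(mentions):
    """Count anchor-to-entity links and collect context words seen with each entity."""
    prior = {}    # anchor -> Counter of linked entities
    context = {}  # entity -> Counter of context words
    for anchor, entity, words in mentions:
        prior.setdefault(anchor, Counter())[entity] += 1
        context.setdefault(entity, Counter()).update(words)
    return prior, context


def disambiguate(anchor, query_words, prior, context):
    """Return the best entity for an anchor, or None if the anchor was never seen."""
    candidates = prior.get(anchor.lower())
    if not candidates:
        return None

    def score(entity):
        # Primary key: context-word overlap; secondary key: link frequency.
        overlap = sum(context[entity][w] for w in query_words)
        return (overlap, candidates[entity])

    return max(candidates, key=score)


prior, context = build_model(MENTIONS)
print(disambiguate("mercury", {"car"}, prior, context))
```

With no helpful context words, the sketch falls back to the most frequently linked page, which is the usual "most common sense" baseline. A real system would also need the kind of page-level understanding the article mentions.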
Sounds like some beautiful work they did here.