The Unofficial Google Data Science Blog
Sean Gerrish, Google News; Amir Najmi, Google Ads Quality; Diane Tang, Google Research
Despite Google’s technical achievements with big data, it may come as a surprise that there is no official Google blog for data science. True, Google Research puts out many academic papers and has a blog describing matters of interest to researchers. But what has been missing to date is a conversation about the nuts-and-bolts, the day-to-day of large scale analytical systems Google builds to serve its users.

We’d like to change that. We are a group of individuals from across several engineering teams at Google whose job it is to design and build the analytics used in Google’s products and services. While most of us have PhDs in statistics, machine learning or a related field, ours is not a blog aimed at academia. We’ll provide academic references if necessary, but we mean for this to be a practitioners’ blog. At the same time, the problems we face are often complex enough to require highly technical solutions in statistics and computation. Thus many of our posts might not be suited to the casual business analyst. Our intended audience is other data scientists in industry, as well as students who wish to pursue such a career.

Of course, this somewhat begs the question: what is this field we are calling “data science”? We don’t presume to define its contours and, besides, others may possess greater wit. All we know is that there is an emerging discipline at the nexus of statistics, machine learning and computation which seeks to derive inference from data too big to fit on a single computer (aka “big data”). We know because this is the solution space of most business problems we are tasked to solve in our daily professional lives.

This is not an official Google blog to communicate with users about Google's products and policies. This blog does not speak for Google and will not articulate Google's position on anything. Rather, our goal here is to contribute as data professionals to the ongoing discourse around the nascent field we might as well call “data science”.
We’d like to do this by communicating what we’ve learned, what we’ve failed to learn and how we are searching for answers. Our authentic experiences, be they good, bad, or ugly.

To give you a sense of the kind of material you can expect from us, here is a partial list:

- experiment design for large, sparse data
- streaming algorithms for statistical inference
- machine learning models we have found useful
- analysis methodologies we've invented/reinvented/repurposed that proved particularly effective for us
- when standard statistical methods work even better for big data
- when standard statistical methods fail and need to be tweaked
- practices which we have found to make data scientists more effective
- our experience towards building successful data science teams
- the business context within which all our technical problems exist

On that last point: we strongly believe that the analytical problems of data science must be situated in actual business decisions. Over time, we hope to provide some insight into our business context as it connects with our methodologies, culture and way of thinking.

Ideally, we’d like for this to be a conversation. We encourage you to tell us what you found particularly useful or interesting, or how you could improve upon an approach we describe. We’re in this together, this brave new world of data science.