Data scientist is to 2013 as ______ was to 2003. Fill in the blank.
RTM Daily: Congratulations on your recent achievements. m6d's CEO Tom Phillips called data science-driven advertising a "reinvention" of the market. In your opinion, what does that mean?
Perlich: Advertising has historically been a combination of creativity and instinct, trial and error, and some comparatively small-scale market research. Very little of it was validated – we were stuck in Wanamaker’s world, where we did not know which half worked and which half was wasted. The change I see is comparable to the industrial revolution – and I am trying to be entirely value-free about this. Today we are notably better at measuring effectiveness at large scale, and the optimization – finding what works best – is done by machines in a matter of seconds and hours rather than months.
With the immense granularity of data about consumer preferences, activities, etc. at our disposal, we have the opportunity to reach even more targeted consumers, who are truly interested in specific products right now. I am not a generic 35-to-45-year-old soccer mom and, as such, should not be targeted with baking products. I am much more than that. I am somebody else at each phase of the day: in the morning when I get my son ready for school, during my lunch break, and in the evening when I plan my next conference trip. This is the promise and potential of a reinvention of advertising: it will be much more precise and effective, much more subject to success metrics, it will cross all devices, and it will be honestly willing to engage on topics like privacy considerations and advertising fraud.
RTMD: Can you define data scientist? I always imagine someone who's extra good at Excel and at understanding what the algorithms are finding.
Perlich: Data scientists come in many flavors and it is hard to find a “one size fits all” description. Somebody who is extra good at Excel and understanding algorithms could be that, but there is the ‘big’ in big data that demands more computer science skills as well. The most relevant characteristics to me are (aside from the tooling that I will speak to in a moment) an extreme curiosity, deep skepticism and ‘technical’ creativity.
A data scientist has to be skeptical and, like Popper, understand that data proves nothing. Nothing can be more misleading than data, because we associate it with truth. I see a large part of my day as a mix of problem solving and data detective work: poking around to identify inconsistencies that need to be resolved before the magic can happen. The other vital contribution of a good data scientist is to understand what the problem really is and whether it can be solved with the data at hand.
Consider a trivial example: I might be asked “what is the average age of our cookies?” The reality is that this question is entirely meaningless. And unless I understand what you want to use my answer for, I refuse to give you one. The truth is that the vast majority of cookies live for 0 seconds. They never get written on the device (and I cannot tell which ones those are). Of course I can just take the average over all cookies – it is probably on the order of hours. But mind you that of all the cookies that I do see for a second time, the average is closer to 90 days. So as a data scientist, I know that summarizing highly skewed distributions into single numbers is meaningless. I first have to help you work out why you need this information before we can find the ‘correct’ answer.
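The cookie example can be made concrete with a small sketch. The numbers below are invented purely for illustration (990 cookies that are never seen again and 10 that are seen over 90 days); they are not m6d's data, and the function name `summarize` is hypothetical. The point is only that one skewed data set yields very different "averages" depending on which question is being asked.

```python
from statistics import mean, median

SECONDS_PER_DAY = 86_400

# Hypothetical lifetimes in seconds (made-up numbers, not real data):
# 990 cookies are never seen a second time, 10 are observed over 90 days.
lifetimes = [0] * 990 + [90 * SECONDS_PER_DAY] * 10


def summarize(lifetimes_seconds):
    """Return several summaries of the same skewed distribution."""
    # Cookies seen at least a second time have a lifetime above zero.
    returning = [t for t in lifetimes_seconds if t > 0]
    return {
        # Average over every cookie, dominated by the zeros.
        "mean_hours_all": mean(lifetimes_seconds) / 3600,
        # The typical cookie: half of all lifetimes are at or below this.
        "median_seconds_all": median(lifetimes_seconds),
        # Fraction of cookies that were never seen again.
        "share_zero": 1 - len(returning) / len(lifetimes_seconds),
        # Average over only the cookies that came back (None if there are none).
        "mean_days_returning": (
            mean(returning) / SECONDS_PER_DAY if returning else None
        ),
    }


summary = summarize(lifetimes)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these assumed inputs, the overall mean lands in the range of hours, the median is zero, and the mean over returning cookies is 90 days. Which of the three is "the average age" depends entirely on what the answer will be used for.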
This is a really simple case of what I generally mean by using our understanding of data and algorithms to help shape the solutions and the tasks. That is a craft, as well as a science, that goes well beyond being able to program Excel macros – data science lives at the intersection of understanding not just the results of the algorithms, but also the subtle caveats of their applicability and the problem that should be solved. I sometimes feel like a new breed of matchmaker.