Big Data Needs Thick Data: Latest fad, an insult or nothing that new?

Mo Data stashed this in Big Data

Thick Data: is this a new fad, or an insult you hurl in frustration when your big data isn't giving you the insights you expected? I came across the term in a recent WSJ article (http://online.wsj.com/news/articles/SB10001424052702304256404579449254114659882) and, being unfamiliar with the concept, thought I'd dig a little deeper. A simple Google search reveals any number of recent articles and webcasts about the marriage of thick data and big data, but what does it actually mean?

Big data is basically a quantitative approach: companies gather large data sets and try to draw insights from them. It deals with facts and the accumulation of data points: how many people visit a website, which pages they visit, which browser they use, how long they spend on each page and how they navigate around the site. These are just some of the metrics that can be gathered and analysed under the banner of 'big data'. The term 'thick data' is an attempt to bring emotional or human context to big data sets, ultimately to gain deeper insights. For example, why do users visit a particular page on a website? Why do more people visit that page at 9am on a Monday? Essentially, thick data suggests a qualitative approach, whether that means interviewing a selection of visitors, running a focus group or using ethnographic observation. Thick data is about trying to determine the context behind the factual metrics and trends collected through big data.
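
To make the quantitative side concrete, here is a minimal Python sketch of the kind of metric big data produces. The log format, page names and timestamps are all hypothetical, invented purely for illustration; any real analytics pipeline will look different.

```python
from collections import Counter
from datetime import datetime

# Hypothetical page-view log: (page, visit timestamp, seconds spent on page).
PAGE_VIEWS = [
    ("/pricing", "2014-06-23 09:05", 40),
    ("/pricing", "2014-06-23 09:12", 95),
    ("/pricing", "2014-06-23 09:48", 30),
    ("/blog", "2014-06-23 14:20", 210),
    ("/pricing", "2014-06-24 16:02", 25),
]

# Fixed names so the output does not depend on the system locale.
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")


def visits_per_page(views):
    """Count how many times each page was visited."""
    return Counter(page for page, _, _ in views)


def busiest_slot(views, page):
    """Return ((weekday, hour), count) for the busiest slot of `page`."""
    slots = Counter()
    for p, ts, _ in views:
        if p == page:
            t = datetime.strptime(ts, "%Y-%m-%d %H:%M")
            slots[(DAYS[t.weekday()], t.hour)] += 1
    return slots.most_common(1)[0]


def mean_time_on_page(views, page):
    """Average seconds spent on `page` across all recorded visits."""
    times = [secs for p, _, secs in views if p == page]
    return sum(times) / len(times)


print(visits_per_page(PAGE_VIEWS))
print(busiest_slot(PAGE_VIEWS, "/pricing"))
print(mean_time_on_page(PAGE_VIEWS, "/pricing"))
```

On this made-up data the sketch reports that /pricing is busiest at 9am on Monday, with a mean time on page of 47.5 seconds. That is the 'what' and the 'when'; nothing in the log says why, which is exactly the gap thick data is meant to fill.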

The term 'thick data' has been around for quite some time. A cursory search of academic databases finds references dating back to the 1980s, and no doubt there are many more. Qualitative analysis is a well-established research methodology, and maybe this recent connection with big data is just a gentle reminder that, whilst technology enables the collection of big data sets, we should not lose sight of context when analysing them. But presumably companies wouldn't blindly make assumptions based purely on analytics? That would be crazy! For example, if insights from big data were to shape the future evolution of a product or service, surely an organisation would first attempt to validate those insights using a range of qualitative methods, such as focus groups or interviews with lead users?

But the importance of context isn't exclusive to big data; it matters for almost any type of data. For example, I'm a runner and also a technologist, so I have collected data from virtually every workout I've done over the last three years: duration, heart rate, pace and elevation are just some of the numerous data points I have amassed. But what does all this data actually tell me? Of course I can extract factual metrics that superficially tell me something and produce fancy graphs, but do they give me deep and meaningful insights that will help me become a better runner? Without context, this data has limited value. How did I feel after a run? What did I eat before the run? After it? Did that affect my performance or recovery? Was I stressed and underperforming due to a lack of sleep, or deadlines at work? There are so many different factors that can influence the analysis of this data. Now imagine scaling this up and collecting fitness data from millions of runners: how would you establish context and determine what's important in that big data set?
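
A minimal Python sketch of the problem, assuming a hypothetical record format in which each run stores watch metrics alongside an optional dictionary of free-text context notes. The dates, values and the 5% threshold are invented for illustration; the point is that the metrics can flag an unusual run, but only the notes can suggest a reason for it.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Run:
    date: str
    pace_min_per_km: float  # quantitative: recorded by the watch
    avg_heart_rate: int     # quantitative: recorded by the watch
    notes: dict = field(default_factory=dict)  # qualitative: sleep, stress, diet...


# Hypothetical workout log; only some runs have context notes.
RUNS = [
    Run("2014-06-02", 5.0, 150),
    Run("2014-06-04", 5.1, 152, {"sleep": "good"}),
    Run("2014-06-06", 5.9, 163, {"sleep": "4 hours", "work": "deadline"}),
    Run("2014-06-09", 5.8, 161),
]


def slow_runs(runs, tolerance=0.05):
    """Return runs whose pace is more than `tolerance` (a fraction) slower
    than the average pace across all runs."""
    avg = mean(r.pace_min_per_km for r in runs)
    return [r for r in runs if r.pace_min_per_km > avg * (1 + tolerance)]


for run in slow_runs(RUNS):
    # The numbers can flag the run; only recorded context can hint at why.
    reason = run.notes or "no context recorded: the numbers alone can't say why"
    print(run.date, run.pace_min_per_km, reason)
```

Both slow runs are flagged by the numbers, but only the one with notes comes with a plausible explanation (four hours' sleep and a work deadline). Scaled up to millions of runners, the flagging stays cheap; the explaining does not.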

So whether you're collecting analytics from Twitter, channel-zap measurements from a set-top box or networking metrics from your cable modem, what is all this data actually telling you? If you see a trend over time, is it due to software changes or to some external influence? Is an analysis of the outliers more useful than the main trend? Can you actually draw firm conclusions without deeper analysis, or in other words, without thick data and without establishing the context? It's easy for organisations to get caught up in the hype surrounding big data. Recent technological developments have made big data collection affordable and available to more organisations than ever before. But collecting big data is relatively easy, and so is producing factual reports of metrics and trends. The hard part, however, is realising the limitations, recognising the need for context and accepting that big data will ultimately lead to even more questions which, when answered, will lead to greater knowledge, insights and, by extension, value.

One final illustration, courtesy of Nassim Taleb, author of The Black Swan (http://www.fooledbyrandomness.com). Consider a turkey. With each passing day it gets fed rather than slaughtered, and each day the turkey analyses the available data and calculates that the probability of being slaughtered is decreasing. After 100 days the probability of slaughter looks pretty low and the turkey is feeling safe and secure; a day later, in the run-up to Christmas, the turkey is slaughtered. So collecting big data and calculating trends is a great first step. But establishing the context, considering the influencing factors and following up with more questions and research will ultimately lead to greater insights. If we want to popularise that by calling it 'thick data', then I guess that's OK, but it's nothing new. Still, it beats the alternative of thinking like a turkey!
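
The turkey's reasoning can be sketched in a few lines of Python. The estimator is an assumption of mine, not Taleb's: Laplace's rule of succession, which after n days of being fed and zero slaughters estimates the chance of slaughter tomorrow as 1/(n + 2). The estimate falls steadily and is at its lowest on the eve of the one day that matters, because nothing in the data encodes the calendar.

```python
from fractions import Fraction


def naive_slaughter_risk(days_fed: int, slaughters_seen: int = 0) -> Fraction:
    """The turkey's estimate of P(slaughtered tomorrow), using Laplace's
    rule of succession: (events seen + 1) / (days observed + 2).
    The model only counts past days; it knows nothing about Christmas."""
    return Fraction(slaughters_seen + 1, days_fed + 2)


for day in (1, 10, 50, 100):
    risk = naive_slaughter_risk(day)
    print(f"day {day:3d}: estimated risk {float(risk):.4f}")
```

After 100 days the estimate is 1/102, just under 1%, which is exactly when the turkey should have been most worried. More data of the same kind only makes the turkey more confident; it takes context (what farmers do before Christmas) to change the answer.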

http://stephenclements.ie/node/21

Mo Data
1:50 AM Jun 28 2014

That turkey story really is illustrative. Wow.

Adam Rifkin
1:20 PM Jun 28 2014
