Dark Data: where companies scrape social media, but cannot map it to internal data that is poorly categorized and attributed
Dark Data: B2B’s Big Data Challenge
http://www.information-management.com/news/dark-data-b2bs-big-data-challenge-10024990-1.html#!
It’s estimated that in 2013, the world will produce an astounding 4 zettabytes (or 4 million petabytes) of new data. But unlike B2C companies, which are pressed to make use of massive quantities of unstructured consumer data, B2B companies face a big data challenge that goes well beyond volume.
The issue is this: Efforts to scrape and utilize data from external sources like the Web and social networks hold little value if the data can’t be matched with a company’s own internal data – much of which is “dark.”
Dark data simply refers to all the corporate data that is collected from a variety of sources, but that isn’t routinely used in day-to-day operations. Essentially, the data sits on a shelf unused and collecting dust.
Gartner analyst Svetlana Sicular puts it this way: “Similar to dark matter in physics, dark data cannot be seen directly, yet it is the bulk of the organizational universe.” And this vast array of dark data is Big Data’s biggest hurdle.
To help assess whether or not a company’s dark data has significant business potential, consider this set of questions:
- Does the data in question exist within the organization or is it external?
- How easy will it be to access the specific data source?
- How will this data help solve a business problem?
- How clean or reliable is the data?
- What special skills or technologies does analyzing this data require?
- How can this data be used going forward?
To begin to pierce the mystery surrounding a business’ dark data, integrating the company’s various data silos is a good first step. This makes it much easier for data “consumers” to analyze, research and discover new patterns that can facilitate decision-making. Enabling the data residing in various corporate silos to be shared is a great example of how the power of big data can be leveraged without introducing any new data at all.
The Barriers to Data Integration
One obstacle is that data from different sources is stored in differing formats, which keeps it from being easily shared. For example, a company name is a common way to match information from different sources. But that becomes difficult when the company is referred to by more than one name (variants including Inc., Incorporated, Corp., Corporation, abbreviations, etc.).
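To make the name-matching problem concrete, here is a minimal sketch of one common workaround: reducing each company name to a normalized matching key before comparing records. The suffix list and the function name `normalize_company_name` are assumptions made for illustration, not something the article prescribes, and the approach does not resolve abbreviations of the company name itself.

```python
import re

# Hypothetical list of legal-form suffixes; extend it for your own data.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "corp", "corporation",
    "co", "company", "ltd", "limited", "llc",
}

def normalize_company_name(name: str) -> str:
    """Return a lowercase matching key without punctuation or legal suffixes."""
    # Drop periods and apostrophes so "Inc." becomes "inc".
    cleaned = re.sub(r"[.']", "", name.lower())
    # Turn any other run of non-alphanumeric characters into a single space.
    cleaned = re.sub(r"[^a-z0-9]+", " ", cleaned)
    tokens = cleaned.split()
    # Strip trailing legal-form tokens, but always keep at least one token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

With this key, "Acme, Inc.", "ACME Incorporated" and "Acme Corp." all collapse to the same value and can be joined across sources. Real data usually needs fuzzier matching on top of this, so treat the key as a first pass only.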
Equally important is transforming the data from different sources into a single, usable format. Think of it as “turning the lights on” for the data. This is a multistep process: First, the data has to be cleaned, matched and merged. Second, the data may have to be reorganized or “pivoted” to better reveal useful trends and patterns. Third, the target system has to be populated with the transformed data. All these steps require discovery, design and review to ensure the maximum insight and value can be extracted from the data.
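The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the two sample silos (`crm_rows`, `billing_rows`), their field names, and the dictionary standing in for the target system are all hypothetical, and a real pipeline would add the discovery and review work the article describes.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for two internal data silos.
crm_rows = [
    {"company": " Acme ", "region": "West"},
    {"company": "Globex", "region": "East"},
]
billing_rows = [
    {"company": "acme", "quarter": "Q1", "amount": 100.0},
    {"company": "ACME", "quarter": "Q2", "amount": 150.0},
    {"company": "globex ", "quarter": "Q1", "amount": 80.0},
]

def clean_key(name: str) -> str:
    """Clean a company name into a simple matching key."""
    return name.strip().lower()

def integrate(crm, billing):
    # Step 1: clean the keys, then match and merge on the cleaned key.
    merged = {clean_key(r["company"]): {"region": r["region"]} for r in crm}

    # Step 2: pivot billing rows so each quarter becomes its own column.
    pivot = defaultdict(dict)
    for row in billing:
        key = clean_key(row["company"])
        if key in merged:  # keep only rows that matched a CRM record
            quarter = row["quarter"]
            pivot[key][quarter] = pivot[key].get(quarter, 0.0) + row["amount"]

    # Step 3: populate the target (a plain dict stands in for the target system).
    return {key: {**attrs, **pivot.get(key, {})} for key, attrs in merged.items()}

target_table = integrate(crm_rows, billing_rows)
```

After the run, `target_table` holds one row per company with the region from the first silo and one column per quarter from the second, which is the "pivoted" shape that makes quarter-over-quarter patterns easier to see.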
A third consideration has to do with managing the changes that occur to the data over time. These incremental changes need to be tracked, and each data source may have a slightly different way of identifying those changes.
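Because each source may flag changes differently, one source-independent option is to compare content fingerprints between two snapshots. The sketch below is an assumption for illustration rather than a method the article names: it assumes every record has a stable identifier and is JSON-serializable, and the function names `fingerprint` and `detect_changes` are hypothetical.

```python
import hashlib
import json

def fingerprint(record: dict) -> str:
    """Hash a record's content; key order does not affect the result."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def detect_changes(previous: dict, current: dict):
    """Compare two snapshots keyed by record id.

    Returns three sorted lists of ids: added, changed, removed.
    """
    added = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    changed = sorted(
        k for k in current
        if k in previous and fingerprint(current[k]) != fingerprint(previous[k])
    )
    return added, changed, removed
```

Only the ids in the three returned lists need to be reprocessed on each load, so the incremental changes can be tracked the same way regardless of which source they came from.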
Altogether, the value of the data from the wide spectrum of sources is much greater than the sum of the parts. Once integrated into a workable whole, it can be used to provide new insights and to drive decisions, leading to new efficiencies and innovations. By bringing its data into the light of day, a business can illuminate new routes to market.