The significant shifts behind the marketing term 'Big Data'
The increasingly large quantity of data. Companies have access to, and the ability to collect, far more data than ever before. Sometimes that means tracking every potential customer’s every click across your own website as well as, in some cases, other sites. Sometimes that means understanding how current customers are actually using your product, day in and day out. It could mean collecting data on how people move through a city in order to facilitate better urban planning. Sensors can pick up information and send data into databases at a dizzying rate. All of that is putting a strain on traditional database technologies.
The lack of structure in much of this data. It used to be easy to tell what data was: It was the stuff that you could pigeonhole into specific database fields, like name, address line 1, address line 2, and so on, and then query with SQL statements. Now we have lots of data like this, but we also have enormous amounts of unstructured data: video and audio files, huge amounts of social networking texts, emails, the transcripts of customer support calls, and more. How do you manage data if you don’t even know how to categorize it, or what buckets to put it in? Emerging machine learning technologies, like IBM’s Watson, are one approach for handling such a mess of data as it comes in on the fly.
A shift in the underlying storage technologies. Many companies are starting to move away from data warehouses, storage area networks, and other network storage technologies and toward more distributed, clustered, scalable storage. Hadoop is the poster child of this shift, but it is not the only one. Besides, as it turns out, Hadoop itself has some significant limitations. It can be extremely slow to run jobs in Hadoop, for instance. And it needs better security capabilities.
The ability to get useful information out of this data easily. With the right tools, ordinary, non-data-scientist types have the ability to get meaningful answers out of huge quantities of data. Increasingly, they also have the desire to do this. Most people don’t want to have to learn SQL. They want to look at pretty charts that show them how their business is doing, right now. They want the ability to look at different facets of their data or drill down into details so they can figure out how to make the business run better. This has always been the promise of business intelligence (BI) software, though BI projects have a reputation for getting bogged down in long, drawn-out, incredibly expensive efforts that deliver less than promised. Maybe today’s visualization and data integration tools will achieve what last decade’s BI tools could not.