What's After Big Data? Niche Analytics, Data Wrangling, Smart Storage
Gregory T. Huang 8/21/14
Big data is a "hackneyed term," said Michael Stonebraker. "I try hard not to use it."
It was wintertime when I sat down with a few database experts in Boston to talk shop. Stonebraker, an MIT professor and entrepreneur, is one of those graybeards who was working in big data long before it was called big data, and he will likely be doing so long after the term has faded.
In hindsight, his remark was a clear sign that the marketing hype around "big data" had peaked. Everyone was using the term, and no one seemed to know what it really meant, or how it could benefit mainstream businesses and reward data-savvy entrepreneurs.
The premise of big data, at least, is easy to grasp: more and more information is being collected, stored, and analyzed, from click streams to sales records to mobile-device locations. What hasn't been easy is translating all that data into insights that help organizations make better decisions. That goes for retail, finance, healthcare, marketing, wireless, and Internet commerce: name the industry and you'll hear the lament that corporations aren't fully capitalizing on their digital assets.
The underlying reason is that "big data" as a technology area has been a mirage. There's no magic button, only myriad software techniques that may or may not work for problems specific to particular industries.
But a recent wave of startups has identified new classes of problems, showing where big-data capabilities are heading in the next few years. "It's really not about big data. It's about the most useful data," says Andy Palmer, a co-founder (with Stonebraker) of Vertica Systems and Tamr, both data-related companies. He's focused on helping companies reach the information that's most relevant (and often hidden) and that is "high-quality enough to answer compelling questions."
Tamr, where Palmer is currently CEO, is working on "data curation": software that helps organizations understand and connect their many different data sources and formats. The idea is to use a combination of statistics and human experts to show customers how their records are interrelated, identify redundancies and errors, and scrub the data so it can be used effectively. The Cambridge, MA-based startup has done pilot tests with Novartis, Thomson Reuters, and other enterprises.
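To make the statistics-plus-experts idea a little more concrete, here is a minimal Python sketch of one small piece of it: flagging records from different systems that probably refer to the same entity, so a person can confirm or reject each match. It is not Tamr's method. The normalization rule, the 0.8 similarity threshold, and the sample records are all hypothetical, and the scoring uses the standard library's `difflib.SequenceMatcher` simply because it is readily available.

```python
from difflib import SequenceMatcher
from itertools import combinations
import re


def normalize(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def similarity(a: str, b: str) -> float:
    """Score two names from 0.0 (unrelated) to 1.0 (identical after normalizing)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidate_duplicates(records, threshold=0.8):
    """Return (id_a, id_b, score) for record pairs that look alike.

    The result is a review queue for a human expert, not an automatic merge.
    """
    pairs = []
    for (id_a, name_a), (id_b, name_b) in combinations(records, 2):
        score = similarity(name_a, name_b)
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    # Most confident matches first, so reviewers see the easy calls up front.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical customer records pulled from three different systems.
records = [
    ("crm-17", "Acme Widgets Inc."),
    ("erp-4", "ACME Widgets, Inc"),
    ("crm-22", "Globex Corporation"),
    ("fin-9", "Globex Corportion"),  # typo in the finance system
    ("crm-30", "Initech LLC"),
]

for id_a, id_b, score in candidate_duplicates(records):
    print(f"{id_a} <-> {id_b}: {score}")
```

The threshold is the main knob: lower it and reviewers see more false alarms, raise it and real duplicates with bigger typos slip through.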
There are broader terms for this sort of unsexy software (data wrangling, plumbing, "munging," or janitor work), but the goal is a real one: to help businesses make better decisions faster, and save money. And a market for such services seems to be emerging: other startups vying for a piece of the pie include Trifacta, Paxata, and ClearStory in data preparation, and Attivio and Bedrock Data in data integration.
Bedrock Data, for example, has developed software that "synchronizes" data across different business systems, such as customer relationship management, e-mail, marketing, and finance; the idea is to break down barriers between departments and make sure different teams' records are consistent with each other. Meanwhile, the data-prep companies, including Tamr, are making tools meant to automate the traditional, labor-intensive "extract, transform, and load" (ETL) process used to prepare data for data warehouses.
But once the data is cleaned up and shared, how do companies actually make sense of it all? That's a separate story, and it lies in the domain of analytics.
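For readers who haven't run into it, the ETL pattern these tools aim to automate can be sketched in a few lines of Python: pull raw rows out of a source, clean them up, and write them into a warehouse table. This is a toy illustration, not any vendor's product. The CSV export, the cleaning rules (trim and lowercase e-mail addresses, drop rows with no e-mail, drop duplicates), and the in-memory SQLite database standing in for a data warehouse are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system (e.g., a CRM), messy as usual.
RAW_CSV = """customer_id,email,signup_date
101, Alice@Example.com ,2014-03-02
102,bob@example.com,2014-03-05
101,alice@example.com,2014-03-02
103,,2014-04-11
"""


def extract(text):
    """Extract: read raw rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize e-mail addresses, drop incomplete rows and duplicates."""
    seen = set()
    clean = []
    for row in rows:
        email = row["email"].strip().lower()
        if not email:
            continue  # a required field is missing
        key = (int(row["customer_id"]), email)
        if key in seen:
            continue  # same customer already kept, after normalization
        seen.add(key)
        clean.append((key[0], email, row["signup_date"]))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER, email TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT customer_id, email FROM customers ORDER BY customer_id"
).fetchall()
print(loaded)  # [(101, 'alice@example.com'), (102, 'bob@example.com')]
```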
The field has seen a lot of consolidation and investment in recent months, with big players such as Intel, Hewlett-Packard, and Teradata buying into companies including Cloudera, Hortonworks, and Hadapt. A particularly hot sector has matured around Hadoop, an open-source software framework for storing and processing very large data sets across clusters of computers. Many tech companies are writing software to make Hadoop industrial strength and integrate it with new and existing types of databases.
As Palmer sees it, analytics is increasingly moving into vertical industries and niche applications. RStudio, led by JJ Allaire and based in Boston, is one of the emerging leaders, though it's hard to understand what the company does if you don't use R, an open-source language for data scientists. Suffice it to say, RStudio makes tools for large-scale statistical analysis, and the kinds of companies that use R include Bank of America, Facebook, Ford, Google, Uber, and Zillow.
With more targeted analytics tools, big businesses can collect data from new sources, such as sensors or social media, and start to squeeze useful insights from them. "Enterprise companies need to take a page from Internet companies," Palmer says. "They need to get more analytical."
Some examples of niche approaches in analytics: Vast, based in Austin, TX, is tackling Web search and analysis in the automotive, real estate, and travel markets. In the Seattle area, Algorithmia, which just raised a $2.4 million venture round, runs an online marketplace for number-crunching algorithms, while Context Relevant makes predictive analytics software for the financial sector. And FarmLink, based in Kansas City, MO, has just raised a $40 million round to advance analytics for farmers.
Meanwhile, back in Boston, the startup Quant5 specializes in analysis tools for marketing, and Recorded Future analyzes social media and Web documents to try to predict world events (things like civil unrest, terrorist attacks, and other security threats) for companies and government agencies.
Indeed, Doug Levin, the CEO of Quant5 and founder of Black Duck Software, says "enabling data-driven decisions in corporations is one of today's most significant technology trends." He points out that what companies can do with data has moved far beyond the notions of big-data analytics from the past few years; analytics certainly isn't new, but the kinds of analysis that can be done and the types of data that can be accessed are changing.
Which leads us to one more big trend in data, and perhaps an unexpected one: storage is hot again. Not the commodity storage systems (disks, flash drives, appliances), though those are still a huge business. Rather, a number of well-funded startups are pursuing new kinds of storage software that give corporate users more intelligence about their data.
Take Actifio, a Boston-area company that has raised more than $200 million to try to win the "copy data" storage market: systems that companies use to manage the multiple copies of their data that exist for different purposes. The firm started out with the idea of separating data backup and protection from the storage layer. But once customers use Actifio's software for backup, they find they can use the same software to unify their stored data so there's effectively one golden copy of everything. That's the idea, anyway.
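The "one golden copy" idea can be illustrated with content-addressed deduplication, a general storage technique in which identical data is stored once and every named copy is just a pointer to it. The Python sketch below is a hypothetical illustration of that general idea only; it does not describe how Actifio's product works, and the class and method names are made up.

```python
import hashlib


class GoldenCopyStore:
    """Toy content-addressed store: identical payloads are physically kept once."""

    def __init__(self):
        self._blobs = {}  # SHA-256 hex digest -> the one stored copy of the bytes
        self._refs = {}   # logical copy name -> digest of its contents

    def put(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)  # store the bytes only if they're new
        self._refs[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        return self._blobs[self._refs[name]]

    def stats(self) -> dict:
        return {"named_copies": len(self._refs), "physical_copies": len(self._blobs)}


store = GoldenCopyStore()
snapshot = b"orders table, nightly snapshot"
# Backup, test, and analytics teams each ask for "their own" copy.
for name in ("backup/nightly", "test/dev-env", "analytics/q3"):
    store.put(name, snapshot)

print(store.stats())  # {'named_copies': 3, 'physical_copies': 1}
```

A production system would also need to count how many names point at each blob, so that deleting the last reference actually frees the space; the sketch leaves that out.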
What's interesting is that Actifio is trying to save companies money on traditional storage and software, which means taking business away from giants like EMC and IBM. But Actifio is solely about data management; it doesn't really touch analytics or business intelligence.
For that, you have to consider DataGravity, which represents another interesting evolution of data storage. The Nashua, NH-based startup has raised some $42 million from venture investors to create a new storage architecture that could give businesses new visibility and insights into their data.
DataGravity is trying to "extract information from storage," says CEO and co-founder Paula Long. The company's product, just announced this week, looks like a regular storage system to the user. But the software that goes with it can "see" into an organization's files and track all interactions with the data: who accessed or contributed to a particular file and when, what they did with it, whom they worked with, and so on. The software provides charts and visualizations to help users drill down into the data and keep tabs on it.
"Before, you could just see the file name. Now you can do an MRI on it," says Long, who previously co-founded EqualLogic, which was acquired by Dell in 2007.
The goal, a familiar one by now, is to give IT and business users a deeper understanding of corporate data that can help them make better decisions. DataGravity's beta customers include companies in the tech, legal, retail, and healthcare industries. "We believe storage should strategically participate in your business, not just support it. It's not just a container," Long says. She adds that storage is going through a "transformative moment" as it enters the information and analytics age.
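The questions DataGravity's software is described as answering (who touched a file, when, and with whom) can be roughly approximated from an ordinary file-access log. The Python sketch below assumes such a log already exists as a list of hypothetical `AccessEvent` records; it is not DataGravity's design, just a way to show what summarizing that kind of activity involves.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessEvent:
    """One hypothetical log entry: who did what to which file, and when."""
    user: str
    path: str
    action: str  # e.g. "read" or "write"
    at: datetime


def file_activity(events):
    """Per file: the set of users, the set of actions, and the latest access time."""
    summary = {}
    for e in events:
        s = summary.setdefault(
            e.path, {"users": set(), "actions": set(), "last_access": e.at}
        )
        s["users"].add(e.user)
        s["actions"].add(e.action)
        s["last_access"] = max(s["last_access"], e.at)
    return summary


def collaborators(events, user):
    """Other users who touched at least one file that `user` also touched."""
    users_by_path = defaultdict(set)
    for e in events:
        users_by_path[e.path].add(e.user)
    return {
        other
        for users in users_by_path.values()
        if user in users
        for other in users
        if other != user
    }


events = [
    AccessEvent("ana", "/legal/contract.docx", "write", datetime(2014, 8, 18, 9, 30)),
    AccessEvent("raj", "/legal/contract.docx", "read", datetime(2014, 8, 19, 14, 0)),
    AccessEvent("ana", "/hr/policy.pdf", "read", datetime(2014, 8, 20, 11, 15)),
    AccessEvent("sam", "/hr/policy.pdf", "write", datetime(2014, 8, 21, 8, 45)),
]

contract = file_activity(events)["/legal/contract.docx"]
print(sorted(contract["users"]), contract["last_access"])
print(sorted(collaborators(events, "ana")))  # ['raj', 'sam']
```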
And she seems to agree that big-data technologies have moved beyond the realm of geeks and are now helping mainstream users solve real business problems. For DataGravity, that means getting a better grip on all the information that resides in a company's network and files, without trotting out an old buzzword.
"You didn't have to understand big data, you didn't have to program anything," Long says about her firm's customers. "We're not going after the big-data space."