Gartner drains ‘data lakes’ concept in new report


A new report from Gartner, Inc. calls into question the concept of “data lakes,” or large repositories of unstructured data from a range of sources that can be used for analytics.

The study, called The Data Lake Fallacy: All Water and Little Substance, notes that while many vendors have signed on to the data lake concept, few companies agree on a definition of what data lakes are or the value they provide.

Data lakes are marketed as enterprisewide data management platforms for analyzing disparate sources of data in their native formats, wrote Gartner’s Nick Heudecker. “This eliminates the up-front costs of data ingestion, like transformation. Once data is placed into the lake, it’s available for analysis by everyone in the organization.”

But co-author Andrew White pointed out that while data lakes might benefit certain parts of an organization, no one has yet realized the value proposition of enterprisewide data management.

The analysts write that data lakes help to solve two key problems: they eliminate data silos, and they address the problem of how to analyze data stored in different formats. But data lakes aren’t without risks, including the lack of an underlying mechanism to maintain them and the absence of metadata. These problems can eventually lead to what Gartner terms a “data swamp,” where it becomes impossible to carry out any kind of accurate analysis.

The authors said there are other risks with data lakes too, including access control and security considerations. Data may also be restricted by regulatory or privacy requirements, and just dumping it into a lake could lead to legal exposure.

Gartner instead advises enterprises to focus on “semantic consistency and performance in upstream applications and data stores”, rather than trying to consolidate all of their data in a lake.

In an interview with Application Development Trends, Jack Norris of MapR Technologies, Inc. said data lake adoption was being driven by the cost, efficiency and agility of Hadoop.

“Gartner is rightly pointing out that not all Big Data and Hadoop solutions provide the performance, security and data protection capabilities that customers need,” Norris said.

Nevertheless, Gartner’s analysts don’t write off data lakes altogether. “The question your organization has to address is this: Do we encourage one-off, independent analysis of information in silos or a data lake, bringing said data together, or do we formalize to a degree that effort, and try to sustain the value-generating skills we develop?” White said.

Data lakes are likely to appeal to an organization that prefers the first scenario, but those that want to consolidate information should move beyond data lakes to focus on building a more robust data warehouse.

The data lake concept promises a centralized pool of disparate data sources in one location, and treats alignment as a technical exercise. Information management leaders should understand the gaps in this concept, such as semantics, governance and security, and take the necessary precautions.
