
Poor-quality data stymies even the best attempts at analysis and visualization, and causes costly workarounds and coding challenges.



http://www.bigdatarepublic.com/author.asp?section_id=3183&doc_id=266199

Data quality refers to a broad range of factors, including data’s validity, accuracy, timeliness, reasonableness, and completeness. So what can you do about data quality?
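As a concrete sketch, the Python below scores a single record against four of those dimensions. The field names, rules, and thresholds are illustrative assumptions, not anything prescribed here:

from datetime import datetime, timedelta

# Hypothetical record; field names and thresholds are illustrative only.
record = {
    "user_id": "42",
    "email": "jane@example.com",
    "age": 34,
    "updated_at": datetime.now() - timedelta(days=30),
}

def check_quality(rec):
    """Return a pass/fail flag per quality dimension for one record."""
    return {
        # Validity: the value conforms to an expected format.
        "validity": "@" in rec.get("email", ""),
        # Reasonableness: the value falls in a plausible range.
        "reasonableness": 0 < rec.get("age", -1) < 120,
        # Timeliness: the record is fresh enough to trust.
        "timeliness": datetime.now() - rec["updated_at"] < timedelta(days=365),
        # Completeness: all required fields are present and non-empty.
        "completeness": all(rec.get(f) not in (None, "")
                            for f in ("user_id", "email", "age")),
    }

print(check_quality(record))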

1. Get small data right: There needs to be a model for data quality and integration in place that works against small data sets; otherwise, the risk of wasting money on big data projects is enormous.
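One minimal way to sketch that idea: run your validation rules over a random sample and gate the full-scale job on the pass rate. The check function, sample size, and threshold here are hypothetical:

import random

def validate_sample(records, check, sample_size=1000, threshold=0.95):
    """Gate the full-scale job on the pass rate of a random sample."""
    if not records:
        return False, 0.0
    sample = random.sample(records, min(sample_size, len(records)))
    passed = sum(1 for rec in sample if all(check(rec).values()))
    rate = passed / len(sample)
    # Proceed to the expensive big data job only if the small set is clean.
    return rate >= threshold, rate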

2. Govern your data: Data needs to have an owner and change management processes -- no different from applications.
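As a sketch of what that can look like in practice, here is a minimal metadata record that names an owner and keeps an auditable change log; all the fields and names are hypothetical, for illustration only:

from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """Minimal governance metadata for one data set (illustrative)."""
    name: str
    owner: str  # the accountable person or team
    change_log: list = field(default_factory=list)

    def record_change(self, who, what):
        """Append an auditable entry, mirroring application change control."""
        self.change_log.append({"who": who, "what": what})

clicks = DatasetRecord(name="clickstream", owner="web-analytics team")
clicks.record_change("jdoe", "added referrer column")
print(clicks.change_log)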

3. Clearly define the problem: Because big data concepts are new to many organizations just dipping their toes in the water, a big data project too easily looks like a lab experiment. When the outcome isn't clear up front, the quality level and the careful tracking of data attributes don't become part of the project's requirements and execution the way they should.

4. Test your data and system: The best way to test in the big data paradigm is to functionally test each piece of the system, whether that's the MapReduce function (both inputs and outputs), aggregations, node configuration, or the validation of the differing data sources, both structured and unstructured.
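For instance, the map and reduce stages of a word count can each be exercised in isolation with ordinary unit tests; the functions below are a generic illustration, not code from the article:

import unittest

def map_words(line):
    """Map stage: emit (word, 1) pairs for one input line."""
    return [(word.lower(), 1) for word in line.split()]

def reduce_counts(word, counts):
    """Reduce stage: sum the counts emitted for one word."""
    return (word, sum(counts))

class TestWordCount(unittest.TestCase):
    def test_map_inputs_and_outputs(self):
        self.assertEqual(map_words("Big big data"),
                         [("big", 1), ("big", 1), ("data", 1)])

    def test_reduce_aggregation(self):
        self.assertEqual(reduce_counts("big", [1, 1, 1]), ("big", 3))

if __name__ == "__main__":
    unittest.main()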
