While there's a role for data scientists, the real power is in making advanced analysis work for Excel users
While there's a role for Ph.D.-level experts, the real power is in making advanced analysis work for mainstream -- often Excel-wielding -- business users.
Data, data everywhere, and nary a data scientist in sight. Or at least, not one you can afford. It's a classic Catch-22. To thrive, businesses need to pull financial, sales, predictive, social, and other data into a complete view of the customer. But big data practitioners with fancy degrees and the sophisticated analytics chops to pull that off start at six figures, if you can find one at all.
Academics and consultants pontificate on the crisis. McKinsey & Co. estimates that advanced big data analytics, driven partly by the Internet of Things, could increase GDP in retailing and manufacturing by up to $325 billion annually by 2020, and trim nearly as much from the cost of healthcare and government services. Too bad most organizations will never be able to hire that expertise. Yep, the world's got big data envy bad, and a data scientist is the silver bullet we all need.
Here's an alternative viewpoint: You don't need them. Instead, bring big data analytics down to earth, train some people, and use the tools you have, with a few select additions. Now, before you go all Pi and post N∞ comments opposing the concept, hear me out.
Yes, there's an explosion of data. Total digital data is doubling in size every two years, according to the annual IDC Digital Universe Survey. IBM says 90% of the data in the world today has been created in the last two years, and we're certainly not slowing down. But guess what? There's also an explosion of tools and tactics to help knowledge workers tap into and make sense of it all. Incumbent enterprise data management vendors such as IBM, Informatica, SAP, and Oracle have beefed up their suites through development and acquisition. The cloud provides ample room to store all the data you can afford to collect, and Amazon and Google have added query tools and engines to use it. Add in the usual startups and specialists, like Domo, Pentaho, Tableau, and Tibco/Jaspersoft, and you have an amazing ecosystem to support better analytics. Heck, even Excel now supports massive data sets of 1 million-plus rows.
On the back end, a crack team of great data scientists and engineers may be happy to completely redesign your architecture. But is that your top priority today? Or would you rather take advantage of existing tools that don't need a revamp of your entire data model and spend that cash elsewhere?
Look, a $750 million-revenue organization already distributes thousands of reports and dashboards every week. People up and down the ranks make data-driven decisions all the time. That's the good news. The bad news is that most of those decisions are based on legacy practices and methodologies that don't fully leverage existing structured data sets, let alone unstructured data. So back away from the Hadoop cluster and focus on how you can improve your company's analytics skill sets and results today. It's completely doable.
Moreover, you can make your internal data work not only for your employees, but also for your customers and suppliers. I work with a US software publisher that's a great example of leveraging data that would once have sat idle. The company offers reading and literacy software for grades K-12. Its system provides detailed analysis of student development over the course of the year. The current reporting is typical in education: progress by student, grade, and school, aligned to state standards.
The publisher realized it could create an anonymous pool of performance data that would link all student progress across its entire client base. This data could be combined with socioeconomic and organizational data to create a rich "best practices" data pool that simply doesn't exist today. The data set, scheduled to go live this fall, will let a school analyze, in near real time, how its students are performing relative to their class, their district, and similar institutions across the United States. Educators can also analyze specific strategies and exercises that are part of the program at the individual district level. And the best part? Reporting is built right into existing data tools, so people can tap into this treasure trove on a continual basis.
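To make the "relative to similar institutions" idea concrete, here is a minimal sketch of what such a pooled comparison might boil down to. It is not the publisher's actual system: the record layout, the function names (`anonymize`, `build_pool`, `school_averages`, `percentile_rank`), the salted-hash approach, and the sample scores are all hypothetical. Note that a salted hash only pseudonymizes IDs; a real anonymous pool would need stronger safeguards. The same grouping works by district or by peer group instead of by school.

```python
import hashlib
from collections import defaultdict
from statistics import mean


def anonymize(student_id: str, salt: str) -> str:
    # Hypothetical: one-way salted hash so pooled rows carry no raw student ID.
    # This is pseudonymization only, not a guarantee of anonymity.
    return hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()[:16]


def build_pool(records, salt):
    """Strip identities and keep only what the shared pool needs."""
    return [
        {"student": anonymize(r["student_id"], salt), "school": r["school"], "score": r["score"]}
        for r in records
    ]


def school_averages(pool):
    """Average score per school across the pooled rows."""
    scores = defaultdict(list)
    for row in pool:
        scores[row["school"]].append(row["score"])
    return {school: mean(vals) for school, vals in scores.items()}


def percentile_rank(value, peers):
    """Percent of peer values below `value`, counting ties as half."""
    if not peers:
        raise ValueError("peer list is empty")
    below = sum(1 for p in peers if p < value)
    ties = sum(1 for p in peers if p == value)
    return 100.0 * (below + 0.5 * ties) / len(peers)


if __name__ == "__main__":
    # Hypothetical sample data for three schools.
    records = [
        {"student_id": "s1", "school": "A", "score": 72},
        {"student_id": "s2", "school": "A", "score": 80},
        {"student_id": "s3", "school": "B", "score": 65},
        {"student_id": "s4", "school": "B", "score": 70},
        {"student_id": "s5", "school": "C", "score": 90},
        {"student_id": "s6", "school": "C", "score": 84},
    ]
    pool = build_pool(records, salt="example-salt")
    averages = school_averages(pool)
    rank = percentile_rank(averages["A"], list(averages.values()))
    print(f"School A average {averages['A']:.1f}, percentile {rank:.1f}")
```

With these made-up numbers, school A averages 76.0 and lands at the 50th percentile among the three schools. The same calculation is a PERCENTRANK-style formula away in a spreadsheet, which is the article's point.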