How to use big data and Hadoop to drive telecom product development
Big data collection and analysis aren’t new to the telecommunications industry. Communication service providers (CSPs) have been capturing, storing and mining vast quantities of call-detail records (CDRs) for decades.
And the data deluge continues. Telecom system switches are currently dumping hundreds of thousands of CDRs every few minutes, producing dozens of terabytes of data every 24-hour cycle. New applications based on data pulled from CDRs and sensors are rapidly being developed – from location awareness to dynamic bandwidth allocation.
Mobile data is predicted to grow by a factor of at least 100 per year over the next decade. Studies suggest that there will be 10 billion mobile communications devices in use by 2018, which works out to more than one device for every person predicted to be on the earth at that time (7.6 billion). There are already billions of sensors grabbing data, and both the number of sensors and the quantity of data they collect will multiply rapidly.
Among the most useful and interesting uses are those based on machine-to-machine communications. Not only are these technologies engaging users, but they also provide a wealth of information that can be used to sharpen a company’s competitive edge. When you understand how customers choose to interact with the world around them, you can deliver irresistible service offerings.
Much of that data contains the best intelligence available to drive new product development. And while not every byte is equally valuable, the average CSP possesses a gold mine of data that’s just waiting to be monetized.
So where do you start? Don’t get bogged down (or elated) by thinking about the enormous amount of data that is being produced; focus instead on how data can provide value to your business. And be prepared to drop all of your preconceived notions – let the data lead you.
Turn that analysis upside down
Data scientists look at data differently than the rest of us. Instead of focusing on the trends – “most of our users for XYZ service are young women ages 14-19, exactly the target audience we expected” – they will look for the data that doesn’t fit: “We also have a cluster of 55- to 65-year-old women in the Southwest who are active users of our XYZ service. How and why did that happen?”
Investigating why anomalies exist is an excellent way to mine fresh new insights from data. The big trend information, conversely, is likely only to prove what we already know, assuming we know our customers as well as we should.
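To make the idea concrete, here is a minimal Python sketch of hunting for the segment that doesn’t fit. Everything in it is an assumption made for illustration: the sample records, the age bands, the `unexpected_segments` helper and the 5 percent threshold are all invented, and a real analysis would run against far larger data with a more careful definition of “unexpected.”

```python
from collections import Counter

# Hypothetical active users of "XYZ service" as (age_band, region) pairs.
# All values are invented for illustration.
active_users = (
    [("14-19", "Northeast")] * 40
    + [("14-19", "Southwest")] * 35
    + [("20-29", "Midwest")] * 3
    + [("55-65", "Southwest")] * 12
    + [("30-39", "Northeast")] * 2
)

EXPECTED_AGE_BANDS = {"14-19"}  # the audience the product was designed for
MIN_SHARE = 0.05                # assumed cutoff for "worth a closer look"


def unexpected_segments(users, expected_age_bands, min_share):
    """Return segments outside the expected audience whose share of
    active users is at least min_share, largest first."""
    counts = Counter(users)
    total = sum(counts.values())
    surprises = []
    for (age_band, region), n in counts.items():
        share = n / total
        if age_band not in expected_age_bands and share >= min_share:
            surprises.append((age_band, region, n, round(share, 3)))
    return sorted(surprises, key=lambda s: s[2], reverse=True)


surprises = unexpected_segments(active_users, EXPECTED_AGE_BANDS, MIN_SHARE)
for age_band, region, n, share in surprises:
    print(f"Unexpected cluster: ages {age_band} in the {region} ({n} users, share {share})")
```

The code only surfaces the cluster. Working out how and why it happened is still the analyst’s job.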
So when using data to drive product development, avoid the common process of defining parameters and then analyzing data to prove or disprove your theory. While the theory-proof process does work in situations where you simply need an answer to a defined problem, such as finding bandwidth bottlenecks or manufacturing defects, it’s unlikely to provide the best possible results when applied to less concrete issues such as product development.
Instead, examine the data and look for interesting information sets that might indicate strong preferences, very early emerging trends, or customer problems that could be solved with a new product or service. Using structured and unstructured data from across the enterprise may reveal unexpected opportunities. You’ll know you’re on to something when the data surprises you, or when it makes you pause and wonder “why?” and “what if?”
Putting big data to work
CSPs typically have one primary problem when it comes to data: dispersal. Information typically flows through the telecom network into different divisions, systems, applications and management systems. The silos reflect longstanding business processes that are tightly tied to profit-creation activities, and those silos are therefore unlikely to come down anytime soon – if ever.
This creates a problem when you are utilizing data for product development. For a free-form activity such as this, you need to have the ability to analyze data across disparate business units, applications and systems. The greater the pool of data that you can work with, the broader and more far-reaching your conclusions are likely to be. Unless you’re developing for a hyper-targeted market, you want to be able to take a deep dive into a lake of data, not a puddle.
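As a rough illustration of what analyzing across silos means in practice, the Python sketch below merges records from three separate systems into one profile per subscriber. The silo names, field names, sample values and the `combine_silos` helper are all hypothetical. At telecom scale this kind of join would run on a distributed platform rather than in a single script.

```python
from collections import defaultdict

# Hypothetical extracts from three separate systems, keyed by subscriber ID.
billing = [
    {"subscriber_id": "A1", "plan": "prepaid"},
    {"subscriber_id": "B2", "plan": "family"},
]
network = [
    {"subscriber_id": "A1", "data_mb": 5200},
    {"subscriber_id": "B2", "data_mb": 300},
    {"subscriber_id": "C3", "data_mb": 900},
]
support = [
    {"subscriber_id": "A1", "tickets": 3},
]


def combine_silos(**silos):
    """Merge per-silo records into one profile per subscriber,
    recording which silos contributed to each profile."""
    profiles = defaultdict(lambda: {"sources": []})
    for silo_name, records in silos.items():
        for record in records:
            profile = profiles[record["subscriber_id"]]
            profile["sources"].append(silo_name)
            for key, value in record.items():
                if key != "subscriber_id":
                    profile[key] = value
    return dict(profiles)


profiles = combine_silos(billing=billing, network=network, support=support)
for subscriber_id, profile in profiles.items():
    print(subscriber_id, profile)
```

Subscriber C3 appears only in the network data. A gap like that is invisible from inside any single silo, and it is exactly the kind of detail that prompts a “why?”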
This is why Apache Hadoop is exceptionally well-suited for telecom use. Hadoop was engineered specifically to process a colossal amount of data accurately, effectively and affordably.
Hadoop enables easy use of both structured and unstructured data for advanced analytics. Streams of data can be processed with no need to reformat them, structure them or define a schema up front. Hadoop allows data to be captured and stored from every touch point across an organization, and it stores that data far more affordably than other big data platforms: hundreds of dollars per terabyte for compute and storage, instead of thousands to tens of thousands of dollars per terabyte.
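The “no schema up front” idea can be illustrated in a few lines of plain Python. This is a sketch of the concept only, not Hadoop code: the raw line formats, the field names and the `read_cdrs` helper are assumptions made for the example. Data is stored exactly as it arrived, and structure is applied only when a particular question is asked.

```python
import json

# Raw lines kept exactly as they arrived; the formats are hypothetical.
raw_store = [
    '{"type": "cdr", "caller": "555-0100", "duration_s": 42}',
    'sensor,tower-17,signal_dbm=-71',
    '{"type": "cdr", "caller": "555-0101", "duration_s": 310}',
    'free-text note from a support agent',
]


def read_cdrs(lines):
    """Apply a CDR 'schema' at read time, skipping lines that do not fit it."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON; leave it in storage for some other reader
        if isinstance(record, dict) and record.get("type") == "cdr":
            yield record["caller"], record["duration_s"]


total_seconds = sum(duration for _, duration in read_cdrs(raw_store))
print(f"Total call time: {total_seconds} seconds")
```

The sensor reading and the free-text note stay in storage untouched, so a later analysis can read them with a different interpretation without any reloading or reformatting.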
Hadoop distributions vary in their effectiveness for specific enterprise tasks, or suitability for a particular industry’s use. In general, CSPs will want a Hadoop distribution that is particularly robust at handling data stored in disparate silos.
For product development and many other business-critical activities, CSPs will want timely access to data that is as close to real time as possible. Historical data is very useful, but it is not best suited to driving development of tomorrow’s products. To get the most out of your data, evaluate the real-time capabilities of each Hadoop distribution. Also check how the solution manages data latency and whether it performs consistently across data volumes, formats and protocols.
Sameer Nori is senior product marketing manager at MapR Technologies. Nori has 10-plus years of experience in the technology industry in marketing, sales and consulting. With an executive MBA from the Fuqua School of Business, Duke University, Nori’s domain of expertise is in business intelligence and analytics.