A Beginner's Guide to Why Big Data is so Hot
Adam Rifkin stashed this in Big Data!
BusinessInsider has this great beginner's guide to Big Data:
Big data refers to a combination of technologies that can search and analyze massive amounts of information nearly instantly, no matter what format it is in: tweets, posts, e-mails, documents, audio, video, whatever.
Big Data is changing things for three reasons:
- It can handle massive amounts of information in all sorts of formats.
- It works fast -- practically instantly.
- It is affordable because it uses ordinary, low-cost hardware.
Big Data is not really a new technology, but a term used for a handful of technologies. While some of these technologies have been around for a decade or more, a lot of pieces are coming together to make big data the hot thing for 2012.
Four technologies make up Big Data:
- Analytics -- analyzing big amounts to come up with answers
- In-memory databases -- processing information super-fast
- NoSQL databases -- cloud-based computing
- Hadoop -- combining NoSQL with real-time analytics
Big Data solves problems for companies like eBay, Facebook, LinkedIn, Netflix, Twitter and Zynga. But it is also allowing completely new types of companies to be built.
The market for big data is $70 billion and growing by 15% a year.
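To get a feel for what 15% a year means, here is a rough sketch that compounds the quoted $70 billion figure forward. It assumes the growth rate stays constant, which the article does not claim; the function name `project_market` is made up for illustration.

```python
def project_market(size_billion, annual_growth, years):
    """Compound a market size forward, one value per year (year 0 first).

    Assumes a constant annual growth rate, which is a simplification.
    """
    return [
        round(size_billion * (1 + annual_growth) ** year, 1)
        for year in range(years + 1)
    ]


# Figures quoted above: $70 billion, growing 15% a year.
for year, size in enumerate(project_market(70, 0.15, 3)):
    print(f"year {year}: ${size} billion")
```

Under that assumption the market would pass $100 billion within three years.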
Good thing Google just released its entry into the market, BigQuery:
BigQuery is Google's cloud service alternative to things like the open source project Hadoop, HP's Vertica, IBM's Netezza or Wall Street IPO darling Splunk.
BigQuery can handle terabytes of data, and although Google is now charging for the service, it is letting users store up to 100 gigabytes for free.
This is particularly interesting because Google invented the techniques that led to the big data revolution. Years ago it published some technical papers describing how it deals with massive volumes of data so quickly. Others read those papers, used those techniques and came up with their own versions. The big data revolution was born.
Google thinks BigQuery beats the alternatives because it's so easy to use -- by hooking into one interface BigQuery gives users access to Google's powerful data centers. Alternatives like Hadoop take a lot of expertise to set up and you still have to run it on some hardware somewhere.
BigQuery is priced affordably too -- at least for a six-month "introductory" period: 12 cents per gigabyte per month and 3.5 cents per gigabyte processed per day.
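As a back-of-the-envelope check on those rates, the sketch below estimates a monthly bill. It rests on a few assumptions that are not in the article: the 12 cents applies to each gigabyte stored for the month, the 3.5 cents applies to each gigabyte a query processes, and any free storage allowance is passed in explicitly rather than assumed. The name `monthly_cost` is hypothetical and this is not an official pricing calculator.

```python
# Introductory rates quoted above, in US dollars.
STORAGE_USD_PER_GB_MONTH = 0.12     # per gigabyte stored per month
QUERY_USD_PER_GB_PROCESSED = 0.035  # per gigabyte processed by queries


def monthly_cost(stored_gb, processed_gb, free_storage_gb=0):
    """Estimate one month's bill from the quoted rates.

    free_storage_gb is an assumption hook: whether a free allowance is
    deducted from paid usage is not stated in the article.
    """
    billable_storage = max(stored_gb - free_storage_gb, 0)
    storage = billable_storage * STORAGE_USD_PER_GB_MONTH
    queries = processed_gb * QUERY_USD_PER_GB_PROCESSED
    return round(storage + queries, 2)


# Example: 1 TB stored (1,000 GB) and 5 TB scanned by queries in a month.
print(monthly_cost(stored_gb=1000, processed_gb=5000))
```

In this example, storage comes to $120 and query processing to $175, so the query side dominates once you scan your data more than a few times a month.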
Well, it should be fun watching the rest of this decade play out...
Is it me or does the Hadoop elephant look a little drunk?
Perhaps he's drunk with power after throwing a Hadoop server up to crunch some Big Data...
Who owns the data that gets uploaded to Google Storage and used by BigQuery? Does Google use that data once it's uploaded?
According to the Terms of Service, "This Agreement does not grant either party any rights, implied or otherwise, to the other's content or any of the other's intellectual property. As between the parties, Customer owns all Intellectual Property Rights in Customer Data, and Google owns all Intellectual Property Rights in the Service."
Thanks for the link, Adam... cyber-blushing since I didn't have time to look it up myself :O) Assuming that the "Intellectual Property Rights in the Service" means the code/capabilities Google authored to create the service -- and not what they do with your data as a result of the service -- then it sounds like a great deal. Do you use it?
I don't use it yet, but I'd like to try it someday.
12:04 AM May 02 2012