Apache Hadoop Goes Realtime at Facebook (June 2011)
Rohit Khare stashed this in Hacking
The second generation of non-MapReduce Hadoop applications needed to dynamically index a rapidly growing data set for fast random lookups. One primary example of such an application is Facebook Messages. Facebook Messages gives every Facebook user a facebook.com email address, integrates the display of all email, SMS and chat messages between a pair or group of users, has strong controls over who users receive messages from, and is the foundation of a Social Inbox. In addition, this new application had to be suited for production use by more than 500 million people immediately after launch and needed to scale to many petabytes of data with stringent uptime requirements. We decided to use HBase for this project. HBase in turn leverages HDFS for scalable and fault-tolerant storage and ZooKeeper for distributed consensus.
Amazing technical feat.
Too bad the product is so painfully slow for me to use.