Meet the Data Brains Behind the Rise of Facebook | Wired
Rohit Khare stashed this in Startups
But Scuba is what’s called an in-memory data store. It keeps all that data in the high-speed memory systems running across hundreds of computer servers — not the hard disks, the memory systems — and this means you can query the data in near real time.
“It gives us this very dynamic view into how our infrastructure is doing — how our servers are doing, how our network is doing, how the different software systems are interacting,” Parikh says. “So, when Genie tags me in a photo and it doesn’t show up within seconds, we can look to Scuba.”
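The article doesn’t show Scuba’s actual interface, but the core idea — keeping rows in RAM so aggregation queries come back in near real time instead of waiting on disk — can be sketched in a few lines. This is a toy illustration, not Facebook’s implementation; the class and metric names are invented:

```python
import time

class InMemoryMetricsStore:
    """Toy in-memory store: rows live in a Python list in RAM,
    so a query is just a fast scan over memory, never a disk read."""

    def __init__(self):
        self.rows = []  # each row: (timestamp, metric_name, value)

    def insert(self, metric, value, ts=None):
        # Append a data point; default the timestamp to "now".
        self.rows.append((ts if ts is not None else time.time(), metric, value))

    def avg_since(self, metric, since_ts):
        # Aggregate over all matching rows newer than since_ts.
        vals = [v for ts, m, v in self.rows if m == metric and ts >= since_ts]
        return sum(vals) / len(vals) if vals else None

# Hypothetical usage: track photo-tagging latency, then ask for a recent average.
store = InMemoryMetricsStore()
store.insert("photo_tag_latency_ms", 120, ts=100)
store.insert("photo_tag_latency_ms", 80, ts=200)
print(store.avg_since("photo_tag_latency_ms", 50))  # → 100.0
```

The real system distributes rows like these across hundreds of servers and fans each query out to all of them, but the latency win comes from the same place: the data never leaves memory.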
Wait one cotton picking second...
Google’s Big Data platforms are still viewed as the web’s most advanced, but as Facebook strives to expand its own online empire, it isn’t far behind, and in contrast to Google, Facebook is intent on sharing much of its software with the rest of the world. Google often shares its big ideas, but Facebook also shares its code, hoping others will make good use of it. “Our mission as a company is to make the world more open and connected,” Parikh says, “and in building our infrastructure, we’re also contributing to that mission.”
Are they spinning that Google does not contribute software to the world???
This article sure does like to name drop.
Yes, Facebook’s engineering army includes people like Lars Rasmussen who create web applications like the company’s Graph Search tool — the stuff you can see on your Facebook page. It includes other software engineers who fashion the tools and widgets needed to build, test, and deploy those web applications. And nowadays, it includes hardware engineers like Amir Michael who design custom servers, storage devices, and, yes, entire data centers.
But it also spans a team of top engineers who deal in data — an increasingly important part of modern online operations. Scuba is just one of many “Big Data” software platforms Facebook has fashioned to harness the information generated by its online operation — platforms that push the boundaries of distributed computing, the art of training hundreds or even thousands of computers on a single task.
Built by engineers such as Raghu Murthy, Avery Ching, and Josh Metzler, these tools not only troubleshoot problems inside Facebook’s data centers, they help Facebook data scientists analyze the effectiveness of the company’s online applications and the behavior of its users, and in some cases, they’re even feeding data directly to Facebook users, driving familiar web applications such as Facebook Messages.
Facebook’s data team was founded by a man named Jeff Hammerbacher. Hammerbacher was a contemporary of Mark Zuckerberg at Harvard, where he studied mathematics, and before taking a job at Facebook in the spring of 2006, he worked as a data scientist inside the big-name (but now defunct) New York financial house Bear Stearns.
Hammerbacher likes to say that the roots of Facebook’s data operation stretch back to an afternoon at Bear Stearns when the Reuters data feed suddenly went belly up. With the data feed down, no one could make trades — or make any money — and the feed stayed down for a good hour, because the one guy who ran the thing was out to lunch. For Hammerbacher, this snafu showed that data tools were just as important as data experts — if not more so.
“I realized that the delta between the data models that I generated and the models generated by a mathematician at another firm was going to be pretty small compared to the amount of money we lost during those two hours without the Reuters data feed,” Hammerbacher remembers. “I felt like there was an opportunity to build a complete system that starts with data ingest and runs all the way to data model building — and try to optimize that system at every point.”
"The best minds of my generation are thinking about how to get people to click ads."
I wish the best minds of my generation would work on something more important.