
Under the hood: Data diving with Scuba


https://www.facebook.com/notes/facebook-engineering/under-the-hood-data-diving-with-scuba/10150599692628920


At Facebook, we often find ourselves seeking very specific answers to complicated questions. Whether we’re diagnosing a performance regression or measuring the impact of an infrastructure change, we want data, and we want it fast. Scuba is a system we developed for doing real-time, ad-hoc analysis of arbitrary datasets. It marks a significant evolution in the way we collect and analyze data from the variety of systems that keep the site running each day.

 

In the beginning…

 

 

The Facebook infrastructure team relies on real-time instrumentation to ensure the site is always running smoothly. This usually means monitoring graphs or using tools to drill down and understand what’s going on as hundreds of millions of people hit the site each day. Traditionally, we’ve relied on pre-aggregated graphs or tools that query samples stored in a MySQL database. While these approaches were often sufficient for basic queries and small datasets, they became far too rigid and slow as the data grew and as we needed to ask more sophisticated questions.

 

A motivating example: The inflexibility of old approaches

Flexibility and speed in querying data is critical for diagnosing site issues quickly. Suppose we measure the time it takes to render the Facebook home page and have the foresight to periodically aggregate that data from raw samples into a graph. Suddenly the graph spikes, showing a large slowdown, but offering little insight as to why the slowdown occurred. Is the spike due to a datacenter-specific issue? Is it a new feature someone turned on to test? Or a network issue in a single country? Unfortunately, we may have lacked the foresight to aggregate the data by all these dimensions. These types of drill-down and data dissection questions are extremely common when working on complicated systems.

 

A straightforward solution might be to proactively aggregate data across every conceivable dimension. However, as we collect more dimensions such as page, server, source version, datacenter, and country, to name just a few common ones, the number of possible aggregations grows exponentially. Proactively aggregating or adequately indexing quickly becomes infeasible, and we’re thus forced to resort to plodding disk-scans over samples in a database. Scuba addresses this tradeoff between flexibility and speed.

 

Scuba: A distributed in-memory stats store

One of our core competencies at Facebook is developing distributed services that compute results for complex queries fast. We achieve this speed by building Thrift services, like News Feed and search, which store data primarily in memory. Queries are sent to a number of machines that work in parallel, and results are aggregated into a single response that’s ultimately rendered by the browser. The same approach we use to serve News Feed stories in real time can be used to serve statistics about our internal systems.

 

The concept is straightforward: instrumentation data stems from samples. Samples are a set of strings, integers, and tag-set columns. Almost all queries are of the form: “give me the average, median, percentile, rate, or sum of some integer column,” “add some set of filters,” “group by some set of dimensions,” or “compare one thing to some other thing or time.”
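To make that data and query model concrete, here is a hypothetical sketch of a sample and a query in that form. The column names and the query structure below are illustrative assumptions, not Scuba’s actual format.

```python
# A hypothetical sample: string columns, integer columns, and a tag-set column.
# Field names are made up for illustration; Scuba's real schemas may differ.
sample = {
    "page": "home",                         # string column
    "server": "web1234",                    # string column
    "country": "ID",                        # string column
    "time_ms": 812,                         # integer column (page render time)
    "db_queries": 37,                       # integer column
    "tags": {"newsfeed", "experiment_x"},   # tag-set column
}

# A hypothetical query in the shape described above: an aggregate of some
# integer column, plus filters, group-bys, and a comparison.
query = {
    "metric": ("percentile", "time_ms", 99),        # or ("avg", ...), ("sum", ...)
    "filters": [("country", "==", "ID"), ("db_queries", ">", 100)],
    "group_by": ["datacenter", "page"],
    "compare_to": "1 week ago",                     # overlay the previous week
}
```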

 

These queries can be easily distributed. Samples are written from a variety of sources into Scribe logs using JSON encoding. Scuba tailers read the logs and partition samples across hundreds of machines. Each machine manages only a subset of samples, stored un-aggregated in RAM, and each subset can be scanned and aggregated on the fly and in parallel to achieve maximum flexibility and lightning-fast query times. Results from calculations such as average and sum are straightforward to combine across machines. For percentile queries, we use smartly bucketed histograms.
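Here is a minimal sketch of that merge step, purely illustrative and not Scuba’s actual code: each machine scans its in-memory subset and returns a small partial aggregate, and the aggregator combines the partials. Averages merge exactly from per-machine sums and counts, while percentiles are estimated by merging coarse log-bucketed histograms.

```python
from collections import Counter
from math import log

def partial_aggregate(samples, column):
    """Per-machine scan: keep sum/count (for averages) and a coarse
    log-bucketed histogram (for percentile estimates)."""
    total, count, hist = 0, 0, Counter()
    for s in samples:
        v = s[column]
        total += v
        count += 1
        hist[int(log(v + 1, 1.5))] += 1   # bucket widths grow ~1.5x per bucket
    return total, count, hist

def merge(partials, pct=0.5):
    """Aggregator: combine averages exactly; estimate the requested percentile
    by walking the merged histogram to the requested rank."""
    total = sum(t for t, _, _ in partials)
    count = sum(c for _, c, _ in partials)
    if count == 0:
        return None, None
    hist = Counter()
    for _, _, h in partials:
        hist.update(h)
    seen = 0
    for bucket in sorted(hist):
        seen += hist[bucket]
        if seen >= pct * count:
            midpoint = (1.5 ** bucket + 1.5 ** (bucket + 1)) / 2
            return total / count, midpoint
    return total / count, None
```

A p99 query would simply call merge(partials, pct=0.99); because each machine ships back only a sum, a count, and a small histogram, the cost of combining results stays tiny no matter how many samples were scanned.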

 

The system is general and fast, so users can populate any dataset they want to dissect, and operators can diagnose issues with minimal latency. The result is that we now have hundreds of datasets containing up to hundreds of gigabytes of data and billions of samples that take less than a second to query.

 

Scuba: The front-end

While query languages are nice, nothing beats the candor of an old-fashioned graph. We found a surprisingly large number of our legacy tools solving the same problems: 1) displaying data in a formatted table, 2) displaying data in a graph over time, or 3) displaying data with some clever visualization. For the most part, each individual data owner wrote their own user interface on top of their own table, with their own schema, their own way of populating data, and their own filtering options.

 

 

A recently fixed memcache key regression on a page. The dotted line overlays the previous week.

 

The Scuba web front-end supports the majority of these use cases. It generalizes the display layer by providing views of data, which in Scuba parlance we call goggles. The three most popular goggles are table, time series, and distribution. Scuba also provides a set of controls across views that allow users to filter by, group by, or compare dimensions and metrics. Scuba persists the state of these controls as users switch between goggles like table and time series, or even between datasets.

 

 

A server time regression seen on a particular page with distribution goggles on, especially around the median request.

 

For use-cases that Scuba doesn’t support by default, data owners can subclass a view to customize it. Datasets with a country dimension automatically get map goggles, which is a world heat map of how good or bad a selected metric is in each country. If none of the goggles visualize data the way you want, create your own! Remember, goggles are just a view on any aggregated data. There are infinite ways to visualize that data.
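As a rough illustration of the idea that goggles are just views over aggregated data, here is a hypothetical sketch. The class and method names are invented for this example and do not reflect Scuba’s actual front-end code.

```python
# Hypothetical sketch: a goggle is any view that knows how to render
# aggregated rows. Names here are invented for illustration.
class Goggles:
    def render(self, rows):
        """rows: aggregated results, e.g. [{"country": "ID", "p50_ms": 812}, ...]"""
        raise NotImplementedError

class TableGoggles(Goggles):
    def render(self, rows):
        for row in rows:
            print("  ".join(str(v) for v in row.values()))

class MapGoggles(Goggles):
    """Offered when the dataset has a country dimension: shades a world map
    by how good or bad the selected metric is in each country. (Printing is a
    stand-in for the real map rendering.)"""
    def render(self, rows):
        for row in rows:
            print(f'{row["country"]}: {row["p50_ms"]} ms')
```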

 

 

“Sexy” goggles: a Circle Pack view from a recent experiment comparing the time to download a particular file directly from a particular datacenter, broken down and filtered to a handful of source countries.

 

The awesomesauce

With the scheme described above, we gain a bunch of advantages:

  • Queries take less than a second to fulfill, even traversing hundreds of millions of samples and hundreds of gigabytes of data.
  • Results are live to a few seconds ago.
  • Collecting data is now as easy as calling an addSample() method in code (see the sketch after this list). Data-owners no longer need to maintain a database, design a schema, or create yet another user interface.
  • You don’t have to be an engineer or learn a query language to analyze data.
  • Powerful new querying options that previous tools typically didn’t support. For example:
    • Filter to outlier page loads that performed more than 100 database queries. Or, show me only page loads from Indonesia that took more than 10 seconds to load.
    • Find the top 10 “sick” machines based on properties like RPS or thread queue length.
    • Group by source and destination datacenters to bubble up high-latency saturated links.
    • Compare various datasets between two deployed versions of the code (i.e. A/B test across various datasets).
    • Show me a heat map of servers by their CPU consumption. Or show me a heat map of racks by their packet throughput.
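To make the low logging barrier and the querying options above concrete, here is a hedged sketch. addSample() is the method mentioned above, but its signature, the client object, the dataset name, and the query shape are all assumptions made up for this illustration.

```python
# Hypothetical usage sketch. addSample() exists, but its actual signature,
# the dataset name, and the query structure below are assumptions.

def log_page_load(scuba_client, page, server, country, time_ms, db_queries):
    # One call per event; no database, schema, or UI for the data owner to maintain.
    scuba_client.addSample("page_load_times", {
        "page": page,
        "server": server,
        "country": country,
        "time_ms": time_ms,
        "db_queries": db_queries,
    })

# "Show me only page loads from Indonesia that took more than 10 seconds and
# performed more than 100 database queries" might be expressed roughly as:
outlier_query = {
    "dataset": "page_load_times",
    "metric": ("count", "*"),
    "filters": [
        ("country", "==", "ID"),
        ("time_ms", ">", 10_000),
        ("db_queries", ">", 100),
    ],
    "group_by": ["page"],
}
```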

Development and beyond

Scuba began as a casual discussion between a few engineers about how to make one of our legacy tools faster by partitioning a MySQL database. The first version of the back-end was coded as part of a hackathon by David Reiss. I coded the front-end during a couple weeks of frustration with existing tools. The back-end functionality was greatly expanded and scaled up by John Allen, and the front-end was made far sleeker and more functional by Okay Zed. A then-summer intern, Oleksandr Barykin, improved the worst query times by an order of magnitude. Adoption was mostly organic as hundreds of core and new datasets were migrated or added by various teams. Dozens of engineers contributed a slew of features and fixes, and now over 500 employees use the tool monthly.

 

The best part of this project is that pretty much all of these efforts were voluntary contributions of time by engineers taking their own initiative to help build something awesome. One of my favorite parts about being an engineer at Facebook is the freedom to pursue ideas and see them succeed on their merit. Scuba far surpassed the expectations we had when we discussed the idea at a lunch table. People have added some amazing features and datasets that we’d have never anticipated, and used them to do some incredible things. I look forward to seeing Scuba expand with even more data, more goggles, and more features, and seeing the complex systems that power Facebook become just a little easier to understand with each passing day.

 

Lior Abraham, an engineer on the Site Performance team, likes to understand what the heck is going on.
