Scalable Twitter Architecture to Deal with 150M Active Users, 300K QPS, A 22 MB/S Firehose, and Send Tweets in under 5 seconds
Adam Rifkin stashed this in Scaling
Found this in a Michael Abbott tweet:
Ryan King says it's out of date:
Nonetheless I will include my favorite parts below.
Toy solutions solving Twitter’s “problems” are a favorite scalability trope. Everybody has this idea that Twitter is easy. With a little architectural hand waving we have a scalable Twitter, just that simple. Well, it’s not that simple as Raffi Krikorian, VP of Engineering at Twitter, describes in his superb and very detailed presentation on Timelines at Scale. If you want to know how Twitter works - then start here.
It happened gradually so you may have missed it, but Twitter has grown up. It started as a struggling three-tierish Ruby on Rails website to become a beautifully service driven core that we actually go to now to see if other services are down. Quite a change.
Twitter now has 150M world wide active users, handles 300K QPS to generate timelines, and a firehose that churns out 22 MB/sec. 400 million tweets a day flow through the system and it can take up to 5 minutes for a tweet to flow from Lady Gaga’s fingers to her 31 million followers.
A couple of points stood out:
- Twitter no longer wants to be a web app. Twitter wants to be a set of APIs that power mobile clients worldwide, acting as one of the largest real-time event buses on the planet.
- Twitter is primarily a consumption mechanism, not a production mechanism. 300K QPS are spent reading timelines and only 6000 requests per second are spent on writes.
- Outliers, those with huge follower lists, are becoming a common case. Sending a tweet from a user with a lot of followers, that is with a large fanout, can be slow. Twitter tries to do it under 5 seconds, but it doesn’t always work, especially when celebrities tweet and tweet each other, which is happening more and more. One of the consequences is replies can arrive before the original tweet is received. Twitter is changing from doing all the work on writes to doing more work on reads for high value users.
- Your home timeline sits in a Redis cluster and has a maximum of 800 entries.
- Twitter knows a lot about you from who you follow and what links you click on. Much can be implied by the implicit social contract when bidirectional follows don’t exist.
- Users care about tweets, but the text of the tweet is almost irrelevant to most of Twitter's infrastructure.
- It takes a very sophisticated monitoring and debugging system to trace down performance problems in a complicated stack. And the ghost of legacy decisions past always haunt the system.
I love the line "the ghost of legacy decisions past always haunts Twitter's system".
Twitter no longer wants to be a web app.
Did Twitter EVER want to be a web app?
- How to make this pipeline faster and more efficient? Fanout can be slow. Try to do it under 5 seconds but doesn’t work sometimes. Very hard, especially when celebrities tweet, which is happening more and more.
- Twitter follow graph is an asymmetric follow. Tweets are only rendered onto people that are following at a given time. Twitter knows a lot about you because you may follow Lance Armstrong but he doesn’t follow you back. Much can be implied by the implicit social contract when bidirectional follows don’t exist.
- Problem is for large cardinality graphs. @ladygaga has 31 million followers. @katyperry has 28 million followers. @justinbieber has 28 million followers. @barackobama has 23 million followers.
This tells us how out of date this presentation is, since Justin Bieber now has 42 million Twitter followers.
Mother of God.