
Learn to stop using shiny new things and love MySQL

Stashed in: Pinterest, Software!, Scaling, @rands, Awesome, For Milo, @triketora

My manager wrote this post.  I'm partially testing PandaWhale, partially sharing something I find interesting. 

Oh yeah, I saw that triketora and rands both tweeted a link to this post too. 

Thanks for sharing. I concur that Python and MySQL both scale great if you design your software well. 

My favorite part of the article:

Through the gauntlet, two of the most important lessons I learned building Pinterest were:

  • Don’t be the biggest. If you’re the biggest user of a technology, your challenges will be greatly amplified.
  • Keep it simple. No matter what technology you’re using, it will fail.

After about a year of fast, sleep-defying scaling at Pinterest, we had MySQL, Memcache, MongoDB, Redis, Cassandra, Membase and Elastic Search. Everything was on fire and breaking in their own special ways. We wanted to simplify and get rid of all the fancy stuff, but we also wanted something that would scale to the moon with us.


If you’re starting or growing a company, and your scale is smaller than huge, consider maturity to be your most important factor aside from basic requirements. Ask yourself — does MySQL sufficiently meet my needs? If so, use it. If you’re wondering if MySQL will be fast enough, the answer is YES. Even better than fast, MySQL’s performance will be consistent.

Great article with a lesson more people need to learn. Thanks for posting.

If it ain't broke don't fix it -- with something broken.

Beth, I agree, and I'd generalize: only solve the problems you have.

Cool! I like it a lot!  I like the feed format and the more free-form post format.

Also, I posted the same thing twice to see if it'd dedupe by link. I wanted to delete my 2nd post but couldn't figure out how. I tried clicking the "Saved" button to unsave, but I got the message "You may not un-save this post until you put it in a stash." I thought I had already stashed it in the default stash called "Saved"?

Anyway, what you guys have so far is looking pretty cool!

Thanks so much Eric!

Right now we're low enough volume that I just manually delete posts. 

I can delete that post if you want. 

Love it.  Go for it, thanks!

Happily done -- And feel free to ask me more questions about PandaWhale anytime. 

We're a work in progress!

I'd love to hear more about the point where Redis stops scaling and you have to move to HBase. I've heard of a few hyper-growth companies doing this, but no one provides details about why. Is it just running out of RAM, or are there other issues at scale?

I haven't had a huge hand in operating Redis at Pinterest, but I can give a quick rundown on what I've picked up from our discussion threads:

- Issues with replication.  For pre-2.8 versions of Redis, numerous issues can cause a slave to resync the entire data set from the master.  During this time the slave is in an inconsistent state and the master is under heavy load.  There also aren't great metrics for monitoring replication.

- Issues with backups.  Backups are not internally checksummed.  Our tooling around backups hasn't worked well.

- Data sets greater than RAM don't degrade well.

- Memory fragmentation is an issue.  Some instances have had issues where usage grows without limit until being killed by the OS.

- We haven't had great high availability.  We have a manual failover process that is slow.  An option might be to use Redis Cluster, but we're not sure it is production ready.

- Crash recovery / rebooting is slow.

- Our configurations are not consistent.  Redis instances have been spun up on an as-needed basis, so we don't have uniform instance types / hardware, which can make debugging difficult.
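
On the backup point above, one common workaround when a backup file can't be trusted to verify itself is to store an external checksum next to it. The following is only a minimal sketch of that idea, not Pinterest's tooling: the function names and the `.sha256` sidecar convention are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum(backup_path):
    """Write the digest to a sidecar file (hypothetical '<name>.sha256')."""
    sidecar = Path(str(backup_path) + ".sha256")
    sidecar.write_text(sha256_of(backup_path))
    return sidecar


def verify_checksum(backup_path):
    """Return True if the backup still matches its recorded digest."""
    sidecar = Path(str(backup_path) + ".sha256")
    return sidecar.read_text().strip() == sha256_of(backup_path)
```

You'd call `write_checksum` right after the backup is taken and `verify_checksum` before restoring from it, so a truncated or corrupted file is caught before it is loaded.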

This is consistent with the Redis problems I've experienced on Imgur as a user. 

That's an awesome summary. Thank you! The memory fragmentation issue is enlightening.
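
For anyone who wants to watch for the fragmentation issue: Redis's `INFO` output includes `used_memory` and `used_memory_rss` in its memory section, and the fragmentation ratio is the second divided by the first. Below is a rough sketch that computes it from the raw `INFO` text. The sample text and the 1.5 alert threshold are made up for illustration, not a recommendation from the thread.

```python
def parse_info(text):
    """Parse 'key:value' lines from Redis INFO text, skipping '# Section' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def fragmentation_ratio(info_text):
    """Return used_memory_rss / used_memory, or None if used_memory is zero."""
    fields = parse_info(info_text)
    used = int(fields["used_memory"])
    rss = int(fields["used_memory_rss"])
    if used == 0:
        return None
    return rss / used


# Made-up sample in the shape of INFO output (real output has many more fields).
SAMPLE_INFO = "# Memory\r\nused_memory:1000000\r\nused_memory_rss:1500000\r\n"

ALERT_THRESHOLD = 1.5  # hypothetical value; pick one that suits your workload

ratio = fragmentation_ratio(SAMPLE_INFO)
if ratio is not None and ratio >= ALERT_THRESHOLD:
    print(f"fragmentation ratio {ratio:.2f} is at or above {ALERT_THRESHOLD}")
```

A ratio well above 1 means the process holds noticeably more resident memory than Redis has allocated for data, which is the pattern described in the list above.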
