This is a technical overview of the new backend system we launched a couple of days ago.
- Bret Taylor
First question, why MySQL instead of a straight key/value/attribute store, like a bdb? (Amazon's SimpleDB works along similar principles, btw.)
- DeWitt Clinton
Cool stuff Bret. Love to see this kind of stuff. Would love that chat about the Ajax side of things ;)
- Dion Almaer
DeWitt: Historical reasons; we have some operational experience maintaining MySQL servers, and MySQL is very popular so there's a lot of support available.
- Tudor Bosman
Oh right, none of you are ex-Amazon. (Any Amazonian reading this will understand...)
- DeWitt Clinton
How small do you keep the shards? Can you keep the entire index tables in RAM? Are you backed with SSD?
- DeWitt Clinton
Not all of it is RAM. We use normal hard drives for this system, and we have enough RAM such that the working set fits in memory. Most of the oldest stuff on FriendFeed is not accessed frequently, and that data is not typically resident in memory.
- Bret Taylor
So for something like 'user_id_index.get_all' you'd trust that InnoDB has that user in RAM, and that you have only a small set of active users at any given time (for web requests). Same for the relevant entities you'd "join" against. Makes sense. (Sorry, having a hard time typing coherently tonight.)
- DeWitt Clinton
Yah, those indexes are typically in RAM. Our indexes contain a timestamp as well (since they are ordered reverse chronologically), so the oldest entries referenced by that index would likely not be in RAM depending on how many pages back they are on most feeds.
- Bret Taylor
BTW, did you guys look at any other serialization mechanism other than pickle? Any pros/cons for cpickle?
- Arvind Sundararajan
cPickle is the same format as pickle, just implemented in C instead of Python. We use cPickle. marshal is faster, but according to docs may change from version to version of Python, so we didn't want to use it in our DB.
- Bret Taylor
@Bret "250 million entries", is it possible to give the size of the DB? and what is your opinion about when will you need to change this system again or is it strong enough for years Meybe you will only need hardware instead of structure changes?
- Ömer Faruk Kurt
Ömer: I don't have an exact number off the top of my head, but a lot of data is the indexes themselves, so adding indexes is almost as significant as adding entries.
- Bret Taylor
This sounds exceptionally similar to how the AppEngine datastore does things, except that it's MySQL and not BigTable.
- Alex Power
Aren't commercial RDBMSes (like Oracle, MSSQL) designed to support the sorts of things you're looking for, like online reindexing and joining across shards? Why not use them? Lack of experience? NIH? Too expensive?
- Gabe
"We like MySQL for storage, just not RDBMS usage patterns" Excellent post Bret, thanks for sharing!
- Mahesh CR
Gabe: "designed to support" and actually working are two different things. I think that is why companies like ours generally don't use commercial software and choose to use open source software. If it doesn't work, we can fix it. In my experience, most of those features don't work at the scale consumer web sites need, and it certainly wasn't worth the cost and time to find out from our perspective.
- Bret Taylor
Non-RDBMS-patterns may work when there is no great needs for processing the actual data (like reports etc). In LOB-apps this would be a big no-no, but in large-scale services like FF alternative methods are almost necessary...
- Jemm
Thanks for the writeup! Is there a reason you don't declare the added_id as an UNSIGNED INT?
- Roger
@Bret, this is a pretty cool write u!. Might I suggest as a subject for future write ups, how the realtime update system works (e.g. how it knows what updates it needs to push), and perhaps how the feed fetching system works and how these might inter-relate?
- Ray Cromwell
Ray: good ideas. We will do more of these. We want to in theory - they just take a bit of time to write up, and we like writing code more than blog posts :)
- Bret Taylor
Bret, do you really think that guys like Oracle and MS create features for enterprise-level databases that don't work or scale to meet the needs of those who buy them? Of course you may be right, but the TPC tests generally get results in the hundreds and thousands of transactions per second. In reality, though, what can you really fix if something doesn't work? Are you going to debug MySQL if it starts corrupting data or optimize it if it's too slow?
- Gabe
Gabe: yes, we have already debugged MySQL when it has crashed. It would have been impossible without the debug symbols and source code.
- Bret Taylor
Gabe: I've been doing a fair amount of tracing through MySQL core dumps over the past week, and just having the ability to look at the code, plus the size of the community working with the same code base and debugging problems, seems to make it worth the cost at this point. We haven't changed any of the MySQL code yet, but Google and others certainly have.
- Jim Norris
I'm skeptical of Oracle and other commercial pre-packaged systems because the companies are focused on extracting revenue via software sales and their products are highly optimized for a certain problem domain that doesn't really fit our experience very well.
- Jim Norris
So I guess the downside is that if you change your indexes you need to change the code, too. How do you manage the table creation, code migration and data sharding all at once?
- Nick Lothian
Nick: It is actually not bad: make the tables, update the code, start writing to the indexes for new entities, then run the "Cleaner" to fill in the indexes for the older entities. When the cleaner is done, you can start using the indexes for features.
- Bret Taylor
The most important decision about your design- and why not going with databases RDBMS etc. to manage the indices - is that your users don't really look up the old stuff, so keeping the old entries in RAM/live indices isn't worth the time hit it would take to do huge indexes on long tables. This seems like a "running index"- that is, it's populated by the application in many places at one time, instead of in one place, that the DB then optimizes over the entire history.
- anna sauce
Reminds me of some multi-user java apps back 10 years or so ago that had to manage simultaneous users on live systems. great post, Bret, and fun to see what's working behind the scenes.
- anna sauce
"if you change your indexes you need to change the code" is not a problem unique to this system. You always have to update your code along with your schema. You have to write code to read from the tables even if you use a typical RDBMS, and you can't do that without an index, so this staged process exists in some form no matter what your storage scheme looks like.
- Bret Taylor
Bret, it sounds like MySQL crashes a lot. Are you assuming that commercial products will crash just as often and the vendors won't debug it?
- Gabe
Gabe: Every piece of software crashes, whether it is written by Oracle or by open source developers. The difference with open source is that we are not dependent on someone else to diagnose and fix the problems. This is not a unique sentiment. Google, Yahoo, Facebook, FriendFeed, Twitter, et al, have all chosen open source infrastructure for this and a variety of other compelling reasons we have already discussed.
- Bret Taylor
Brett, I agree that not being dependent on somebody else is great. In fact, the product I work on is probably 99% open source or developed in-house. I even replaced a relational DB with a pickled Python object store a year or two ago. However, I'm also not considering writing my own transaction protocol, indexing, and query engine because the open source DB I use doesn't support horizontal partitioning or online reindexing. On the other hand, when faced with Oracle, I might consider just engineering all those things myself.
- Gabe
Re: "change your indexes you need to change the code" - yes, conventionally you need to change your code if you change your tables. But you do get some opportunities for runtime optimisations by changing your indexes (although in practice this can have availability costs as you note)
- Nick Lothian
OK I'm going to put this out there: sounds like you're re-inventing the wheel.
- anna sauce
Being able to open something up to understand, diagnose, and fix things yourself is underrated.
- Amit Patel
DeWitt, part of the reason we didn't use a simpler store is that we're using mysql replication. We'd have to replace that too. Probably not hard, but it's working well for us. Also sometimes you do want transactions for performance within a single DB (each xsaction is a single log write, vs multiple little writes).
- Private Sanjeev
re: SSDs and mysql. I can't speak for all SSDs, but the Intel X25-M gets internally fragmented very quickly, reducing write performance by 10-20x. Things that do large sequential writes like bigtable or lucene are a better fit.
- Private Sanjeev
Part of the purpose behind traditional schema driven designs is to 'protect' the data from the programmers and the applications - which is not insane given the level of commitment people have in some kinds of organization. One of the statements that the FF people are making here is that they have trust in themselves and each other and they care about what they are doing.
- Robin Barooah
You guys rock so hard it HURTS! THANK YOU, FRIENDFEED!
- Josh Haley
Just logged in... this sounds really cool.... hmmm :)
- Susan Beebe
great post -- liked the discussion in the comments too! some people thinking relational databases can do everything; other people not understanding rationale behind key/value store; other people mad you've reinvented it using MySQL... :-D favorite comment from guy who is having a heart attack because anyone might think it's a good idea: "it is like using a database to solve the problem where the database is the problem" LOL tiny nugget of truth there. though I once built a Lisp interpreter in BASIC, which was "like using a language to solve a problem when the language IS the problem..." ;-) just the IDEA of it gives you conceptual indigestion, though it's hard to argue with working code.... lol
- Karim
what is the correct name for this? "schema-less data?" "entity attribute/value (EAV) store?" "key/value store?" personally, i like "property bag" :-D also, have you considered Amazon SimpleDB, or Microsoft SQL Data Services for hosted services, since they are based on similar ideas, just not on MySQL?
- Karim
Sanjeev, I heard rumors about the X25-M performance running down after fragmentation, but last I heard was Intel was unable to replicate & "looking into it." 10-20x should not be hard to spot. :-) Are you in touch with Intel...? Also, have you seen similar problems on SSDs with Samsung controllers? (I went with the Corsair...)
- Karim
Karim, it may not get as bad with other SSDs, but they start off with much higher write latency for small random writes than the X25-M.
- Private Sanjeev
The Samsung, you mean? All I was trying to do was *not* get the JMicron controller and *afford* it :-D
- Karim
but why are the ssIDs varchars, is the question (from @eonarts comment)
- anna sauce
anna: what are ssIDs? If you mean the UUIDs, they are just 16 byte binary strings.
- Bret Taylor
Just look how many comments on this begin with someone's name. "anna:" "Karim," "Sanjeev,". Somehow there needs to be a reply-to-comment mechanism. Fortunately FriendFeed uses json blobs in their database so this won't require a painful schema change ;-)
- Kevin
Bret: you're right, I re-read the post and couldn't find evidence of the Primary Key Erin mentions in her comment, as varchar. Am I missing it?
- anna sauce
schema changes aren't painful, and I have a question for more current DB/programmatic people (than me): why is this called schema-less? There is a schema, it's just that the database application isn't being used to index, Bret's crew are doing it manually. Blobs in columns doesn't mean it's schema-free, that's been going on for 15 years. I think it's just that he's cracking the indexing methodology and offroading. I'd like to hear from some PostGres people on this- Disqus is on PostGres aren't they?
- anna sauce
Anna: that person is talking about the URL index. What we are doing is correct for this application; we want the index to be in that order on disk, not in a different order.
- Bret Taylor
I just remembered that one of my client's DBs is actually Btrieve, which is nothing more than a way of storing binary blobs with certain byte ranges indexed. (They are still using a DOS app from the '80s, with the Btrieve running on a Novell server.) The only real difference is that they have fixed-length keys and records, while the MySQL data is variable-length.
- Gabe
Bret: Thanks a lot for the informative post! Any thoughts on MySQL vs Postgres? We found tables crashed with too many simultaneous writers in MySQL, but your post says it "doesn't corrupt data", so I guess you haven't had such issues.
- Ruchira S. Datta
Bret: Do you apply the same approach to perform searches on your stored text data? (i.e. inside the data storing the BLOB)
- Roger
Bret, do you care to explain briefly how you approach the following/follower aspect without JOINs? I loved the article, keep reading it every once in a while...
- Jorge Escobar