Robert Scoble
Some stats from Twitter conference compared to Google:
Twitter is seeing about 200 tweets per second, during peak loads. - Robert Scoble
Twitter is seeing about 10 gigs of new data created every day. - Robert Scoble
Google is seeing 4 billion queries per day. How many is that per second? 46,296 - Robert Scoble
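Scoble's 46,296 figure is a plain average: 4 billion divided by the 86,400 seconds in a day. Here is a minimal Python sketch of that conversion. It assumes queries are spread evenly over the day, and real traffic peaks well above that average:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY

google_qps = per_second(4_000_000_000)
print(f"{google_qps:,.0f} queries per second")  # 46,296 queries per second
```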
Whew, that's a lot of queries. - Robert Scoble
I think the 4B is just API calls. That's not counting, you know, actual search queries. :) - Matt Cutts
200 tweets per second is a lot less than I thought. - Adam Jackson
Matt: that's wild. - Robert Scoble
If Twitter grows to the size of Facebook, i.e. 200 million users, so 10 times... that's still only 2k per second... - Robert O'Callaghan
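The estimate treats the tweet rate as scaling linearly with user count. Below is a small sketch of that projection. The 20 million current-user figure is only what "10 times" implies relative to 200 million, not a number anyone in the thread gave. The linear-scaling rule is also an assumption:

```python
def projected_rate(current_rate: float, current_users: float, future_users: float) -> float:
    # Assumption: tweets per second grow linearly with the number of users,
    # i.e. the average user tweets as often after the growth as before.
    return current_rate * (future_users / current_users)

# Hypothetical inputs: ~20M users today (implied by "10 times"), 200M target.
print(projected_rate(200, 20_000_000, 200_000_000))  # 2000.0
```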
Adam: that's according to people who are getting the firehose feed. - Robert Scoble
I assumed it would be more than that, too. - Robert Scoble
Ah, OK. Which, to my knowledge, is just Google, FriendFeed and a few other "partners". Do we know who is a firehose partner? - Adam Jackson
But, looking through the data it seems most people don't tweet very often. - Robert Scoble
And that is because Twitter is just getting popular in the rest of the world. Wait and see what's coming... - Julian Flores
Sort of a weird comparison: tweets are writes and queries are reads. - Kiran Patchigolla
Adam: we don't have a comprehensive list, no. There are others, though. - Robert Scoble
And are firehose partners getting ALL tweets? - Julian
Robert: have you spoken to Nick from TweetMeme? He has some really good stats regarding RTs, data growth over the last 2 years, etc. - Robert O'Callaghan
Kiran: yeah, it's not a good comparison, to be sure. But if there are only 200 tweets a second, I seriously doubt that Twitter search is seeing many people hit it. - Robert Scoble
Robert: nope, I need to, though. - Robert Scoble
Julian: according to the people getting the firehose feed, they are supposed to be getting every Tweet. - Robert Scoble
Robert: not my blog but some stats from Nick at @devnest http://dalelane.co.uk/blog... - Robert O'Callaghan
The people who are getting the firehose feed also say that it's very difficult to deal with the data flow at the level it is today (and they say that even Twitter isn't doing very well at it; look at how bad Twitter search was last week). They are all wondering how they will cope when Twitter's traffic is 100x what it is today. - Robert Scoble
Robert: this is what I love about FriendFeed. The post gets better over time because of everyone's participation. Thanks! - Robert Scoble
10 Gigs of data per day is not much. - Louis Gray
Robert: yes I love FF too :-) Found the slides from Nick http://www.slideshare.net/nickhal... - Robert O'Callaghan
Yes that's what I don't understand Robert, 200 tweets/s x 160 chars (including headers) = 32kB a second. This is not difficult to deal with, surely (+XML / JSON overhead, still tiny amounts) - Julian
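Julian's bandwidth figure can be reproduced as a quick calculation. It assumes 160 bytes per tweet, which means one byte per character and only holds for plain ASCII text. It also ignores the XML/JSON overhead he mentions:

```python
TWEETS_PER_SECOND = 200  # peak rate quoted in the thread
BYTES_PER_TWEET = 160    # Julian's estimate incl. headers; assumes 1 byte per character

bytes_per_second = TWEETS_PER_SECOND * BYTES_PER_TWEET
print(f"{bytes_per_second / 1000:.0f} kB/s")             # 32 kB/s
print(f"{bytes_per_second * 8 / 1_000_000:.3f} Mbit/s")  # 0.256 Mbit/s
```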
Louis: it's not much, but it's all text. The real struggle that many of these companies have is with photos and video and other data types. It's very expensive to deal with all this data. I wonder if we could decrease the cost of hosting and dealing with it all by sharing the data in some way? - Robert Scoble
Julian: it's not the per second amount that's difficult to deal with. It's that the size of databases keeps growing. Remember the guy who bragged about having 800 million rows in his database? - Robert Scoble
Now, what happens if you need to re-sort your database? Or do something else funky? - Robert Scoble
Robert, you're asking the right questions. The biggest growth areas in data today are in files and rich media - including photos, videos, etc. We're all creating more and more, but nobody is deleting. - Louis Gray
It is expensive to store the data and to transfer the data. Networks have gotten larger, disks have gotten larger (and cheaper per GB), but the disk speeds themselves are not increasing, and servers are largely processor-bound, so you see low utilization rates. - Louis Gray
Louis: yup, and the folks I talked with say that if you want really fast response like what FriendFeed has, you've gotta pay for expensive SSD devices for your datacenter. I don't know if that's absolutely true, but it sounds reasonable, especially for systems with lots of databases, lots of indexing and lots of reading. - Robert Scoble
There's a new O/R mapper available to Java programmers that lets them write programs in a typical relational-backend fashion, but behind the scenes large files are transparently stored in Amazon S3 cloud storage and the database rows are stored in Amazon SimpleDB. - Brian Hendrickson
How about adding FriendFeed stats here too? (If somebody knows already!) - Jigar Mehta
Louis: but I wonder if that data will have worth in the future - mining photos for example for data about your friends, family and holiday destinations. - Robert O'Callaghan
Robert O: A significant amount of data is infrequently accessed. And as Scoble is saying, you are looking to have SSDs at what's called Tier 0. The best enterprise storage devices have multiple tiers of disk and automatic policy-based data migration between tiers from high performance disk, like SSD and Fibre Channel, to high density SATA. - Louis Gray
But if you assume data will be there, most people won't mind having some latency on data retrieval for older information, so slower SATA (like in your laptop) is just fine. - Louis Gray
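The policy-based migration Louis describes can be sketched as a function that maps how recently data was read to a storage tier. The tier names and the 1-day and 30-day thresholds below are hypothetical and chosen only for illustration. They are not taken from any storage product:

```python
from datetime import datetime, timedelta

# Hypothetical policy: (maximum age since last read, tier), checked in order.
TIERS = [
    (timedelta(days=1), "tier0-ssd"),
    (timedelta(days=30), "tier1-fibre-channel"),
]
DEFAULT_TIER = "tier2-sata"  # high-density disk for rarely read data

def choose_tier(last_access: datetime, now: datetime) -> str:
    """Pick a storage tier based on how long ago the data was last read."""
    age = now - last_access
    for max_age, tier in TIERS:
        if age <= max_age:
            return tier
    return DEFAULT_TIER

now = datetime(2000, 1, 1)  # arbitrary reference time for the example
print(choose_tier(now - timedelta(hours=2), now))   # tier0-ssd
print(choose_tier(now - timedelta(days=7), now))    # tier1-fibre-channel
print(choose_tier(now - timedelta(days=400), now))  # tier2-sata
```

A real system would run a policy like this periodically and move data between tiers in the background. Older items then land on cheaper disk, where some retrieval latency is acceptable.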
Maybe it's just me, but speaking with people about the queries issue made me realize these are still big numbers for most average-sized companies. You might think that giants like Google, Microsoft or Facebook can easily take care of this amount of requests, but many smaller startups would probably find it difficult to handle. - Nir Ben Yona
Good point Robert, 200 tweets / s = 6.3 billion rows for one year of tweets. But still surprised this is an issue these days. Anecdotally, even MySQL can support billions of rows. - Julian
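Julian's yearly row count checks out if every tweet becomes exactly one row and the 200/s peak rate holds around the clock. That makes it an upper bound on one-row-per-tweet storage, not an estimate of total rows:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (ignores leap years)
PEAK_TWEETS_PER_SECOND = 200           # peak rate from the thread, assumed sustained

rows_per_year = PEAK_TWEETS_PER_SECOND * SECONDS_PER_YEAR
print(f"{rows_per_year:,}")  # 6,307,200,000
```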
That's an interesting Google datapoint; it explains the aircraft hangars full of servers. The scaling challenge for Twitter, however, is less related to 200 tweets being posted per second and more about all those Twitter clients hammering their API trying to get them out in real time. Firehoses aside, does anyone know how many API hits Twitter gets per second? - Bob Hitching
The incorrect assumption running through this post is that 1 tweet on Twitter = 1 database row = "So easy!". You've left out the user fanout! One Obama Tweet = 1M database rows, someplace. - netik
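netik's point is about fan-out on write. If each follower's home timeline is materialized, one tweet costs one write per follower. Here is a minimal in-memory sketch of that idea with hypothetical names (`follow`, `post`, `timelines`). It is not Twitter's actual implementation:

```python
from collections import defaultdict

# Hypothetical in-memory model: follower lists and per-user home timelines.
followers: defaultdict[str, list[str]] = defaultdict(list)
timelines: defaultdict[str, list[str]] = defaultdict(list)

def follow(follower: str, followee: str) -> None:
    followers[followee].append(follower)

def post(author: str, tweet_id: str) -> int:
    """Fan out on write: copy the tweet id into every follower's timeline.

    Returns the number of timeline writes this single tweet caused.
    """
    writes = 0
    for f in followers[author]:
        timelines[f].append(tweet_id)
        writes += 1
    return writes

for name in ["alice", "bob", "carol"]:
    follow(name, "obama")
print(post("obama", "tweet-1"))  # 3
print(timelines["bob"])          # ['tweet-1']
```

With a million followers, the same `post` call performs a million timeline writes. The alternative, fan-out on read, stores the tweet once and assembles timelines at query time, moving the cost to reads.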
Twitter bought a load of kit a month or so back - at @devnest we were told it was to do with search. People had noticed it had shrunk in size from year dot to only two weeks' worth. Anyone know if it expanded back to the beginning or have they closed that door? - Robert O'Callaghan
Robert O: The data set of Twitter's Search can be as little as 4 days. Do a search on tweets "from Oprah" for example, and you will see none. - Louis Gray
I doubt performance is an issue at this level. Remember the Hadoop statistics? http://developer.yahoo.net/blogs... Processing videos, photos etc. is definitely a different ballgame. - Kiran Patchigolla
For comparison, NASDAQ can handle over 35,000 messages per second and processes over a billion messages a day. I suspect they use something like the TIBCO P-7500 to do so. http://www.nasdaq.com/service... http://bit.ly/2TxxI - Steve Wilhelm
Thank you for all of this Robert, you are such an eternal giver. - Thomas Power
At a basic level we're talking about storage and distribution. Data is stored somewhere until someone requests it and it's then distributed. In this scenario there are at least two potential bottlenecks or problem areas. There is currently no infinite storage space and there is currently no infinite amount of bandwidth to distribute it. Plus it's a two-way distribution network: we're pumping data in and we're pulling it out. Aircraft hangars full of servers are all well and good, but the wired world is running out of bandwidth. Twitter doesn't appear to be using much of that bandwidth at the moment, but other social media is, and a new toy appears seemingly every day. - Gilbert Harding
The best comparison would be Google's web crawler new page discovery rate to Twitter's new status rate. The read rate on Twitter is many orders of magnitude above the write rate-- the comparison of Google QPS to Twitter API calls. Furthermore, the total Twitter user-driven write rate is much larger than the new public statuses rate, which is what the firehose represents. Think of all the changes to the social graph, for example. That's just the front end. On the back end, each new status triggers two orders of magnitude more writes due to the fanout... - John Kalucki
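If "two orders of magnitude" is read literally as a factor of 100, the firehose's 200 statuses per second become on the order of 20,000 back-end writes per second. The sketch below does only that arithmetic. The factor is a rough reading of the comment, and the sketch leaves out social-graph changes and other writes:

```python
def backend_writes_per_second(status_rate: int, orders_of_magnitude: int) -> int:
    # Assumption: each public status causes about 10**k back-end writes (fanout);
    # k = 2 follows the "two orders of magnitude" remark above.
    return status_rate * 10 ** orders_of_magnitude

print(backend_writes_per_second(200, 2))  # 20000
```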
No one in their right mind would use a relational database to store all this data - the biggest issue is that you have SO MANY synchronous writes as well as index and key changes per second. A stream-fed architecture is ideal for this, and 200 inputs a second is easily parseable. TIBCO is an example of a very heavyweight version of a streaming service, as Steve Wilhelm mentioned above. - David Sifry
This uncovers the secret to what may be Twitter's eventual financial success and ability to resist being acquired. I'm reminded of the recent TechCrunch article proposing that YouTube would have been unable to survive as a standalone entity because of the enormous cost of storing all of the video data. - keith kleiner
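One ingredient of the stream-fed approach David describes is consuming inputs from a stream and writing them to storage in batches, instead of issuing one synchronous write per input. Below is a minimal sketch using Python's standard `queue` module. The batch size of 50 and the `write_batch` callback are hypothetical, and the sketch does not model TIBCO or any specific product:

```python
import queue
from typing import Callable

def drain_in_batches(q: "queue.Queue[str]", batch_size: int,
                     write_batch: Callable[[list[str]], None]) -> int:
    """Pull items off the stream and hand them to storage in batches.

    Stops when the queue is empty; returns the number of batch writes made.
    """
    batches = 0
    batch: list[str] = []
    while True:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            write_batch(batch)
            batches += 1
            batch = []
    if batch:  # flush the final partial batch
        write_batch(batch)
        batches += 1
    return batches

stream: "queue.Queue[str]" = queue.Queue()
for i in range(200):  # one second of input at the thread's peak rate
    stream.put(f"tweet-{i}")

stored: list[list[str]] = []  # stand-in for the storage layer
print(drain_in_batches(stream, 50, stored.append))  # 4
```

Here 200 inputs cost 4 storage writes instead of 200. The trade-off is that an item can wait in the buffer until its batch fills or the stream drains.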
Louis: They've 'fixed' the from:oprah query, now - two tweets show up, both from the last 24 hours. http://search.twitter.com/search... This of course supports your point. - Shéa Bennett
Late to the comments, but I agree with Louis: 10G per day is trivial. The last network collection stream service I helped design handled 24 Petabytes per week of text messages comparable to tweets, across multiple protocols and transport technologies. - Ken Camp
I never would have thought that Twitter's high volume times would only create 200 tweets per second. - Diego Barros 
So what you're saying is that Rails really can't scale? ;) - Diego Barros 