Now I'm wondering if the userids were really handed out sequentially and without gaps. Mine is 673483, @miyagawa's is 731253, @dalmaer's is 4216361. Could nearly 1 million people have signed up for Twitter before I did? Or 4 million people before you did? I created my account 2007-01-21, @miyagawa on 2007-01-30, and you on 2007-04-11. Were there really 3 million new accounts in 3 months?
- DeWitt Clinton
I suspect that they allocated the ids in large chunks across sharded dbs, and never reclaimed the unassigned ids.
- DeWitt Clinton
Whether or not my hypothesis is true about why it happened, I think we can easily demonstrate via random sampling across the range 0..current_max_id that the current_max_id is a poor indicator of the actual number of accounts.
- DeWitt Clinton
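A minimal sketch of that sampling idea, not the actual script: draw ids uniformly from 0..current_max_id, count how many belong to a real account, and scale the hit rate back up to the whole range. The `exists` callback is a hypothetical stand-in for whatever lookup is used (for example an API call that reports whether an id resolves to a user), and the interval uses a plain normal approximation.

```python
import math
import random


def estimate_account_count(max_id, sample_size, exists, rng=None):
    """Estimate how many ids in 0..max_id are in use, by uniform random sampling.

    `exists` is a hypothetical callback: exists(user_id) -> bool.
    Returns (estimate, low, high), where low/high are an approximate
    95% interval based on the normal approximation to the binomial.
    """
    rng = rng or random.Random()
    hits = sum(1 for _ in range(sample_size) if exists(rng.randint(0, max_id)))
    p = hits / sample_size                      # observed fraction of ids in use
    margin = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    id_space = max_id + 1                       # number of ids in 0..max_id
    low = max(0.0, p - margin) * id_space
    high = min(1.0, p + margin) * id_space
    return p * id_space, low, high


# Synthetic check, assumed data only: pretend every tenth id is a real account.
estimate, low, high = estimate_account_count(
    max_id=999_999,
    sample_size=20_000,
    exists=lambda user_id: user_id % 10 == 0,
    rng=random.Random(42),
)
print(round(estimate), round(low), round(high))
```

If the id space really is sparse, the estimate lands well below current_max_id, which is the point of the exercise.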
Wrote the script. Very *very* interesting how sparse it is. Running a statistically meaningful sample now.
- DeWitt Clinton
Yup, this is going to be good. The Twitter API is rate limited to 100 requests per hour, so I will need to trickle the data in slowly. I'll let it run all night. Hmm.
- DeWitt Clinton
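For a sense of the pacing such a limit implies, here is a small sketch of a throttle that spaces lookups evenly across the hour. The function names are made up for illustration, and the only figure taken from the thread is the limit of 100 per hour.

```python
import time


def seconds_between_requests(limit_per_hour):
    """Spacing that keeps a steady stream just inside an hourly limit."""
    return 3600.0 / limit_per_hour


def throttled(items, limit_per_hour=100, sleep=time.sleep):
    """Yield items one at a time, pausing between them to respect the limit.

    `sleep` is injectable so the pacing can be checked without waiting.
    """
    delay = seconds_between_requests(limit_per_hour)
    for index, item in enumerate(items):
        if index > 0:
            sleep(delay)  # no pause before the very first request
        yield item


print(seconds_between_requests(100))  # 36.0 seconds between lookups
```

At that pace each hour of running adds 100 samples, so an overnight run yields somewhere in the high hundreds.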
Finished up the post on sampling Twitter data. Will publish it tomorrow AM. Neat stuff.
- DeWitt Clinton