"Think of it this way: When you go to the movies, you don’t go around to each theater to see which movies are playing and when; it would take all of your time and effort running around from theater to theater. Instead, you check the kiosk out front.
Your blog publishing system provides an RSS kiosk, or ping feed, to let FriendFeed (and potential RSS readers) know when and what has been updated since its last visit. FriendFeed doesn’t have to go theater to theater to see which movie is playing. It also checks all RSS feeds in a domain at once, eliminating the need to download each one separately. Polling is less frequent, but more accurate. By cutting out a lot of wasted data transfer, it reduces the load and gets the relevant information directly." - Paul Buchheit via Bookmarklet
So we need an index for the RSS feeds. Who's gonna post the index of indexes? - Bill Sodeman
I clicked on Paul's link to the code, but it was hard to read because the whitespace was collapsed. Looking at the source it seems that they're nesting the code within tables within <pre> tags. Since <pre> can't contain tables, my browser is ignoring the <pre>s. Why would they do that instead of just putting style="white-space:pre" on the table? - Gabe Schaffer
No, Paul, I have a modern browser: IE 6.0sp2. - Gabe Schaffer
I don't know if I'd call IE 6 a "modern browser", considering it was initially released over 7 years ago... in any case you can view the source in plain text here: http://simpleupdateprotocol.go... - Simon Willison
I wouldn't call IE6 a modern browser either, but 14% of the traffic to code.google.com still comes from it, so it does need to be supported -- I'll open a ticket to investigate the issue (I don't have access to Windows right now to confirm). Though @Gabe, since you're a technically inclined person, I'm curious why you are running IE6 rather than FF3 or IE7. - DeWitt Clinton
Most of my clients are still running IE6, so I know that when something works on my machine, it will work on theirs too. Besides, I haven't really found a feature in IE7 or FF that is worth upgrading for. And honestly, I would consider it a bug for a browser to render a block element like a table inside a pre (which only allows certain inline content), particularly on a page that specifies XHTML Strict. I expect the browser to either ignore the pre (like IE6) or ignore the table. - Gabe Schaffer
Cool, thanks for the feedback Gabe. I opened a ticket here: http://code.google.com/p/suppo.... Please feel free to add more detail to that ticket and star it so you can follow the progress. Cheers! - DeWitt Clinton
Thanks Paul. It's definitely an issue that needs to be dealt with. - Daniel Shaw
Since I can find colliding user IDs easily with MD5, is there any risk of a DoS attack? - Steve Weis
I'm not sure what you're asking Steve. Each site generates their own SUP-IDs, so they would be DOSing themselves. - Paul Buchheit
ok plea to all bloggers- please stop using "modest proposal"- you're referencing Swift who was writing satire, about killing Irish babies to prevent/deal with famine. Almost all blogs with this play on the phrase "modest proposal" are not being satirical... so it's a reference without meaning.. I get confused easily, granted. And, not just this post, but there have been others... OK english major out. Thnx. - anna
Paul, maybe I'm misunderstanding the protocol, but I can pick a name that intentionally collides with someone else's SUP-ID. Is there any expectation of collision resistance? - Steve Weis
Steve, The service assigns SUP-IDs. It is in the service's interest to minimize collisions between the SUP-IDs it assigns. - Gary Burd
Steve, each SUP feed has its own SUP-ID space, so you really can't cause problems with other people's feeds. If that's not what you mean, please provide an example of what you have in mind. - Paul Buchheit
@Gabe Raymond's proposal sounds sincere, so it's not really playing off of Swift's joke. Sigh. I guess nobody reads Swift anymore. He was funny! - anna
Steve, IF the service assigns SUP-IDs as md5(username), then yes, you could theoretically get a username that collides with another. But: usernames are usually short, and so you probably won't be able to find a collision with just a few (tens of) characters. Also, as Gary and Paul said, the provider assigns SUP-IDs, and it's in their interest to assign them in a collision-free manner; at the very least, they can hash a secret salt value together with the username. - Tudor Bosman
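A minimal sketch of the "hash a secret salt together with the username" idea from the comment above. It swaps in HMAC-SHA256 rather than plain MD5, and the function name, the secret, and the 10-character truncation are all assumptions for illustration, not anything the thread or FriendFeed specifies.

```python
import hashlib
import hmac


def make_sup_id(username: str, secret: bytes, length: int = 10) -> str:
    """Derive an opaque, stable ID from a username and a service-held secret.

    Hypothetical helper: without the secret, an outsider cannot choose a
    username that hashes to a particular ID.
    """
    digest = hmac.new(secret, username.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncating keeps the SUP feed small; shorter IDs mean more accidental
    # collisions, which only cost a consumer an extra feed fetch.
    return digest[:length]


if __name__ == "__main__":
    print(make_sup_id("alice", b"service-secret"))
```

The same username always maps to the same ID for a given secret, so the service can recompute IDs instead of storing them.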
I've noticed that there is definitely some confusion around SUP. People have made comments like "I'll add it to my blog soon", but I don't think they understand that doing so could actually send MORE traffic to their personal site than they were getting before (from FriendFeed). I think it might help to clarify that this is really useful for large providers like YouTube, blogger.com, Twitter, etc. Personal WordPress installs won't likely benefit. Please correct me if I'm wrong :) - Patrick Lightbody
Patrick, for personal WordPress installs, it would be better to use a shared SUP feed. I'll probably write one when I get a chance. - Paul Buchheit
SUP is also good for providers that aren't big yet but expect to get big, like OurDoings. - Bruce Lewis via fftogo
There are significant differences between SUP and the Six Apart update stream: (1) SUP documents are fixed length. SUP clients periodically poll the server for new updates. Six Apart streams updates to the client. (2) SUP documents contain opaque feed identifiers. Six Apart sends a stream of Atom feed entries. - Gary Burd
SUP can also be used to monitor for updates on any URL regardless of content-type using the X-SUP-ID HTTP header. You are correct though that it's essentially a compact, standardized, generalized, discoverable form of an update feed. - Paul Buchheit
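A rough sketch of how a consumer might read the X-SUP-ID header mentioned above. It assumes the header value has the shape "<SUP feed URL>#<SUP-ID>", which is not stated in this thread; the example URL and ID are made up.

```python
from urllib.parse import urldefrag


def parse_sup_header(value: str) -> tuple[str, str]:
    """Split an X-SUP-ID style value into (SUP feed URL, SUP-ID).

    Assumes the ID is carried as the URL fragment.
    """
    sup_url, sup_id = urldefrag(value.strip())
    if not sup_url or not sup_id:
        raise ValueError(f"expected '<sup-feed-url>#<sup-id>', got {value!r}")
    return sup_url, sup_id


if __name__ == "__main__":
    # Hypothetical header value, for illustration only.
    print(parse_sup_header("http://example.com/sup.json#4496672d"))
```

Because the header travels with any HTTP response, the same parsing would apply whether the resource is a feed, an HTML page, or something else.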
As SUP is basically a "service-to-service" behind-the-scenes protocol, a streamed version with persistent connections seems more interesting. Also, SixApart streams contain the actual data, so the stream consumer does not need to fetch individual feeds. - Alex Kapranoff
Alex, streamed is less efficient because it delivers data that I don't care about, has privacy problems (what would Google Reader Shared Items stream?), and is much more complex to implement (try doing it in PHP on shared hosting). - Paul Buchheit
The first and second things you mention are problems of the SixApart implementation, which is not some unchangeable standard. One can easily add filtering to the Atom Stream protocol and probably do something with the secret URLs. Implementation for small sites on shared hosting is, on the other hand, nearly impossible, yes. But why would a small site want to implement SUP? As far as I can see, it should be interesting for moderately big services with thousands of feeds. - Alex Kapranoff
How do you propose that Google Reader should implement this Atom streaming in such a way that it would preserve feed privacy? My PHP on shared hosting example is somewhat extreme, but PHP is actually very popular, as is Rails, which I think would also have some difficulty managing endless streams of updates. The SUP design is very much in line with how everything else on the web works so that it can be implemented everywhere -- endless streams of Atom data are unique to a single site I believe. - Paul Buchheit
I don't see how abstracting the username by a layer is going to help privacy. It should be up to the feed publishers how to handle privacy. They can and should offer better privacy controls than an abstraction. Based on what I've read about this so far, I could easily write a script to crawl Flickr or Twitter and get every single user's ID and link it to their SUP-ID. I put it in a different comment, but maybe a POLL verb for HTTP would work. - Shawn McCollum
Shawn, The id abstraction prevents crawlers from discovering the URLs of private feeds using information in the SUP feed. - Gary Burd
Private feeds shouldn't be in the SUP feed at all. Obfuscation as a security or privacy model doesn't prevent anything, it just delays the inevitable. The SUP protocol isn't going to be universally accepted and no matter what, you're going to have to support feeds the current way. Handle private feeds normally, since without authentication it's really just hidden, not private. I like the way FF allows you to regenerate your API key; something like that should be used for private feeds rather than an ID and password. - Shawn McCollum
But Shawn; the SUP ID in the private feed means nothing unless you know the feed it belongs to. And if you know the feed it belongs to then you deserve to know it (ie someone registered it with you). Private feeds and SUP get along just fine. I suppose you could add another layer on top by generating a unique SUP id for each feed for each client. So the SUP id only means something to the client who requested it. But that's a lot more work for the server generating the SUP feed. - Benjamin Golub
Shawn, the advantage of SUP is that private feeds can still be protected by whatever mechanism you choose (such as the FriendFeed remote key), but still feed update information into a public SUP feed because the SUP-ID is completely opaque. (you can't discover the SUP-ID unless you already have access to the feed) - Paul Buchheit
another thought - why re-invent? Did they look at www.sitemaps.org? - Dave Hodson
It doesn't mean nothing, it means that there is something you don't know. I can do a lot with "knowing I don't know something". - Shawn McCollum
I think the SUP-ID concept is interesting but it's over-architecting a solution for a small subset of the issue you're trying to solve. It's nice that SUP-ID works to limit the size and help with private feeds. You could also implement something like the base HTML tag for size. Then either separate private and public updates, with the private ones using SUP-ID, or use a marker to identify that the backend needs to go through an extra hoop on this one. - Shawn McCollum
"It doesn't mean nothing, it mean that there is something you don't know. I can do alot with "knowing I don't know something"." Fill your SUP feed with random bogus SUP ids and bogus data. Then there will be *a lot* you don't know (including how many accounts the service *actually* has). - Benjamin Golub
Shawn, can you give an example of something you could do, knowing that an unknown feed within OurDoings was updated at a specific time? I don't see the vulnerability here. This is a real question as I already implemented SUP. - Bruce Lewis via fftogo
Just so everyone knows, I'm not trolling and I love FriendFeed. I have a personal, not professional, interest in speeding up feed aggregation. Just chatting so... Loading up random data will have a negative effect on the size feature of SUP and will cause more processing to be done by the consumer of the feed. Bruce, I'll put something down to answer your question in a bit, but right now I've got to pick up my son from daycare. - Shawn McCollum
It's funny you mention Netflix personalized feeds indirectly in your blog post, Dare, as I'm not sure a basic SUP feed would help much for our feeds. The personalized Netflix feeds generate about 6 million posts per day (about 2M each of queue adds, shipped DVDs and received DVDs). Given that any given feed consumer is likely only interested in a small fraction of the 8.4M+ subscribers, the signal-to-noise ratio in a SUP feed would be quite low, unless you created a SUP feed per consumer. - Michael Hart
Michael, SUP is intended for large feed consumers, not people monitoring one or two feeds. If you are doing 6 million feed updates per day, and your SUP feed compresses to about 8 bytes/entry (as the FriendFeed SUP does), then that works out to about 555 bytes/second (or 33k/minute). A compressed netflix feed appears to be about 22k, so the breakeven point is around 2000 feed fetches / day. For clients who poll feeds every half hour, that is only 45 unique feeds. For those that poll 1000s, it's a big win. - Paul Buchheit
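The break-even arithmetic in the comment above can be reproduced in a few lines. The inputs are the figures Paul quotes; treating "22k" as 22,000 bytes is an assumption, and the function names are purely illustrative.

```python
SECONDS_PER_DAY = 86_400


def sup_bytes_per_day(updates_per_day: int, bytes_per_entry: int) -> int:
    """Total compressed SUP traffic a consumer downloads per day."""
    return updates_per_day * bytes_per_entry


def breakeven_fetches_per_day(sup_daily_bytes: int, feed_bytes: int) -> float:
    """Number of full feed fetches that cost as much as a day of SUP polling."""
    return sup_daily_bytes / feed_bytes


daily = sup_bytes_per_day(6_000_000, 8)              # 48,000,000 bytes per day
per_second = daily / SECONDS_PER_DAY                 # roughly 555 bytes per second
fetches = breakeven_fetches_per_day(daily, 22_000)   # roughly 2,180 fetches per day
polls_per_feed = 24 * 2                              # one poll every half hour
feeds = fetches / polls_per_feed                     # roughly 45 feeds

if __name__ == "__main__":
    print(f"{per_second:.0f} B/s, {fetches:.0f} fetches/day, {feeds:.0f} feeds")
```

A client tracking more than about 45 such feeds already moves fewer bytes with SUP than with half-hourly polling, before counting any savings from conditional GETs.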
Of course the math is different with if-modified-since, but from what I can tell netflix does not currently support that. Also, if you want to be a little more clever, the size of the SUP could be reduced substantially by only including info on feeds that have ever been fetched, since the majority of netflix feeds have never been accessed. You can also have separate SUP feeds for queue adds, shipped, and received, since some clients (such as FriendFeed) are only interested in one category (adds, in our case). - Paul Buchheit
And of course that's just the bandwidth savings. Using SUP would also reduce feed latency and the overall number of requests. - Paul Buchheit
“This may be a dumb question, but for RSS, what element does the SUP link element need to be a child of? If it's the channel element, is there a danger that dumb RSS readers will treat it like a regular link element?”
That's what we did: http://friendfeed.com/api/feed... Hopefully dumb RSS readers would ignore the link tag entirely since it's from Atom (and it's in the Atom namespace). If you encounter any issues with this, let me know. - Paul Buchheit
There's a non-atom link element in RSS 2.0 according to an example on http://search.yahoo.com/mrss so I'm going to be paranoid. I know there are programs out there that show even more ignorance about XML namespaces than I do. I'll just use the HTTP header. I notice your docs have it in all caps even though HTTP headers conventionally use initial caps...or is that because SUP and ID are acronyms? Anyway I went with all caps. - Bruce Lewis
“Seems like an interesting idea for taking sips from a firehose. Quick bug report: The 'available_periods' key in the sup.json feed lists URLs with /api/sup - which is a refreshingly RESTful / hypermedia-ish pattern, but they're all 404. These should probably be /api/sup.json”
Well, I wanted to share in fact the original Paul's entry where the discussion is taking place, but I don't seem to find a way to directly link to it. Sorry. - Alex Popescu
The "via Reshare" link points to the original entry, though clearly we need to make that link more discoverable. - Paul Buchheit
I think it would be valuable to have it stand clear, as without your hint it would have taken me a while. In fact, I was planning to look at the page source to extract the entry ID and post the link :-). - Alex Popescu
Paul - so YouTube would have in their SUP feed the X "recent" SUP IDs that have changed (due to X users clicking Favorite "recently"), and then for each SUP ID the consumer would know to map that back to a real RSS/Atom URL to do the actual fetch. Is that right? - Patrick Lightbody
If so, doesn't that mean that the client's understanding of "recent" needs to match that of the server's (ie: 3 minutes)? Otherwise I'd imagine you could end up with excessive polling for those SUP IDs that are in the SUP feed, right? - Patrick Lightbody
My understanding goes in the same direction. I suppose SUP would not be very useful in case there are lots of updates on the service side belonging to lots of different feeds. - Alex Popescu
It would be interesting if a 'protocol' would be created: client: 'can you please create a SUP for the following feeds', service:'here is your SUP', client: 'thanks. I'll GET it every 3m'. I'm afraid that this will need some kind of agreement between the services, but definitely both will benefit from it. - Alex Popescu
Gary - got it, thanks. I just looked at the FF SUP and saw the time periods as well. So that allows the client and server to sync their definition of "recent" and then of course be much more efficient. I like it - simple and effective! :) - Patrick Lightbody
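One way a client could line up its notion of "recent" with the server's, as discussed above, is to pick the shortest advertised period that still covers its own polling interval. This is only a sketch: it assumes `available_periods` maps a period in seconds (as a string) to a URL, and the example.com URLs are placeholders.

```python
def pick_period(available_periods: dict[str, str], poll_interval_s: int) -> tuple[int, str]:
    """Return (period, url) for the shortest period covering one polling interval.

    Falls back to the longest advertised period if none is long enough.
    Raises IndexError if no periods are advertised at all.
    """
    candidates = sorted((int(period), url) for period, url in available_periods.items())
    for period, url in candidates:
        if period >= poll_interval_s:
            return period, url
    return candidates[-1]


if __name__ == "__main__":
    periods = {
        "60": "http://example.com/sup.json?seconds=60",
        "300": "http://example.com/sup.json?seconds=300",
        "600": "http://example.com/sup.json?seconds=600",
    }
    print(pick_period(periods, 180))
```

In practice a client would probably want some slack on top of its polling interval, so that a slightly late poll does not miss updates that fell out of the window.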
"For example, if a site such as FriendFeed switched from polling feeds every 30 minutes to polling every 300 minutes (5 hours), and also monitored the appropriate SUP feed every 3 minutes, the total amount of feed polling would be reduced by about 90%, and new updates would typically appear 10 times as fast." - Paul Buchheit
That's a very interesting idea! I think that for the case of push-generated feeds it will show nice improvements over the current polling approach (which is definitely not scalable). I am wondering if there would be a way to employ the same idea for poll-generated feeds (feeds that are retrieved on request only) though. - Alex Popescu
That's correct Alex. SUP works very well for most common feeds, but it's not ideal for more dynamic feeds such as a search (e.g. http://friendfeed.com/search?q... ). However, the vast majority of the feeds consumed by FriendFeed and others map into the SUP model very easily. SUP does not solve all problems, but it provides a very simple solution that should work for 90% of feed publishers. - Paul Buchheit
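The "about 90%" and "10 times as fast" figures in the example quoted earlier in this thread follow from simple ratios. A back-of-the-envelope sketch, which ignores the fetches triggered by SUP notifications and the cost of polling the SUP feed itself:

```python
def polls_per_day(interval_minutes: float) -> float:
    """How many times per day a single feed is polled at a fixed interval."""
    return 24 * 60 / interval_minutes


def feed_poll_reduction(old_interval: float, new_interval: float) -> float:
    """Fraction of scheduled per-feed polls eliminated by a longer interval."""
    return 1 - polls_per_day(new_interval) / polls_per_day(old_interval)


reduction = feed_poll_reduction(30, 300)  # about 0.9: 4.8 polls/day instead of 48
speedup = 30 / 3                          # SUP checked every 3 min vs. feeds every 30 min

if __name__ == "__main__":
    print(f"polling reduced by {reduction:.0%}, updates seen {speedup:.0f}x sooner")
```

The real saving depends on how many feeds actually update, since every SUP hit still causes one feed fetch.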
Alex, a conditional GET applies only to a single URL. SUP allows feed consumers to simultaneously monitor many thousands of feeds with a single GET. - Paul Buchheit
I've told you I might not be fully functional :-). You're right SUP is a container for updated feeds. Should I post any other questions directly to the room? - Alex Popescu
Paul this SUP technology is HOT!! I am totally awed by this disruptive innovative idea.... very impressive and incredibly brilliant!! wow!! - Susan Beebe
You're welcome Paul, you guys inspire the heck out of me...American techie dream in real time...neat! - Susan Beebe
my flickr upload appeared much faster just now... are you guys using XMPP for flickr? - Travis Parsons
and written in .py :) - but if we throttle "generate_sup_update(db, 120)" and the SUP feed is {"since_time": "2008-08-12T01:44:49Z", "period": 120, [[..|..]]}, so if we take "120" and make it, let's say, "30", won't this put even more load on both sides? - Peter Dawson
It's nice to see FF innovating things... its what I miss about livejournal back when it was just danga interactive. - Dave Dash
just curious, how to read SUP? pronounce sap or soup or syoop? - huixing
'sup, like the shortened version of "what's up?" - Tudor Bosman
So, where's the "omg it's not XML you idiots" backlash? - ⓞnor
Atom streams look more effective performance-wise and just a little bit harder to implement on both sides. See SixApart's: http://updates.sixapart.com/ - Alex Kapranoff
More than a little bit harder! Dealing with never-ending XML streams is a massive pain (see: XMPP), and keeping connections open is trouble. Also, the sixapart updates stream is a firehose that gives you all of the content being posted, you have no opportunity to filter out only those feeds you care about. The FF design is pretty much totally more awesome. - ⓞnor
work with feedburner to give you a ping every time one of the feeds changes and you can replace 5h with 'whenever it occurs' ;) - Nicole Simon
I can't wait for a DUDE or YO companion protocol. - abacab
Nice idea, one thing to include would be the information if a resource (feed) has been deleted, whereby one can build a mirroring system over RSS. - Christian Sonntag
Christian: no need; "deleted" is a special case of "updated". If a feed is listed as modified in SUP, the feed consumer will try to refetch the feed, and notice that it no longer exists. - Tudor Bosman
"FriendFeed Inc. is enhancing its service in order to fulfill a critical requirement on the Internet today: immediacy. The highly publicized startup is weeks away from boosting the frequency of its updates from social networks" - Louis Gray via Bookmarklet
What does SUP (Simple Update Protocol) do exactly, it's more than a http header last-modified check? - Philipp Lenssen
I'm interested in learning the details, too. my guess is some type of callback scheme that was discussed on ff a while back when ff was (gently) called out for polling flickr millions of times/day. - David Vasileff
also perhaps batching multiple feed requests into a single call - David Vasileff
That's it ... no more hikes in the afternoon! (Sharing .... ) - Charlie Anzman
theory 1: a single "meta feed" which you can poll to get a list of other feeds that have changed recently. (would it cover all feeds on the service, or would FF somehow supply a list of all the feeds they're interested in?) theory 2: a callback/ping/PIMP notification when a feed or feeds change (HTTP? XMPP?). theory 3: a formalization of the "public feed" concept, where you roll every (public) update on the service into a single (rapidly rolling!) feed which FF polls and gets updates for. - ⓞnor
Someone has to talk to someone to indicate a change occurred, so you can't skip that step, be it push or pull. So my guess is it's a way to get a larger aggregated chunk of what has changed and an idea of the size of the change. What might work is a bulk push of what has changed and then a pull of the changes at FF's leisure. - todd
Providing facts is just cheating. Now there's no room for rampant speculation :-) Add a sequence number and you could know if you missed an update which would indicate polling needed to occur or perhaps a download of the old change notices. - todd
great stuff. imho, when adopted, SUP will be - to the organic growth of services updates - like traffic lights to crowded intersections. or like how gps navigation is to asking people for directions :P - Dani Radu
Sounds like it's a protocol that others will need to implement and support so FriendFeed can process feeds more efficiently. A lot of the issues could be fixed if If-Modified-Since was used and RSS feeds were treated more like a web service rather than a static HTML page; most are generated from a DB in real time anyway. Read RFC 977: NNTP fixed this issue by setting up an easy way to poll what's new back in the 80's. - Shawn McCollum
WOWOWOWOW...this technology is huge! Disruptive and fantastic!! If I had VC level cash, I'd throw it at FF brainiacs and be rich... this idea is unbelievably SMART! - Susan Beebe
Shawn, we actually already use If-Modified-Since and many sites do properly support it. The key difference with SUP is that it allows feed consumers to monitor many thousands of URLs with a single GET, which is not possible using If-Modified-Since. - Paul Buchheit
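To make the "many thousands of URLs with a single GET" point concrete, here is a sketch of what a consumer might do with one already-downloaded SUP document. The `since_time` and `period` keys appear elsewhere in this thread; the `updates` key holding `[sup_id, update_id]` pairs, the IDs, and the URLs are assumptions for illustration.

```python
def feeds_to_refetch(sup_doc: dict, subscriptions: dict[str, list[str]]) -> list[str]:
    """Map SUP-IDs listed in one SUP document back to the feed URLs we track.

    `subscriptions` maps a SUP-ID to every feed URL that advertised it;
    several feeds may share an ID, so each ID maps to a list.
    """
    urls: list[str] = []
    seen: set[str] = set()
    for sup_id, _update_id in sup_doc.get("updates", []):
        for url in subscriptions.get(sup_id, []):
            if url not in seen:
                seen.add(url)
                urls.append(url)
    return urls


if __name__ == "__main__":
    doc = {
        "since_time": "2008-08-12T01:44:49Z",
        "period": 120,
        "updates": [["a1b2", "x1"], ["ffff", "x2"], ["a1b2", "x3"]],
    }
    subs = {"a1b2": ["http://example.com/feeds/alice"],
            "c3d4": ["http://example.com/feeds/bob"]}
    print(feeds_to_refetch(doc, subs))
```

IDs the consumer has never seen (like "ffff" here) are simply skipped, which is why an opaque ID in a public SUP feed reveals nothing about feeds the consumer was not already given.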
If anything, it shares a few similarities with Sitemaps (which enable webmasters to notify search engines of modified URLs, etc.) Great work on SUP! - Aviv
This is one of those simple ideas that one wonders why no one thought of before. It's a good proposal and a required one in the rapidly growing aggregation/Lifestreaming world. I am sure the proposal will be widely and quickly SUPported. Well done folks. - Vinay | विनय
paul, I found and read the ff blog post, and I understand the meta-feed approach. Interesting but I think sup-id storage adds a bit of complexity. Even though the sup-id keeps the size of the feed down, full uri would be better. I think the concern about exposing usernames is a little overdone, I mean it's not going to really stop someone who wants the usernames from getting them. - Shawn McCollum
Love it. We've used a different approach to interface with "friendly" crawlers, one that is based on the ability to fetch older items in the feeds by request. But it requires a public "recent" feed, and does not solve the private URLs issue. This is so much better. We'll be experimenting with SUP and would love to help it mature into a well defined spec. - Yaniv Golan
"At 13:07:25, I crossed the finish line, much to the delight of the crowd who had been loving my Obama jersey all day. (Note: while wearing Obama schwag is guaranteed to draw fervent support from spectators, expect to be required to dispense at least a thousand fist bumps.)" - Scotty Allen
This is from 4 days ago... Bolt ran that day, but it was just a heat. I'll have pictures up of last night's 4x100m relay soon... - Ana
ana's captioning productivity has gone up 1000% since i left! - eviltom
I wonder if the cameramen have to train to be fit enough to run alongside the track&field olympians on their victory laps. Their equipment doesn't exactly look light. - Dan Hsiao
Yes, I took the movie. I used the camera that I loaned to you and Ross for the whiteboard movie. @jbeda was also there. Here's his movie: http://www.youtube.com/watch?v... - Gary Burd
That is super cool! How close were you to the dolphins? Do they react to or avoid people? - Dan Hsiao
Some dolphins swam within 10 ft of me. I assume that the dolphins were curious about us because the dolphins swam to us. - Gary Burd
not entirely sure that the creation of a stable black hole would require earth saving intervention. but if i were an alien i'd definitely make out that it did - what does this tell us about kevin? - Alex Gawley
Huh? This is the kind of video my 15mo old would LOVE. (And, BTW, this is exactly the kind of truck that comes to our house every week) - Steve Lacy
hooked up my youtube to friendfeed and voila - you notice that my favorite YT videos are the ones I bookmarked b/c my kids like to see them (over and over) - peter
They use a master-slave architecture; all writes go to the master datacenter in California. They modified MySQL replication so that they can invalidate datacenter-local memcache caches once the data gets replicated. And finally, they use a timestamped cookie to solve the problem that read-after-write should see what you just wrote, even if you hit a data center that hasn't caught up on replication yet. - Sebastian Kanthak
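A toy sketch of the timestamped-cookie idea described above: remember when the user last wrote, and send their reads to the master until replication has had time to catch up. The 20-second window, the function name, and the return values are assumptions, not details from the comment.

```python
from typing import Optional

REPLICATION_WINDOW_S = 20.0  # assumed upper bound on replication lag, in seconds


def choose_datacenter(last_write_ts: Optional[float], now: float,
                      window_s: float = REPLICATION_WINDOW_S) -> str:
    """Route a read to the master while a recent write may not have replicated.

    `last_write_ts` is the timestamp stored in the user's cookie at their
    most recent write, or None if they have no such cookie.
    """
    if last_write_ts is not None and now - last_write_ts < window_s:
        return "master"
    return "local"


if __name__ == "__main__":
    print(choose_datacenter(last_write_ts=95.0, now=100.0))   # wrote 5 s ago
    print(choose_datacenter(last_write_ts=None, now=100.0))   # never wrote
```

Only users who have just written pay the cross-datacenter latency; everyone else keeps reading from the nearby replica.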
"For between £200 and £2,000, people can buy a cow that stands no taller than a large German shepherd dog, gives 16 pints of milk a day that can be drunk unpasteurised, keeps the grass “mown” and will be a family pet for years before ending up in the freezer." (via boing boing) - Joe Beda
Are mini cows better for the ozone layer? - Gary Burd
"Interestingly, Apple has pulled the push notification service in this release "for further development." The capability was announced at Apple's Worldwide Developer's Conference Keynote in response to requests for background process support for third party iPhone applications. Applications that deal with messaging (such as AIM or Facebook) would likely stand to gain the most from the SDK enhancement. Apple had promised a September delivery of the functionality." - Gary Burd via Bookmarklet
I want to assign a user a random number (let's say from 0 to 10,000) - but I want to make sure that it is a unique number, no other user has that number. Presumably I'm storing all taken numbers in a database. How do I most efficiently make sure a new number is a unique number? While the set is still sparse, I can just assign a number, and see if it is taken, and then try again until I find an untaken one. But as the set of all numbers approaches being full, this will approach 10,000 tries just to find that last untaken number. Is there a better way? - Shannon Bauman
Hm... thinking about this some more. Perhaps if I just stored all of the untaken items in a list (is that the right structure?)... then do a random roll of rand(list.length()), and go to that spot in the list, and then use that number stored there, and remove it from the list. That may work. Not sure how that equates to a database though :-) - Shannon Bauman
Just talking to myself, don't mind me :-) I may have to just have a continuous process running that does the assigning of IDs, and always stores the unused numbers in it - and then just hit up that process whenever i need a new ID assigned. I could then add that number to the DB for bookkeeping, etc. (I need the resulting number stored in a DB for other reasons/Longevity). - Shannon Bauman
It sounds just like dealing with hash collisions... one easy approach is to pick a random number and then scan for the next available number after that (modulo table size). Linear scans should be fast if the database stores things in sorted order. - Jim Norris
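The "pick a random number, then scan for the next free one" approach above might look like this in memory. An in-memory set stands in for the database here, and the function name is illustrative.

```python
import random


def assign_by_probing(taken: set[int], size: int, rng: random.Random) -> int:
    """Pick a random start, then scan forward (wrapping around) to a free number."""
    if len(taken) >= size:
        raise ValueError("all numbers are taken")
    start = rng.randrange(size)
    for offset in range(size):
        candidate = (start + offset) % size
        if candidate not in taken:
            taken.add(candidate)
            return candidate
    raise AssertionError("unreachable: a free number must exist")


if __name__ == "__main__":
    rng = random.Random()
    taken: set[int] = set()
    print([assign_by_probing(taken, 10, rng) for _ in range(10)])
```

One caveat: the result is not uniformly random, because a free number sitting just after a long run of taken ones is more likely to be chosen.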
@jim - yeah, that totally makes sense. Thanks! - Shannon Bauman
How about just using a larger number? (like perhaps 128bits) - Paul Buchheit
@paul (and jim) - I am actually trying to completely fill the namespace so to speak (or numberspace as the case may be in this case). I basically want to keep assigning people numbers as they come in the door until all 10,000 numbers have been taken, and then close the door. But ideally I would like to do the assigning out of order. Perhaps what I should do is just do the shuffle before hand all at once, and then assign them from first to last after they have been shuffled? - Shannon Bauman
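The shuffle-beforehand idea from the comment above is short to sketch: randomize the whole range once, then hand numbers out in order. The names are illustrative, and popping from the end of the list is used here as the equivalent of walking it first to last.

```python
import random


def make_pool(size: int, rng: random.Random) -> list[int]:
    """Build the full range 0..size-1 and shuffle it once, up front."""
    pool = list(range(size))
    rng.shuffle(pool)
    return pool


def next_number(pool: list[int]) -> int:
    """Hand out the next pre-shuffled number; IndexError means the door is closed."""
    return pool.pop()


if __name__ == "__main__":
    pool = make_pool(10_000, random.Random())
    print(next_number(pool), next_number(pool), len(pool))
```

Each assignment is constant time and uniformly random over the remaining numbers, with no retries even when only one number is left.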
By the way - it is pretty amazing for me to be able to post a CS question, and have the two smartest CS peo