Our data center, SVColo (http://svcolo.com/), lost power (and apparently all generators as well) this afternoon, causing our site to be completely unavailable for a couple of hours. We apologize for the extended outage.
This outage impacted our site as well as a number of other web sites hosted at SVColo. We are obviously fairly frustrated by the incident, and we are working hard to get everything else back online now.
- Bret Taylor
I was worried it was the End of Days!!!
- Rochelle
That's OK, not your fault. Great work, and yes, we did miss you.
- Kim Landwehr
World productivity just went back down.
- Amit Patel
It's amazing how lost I am without FriendFeed. Nice that you're back!
- Rahul Das
Glad you're back...and most importantly still in one piece. Do you anticipate any issues with feed importing? I just manually refreshed Twitter and all seems well.
- Mark Krynsky
Great to have you guys back. Lots of weblinks/posts from surfing the past 3 hrs.
- Mitchell Tsai
funnily enough this is not the first time i have heard of a datacenter with supposedly highly available N+1, A-side and B-side, UPSed & diesel-generated power just going totally offline because someone tripped over a plug.
- Karim
Yes, I missed FriendFeed! And what happened to the SVColo emergency generator plan? Jeez.
- AJ Kohn
YES we missed you! ;-) Glad you are back up, and that as soon as I was back I could look here and see the explanation. Things happen, but I love it that there never seems to be a question that you'll give us the low-down. Thanks! :-)
- guruvan (Rob Nelson)
And double thanks for the Tweet to announce the outage ;) (a tweet announcing restoral would be uber-cool)
- guruvan (Rob Nelson)
The IM bot doesn't seem to be up either.
- Rahul Das
I guess this will convince investors of the need for a second datacenter site
- guruvan (Rob Nelson)
Karim: the generators almost never work for anything short of telcos carrying 911 traffic..
- guruvan (Rob Nelson)
Cool, some people were saying it was a different power system, thanks for the update.
- Dan owns Comicsforge.com
Rob, that jibes with my anecdotal experience :-) but why is that? what is the point of telling customers you don't have single points of failure in your datacenter when you do...? mishegoss.
- Karim
Karim, there are ALWAYS single points of failure. If nothing else, Earth is a single point of failure :). As for the generator, supposedly that's what's powering things right now!! (which has me somewhat frightened)
- Paul Buchheit
Paul: very true....And Karim, it's probably because the generator backups are rarely tested, certainly not regularly, and partially for this reason...the failovers don't work and they don't want to take customers down because a test went wrong
- guruvan (Rob Nelson)
I hope, Paul, that this will cause you guys to be able to go to the investors and get things happening on two or three sites, so we only see performance degradations in future calamities (since disaster will always strike)
- guruvan (Rob Nelson)
This is what you get for not subscribing to me. Don't trifle again, Mr. 'Taylor'.
- Akiva
Fess up Bret ... this was just a test to see how starved we'd get without our hourly toke ;)
- AJ Kohn
If the colo facility isn't testing their generators on a weekly basis, someone needs to lose their job.
- Scoble, Alex Scoble
and I was gonna suggest that it went down because I was AFK... I guess I don't qualify for the 'I survived the great FF outage of '09 and all I got was this lousy t-shirt' t-shirt.
- grant fox
During the Twitter planned outage I thought, "well, at least I still have FriendFeed". Ouch! Glad you're back. I've lived through several outages on both sides of the equation - don't wish it on ANYONE.
- Robert J Taylor
Very sad test of how addicted I've become. I was jones-ing pretty hard. Glad you're back.
- Ken Gidley
Paul, yeah, we can't eliminate SPOFs completely, but i've seen these places brag about how they have N+1 power systems, UPS, diesel generators that can run for days at peak load, priority contracts to have more diesel fuel delivered if necessary, etc. etc., and something happens and the whole data center just dies. you realize that you might be screwed in the event of Global Thermonuclear War :-) but you don't expect to be down for hours because somebody pushed the wrong button...
- Karim
strangely enough, a transformer blew up outside earlier tonight and now my lights just flickered. going to make sure my UPS is charged :-D
- Karim
Alex: Where have you ever been that they actually tested gens on a weekly (or even regular) basis? I've been around colos and CLECs for many years, and this happens all the time because there's never a test.
- guruvan (Rob Nelson)
and Alex: That's where I'm going to put my next project ;-)
- guruvan (Rob Nelson)
glad to have ff back. What will happen to updates which were made during that time? will they appear on my ff site?
- Okeane
Supposedly the generators worked ok, but there was a ground-fault elsewhere in the system that blew out all the breakers. Murphy wins again.
- Paul Buchheit
It is a fact of computing...sometimes the power just quits. IP is great - we can always rely on RFC2549 for a no-electricity transport protocol.
- guruvan (Rob Nelson)
Were you gone....I hardly noticed....OKAY! YES! Yes, I missed you! Satisfied?!
- WoH: Professor MOTHRA
My browser reload button is exhausted. But all systems are go now. Thanks for calming all the passengers by simply telling us what's really going on. Awesome. *climbs out of search.twitter dinghy and back onto the mothership*
- Micah
guruvan: Most hospitals I've been to test their backup systems on a regular monthly schedule. Ironically, that makes the emergency power system in general less reliable than utility power.
- Gabe
So is that not a fail whale but a school of fail whales??
- Amani
Gabe: ok..never worked in a hospital, but most of them do in fact have proper working backup power to my understanding. why can't the telcos and datacenters? this is a very common problem for them.
- guruvan (Rob Nelson)
@Rob, not everything is that simple. did someone drive their car into the generators? :P
- mjc
Does sitting for 2 hours watching all mentions of FriendFeed on Twitter search qualify as missing you?
- Sharon McPherson
Have a fail-over version of FriendFeed on Google App Engine. I know it will take time and effort, but then it will mean built-in redundancy. And failing of both together will be a highly improbable event.
- Varun Mahajan
Michael, I know it's not all that simple, but you would be amazed at how often this happens in professional datacenters and at CLECs and without something as understandably disastrous as cars through buildings or bombs.
- guruvan (Rob Nelson)
Rob: when I first met Rackspace's chairman, Graham Weston, we talked about a failure they had when a truck knocked out power there. Turned out the chillers and generators had a flaw in their design that kept the generators from kicking on for a few seconds. This sounds exactly like what happened here. Rackspace worked with the chiller and generator companies to design a fix, but he told me that most data centers haven't upgraded (Rackspace's have). I wonder if this is the flaw that hit friendfeed yesterday?
- Robert Scoble
I read somewhere that it had been mathematically shown that a computer program can never be 100% error-free because the error routines will have errors... Maybe it's the same type of thing here.
- Bob Morris (polizeros)
Robert: That's very interesting. Don't know if that's what happened, but I do know that most of the time when I've seen this happen in a CLEC CO it's been much simpler than that. Usually attributable to human error (and lack of testing). I am curious as to how, and how often, Rackspace performs tests. Finding the design flaw you mention speaks highly of the company IMO.
- guruvan (Rob Nelson)
Rob: they run tests on the power system very often. Unfortunately nothing tests a system like real life.
- Robert Scoble
No, there are factors that just don't come into play in a "controlled" test scenario, no matter how thorough you try to make the test.
- guruvan (Rob Nelson)
It was the severity and length of the outage that was the surprise - but we have had electricity outages over a vast part of the continent that lasted days and were caused by small flaws, so this isn't out of that realm. But this is, or should have been, a much smaller, more controlled situation. I am guessing there may be some adjustments in contracts and some heads rolling. Who else was affected by the situation? I get the impression that svcolo is a fairly large operation.
- Brian Sullivan
Brian: if a data center goes down for two seconds it often takes an hour or longer to turn everything back on and get it all working properly again.
- Robert Scoble
In this case it was definitely "longer"
- Brian Sullivan
Very true...and often longer than that to get data systems working again, as they're often overloaded as soon as they come online (causing great difficulty in getting them to run correctly). I was very impressed with how quickly FriendFeed was able to have everything back to fully functional
- guruvan (Rob Nelson)
Brian: yup. I bet that the power came on pretty quickly, but bringing up a rack of equipment and getting everything talking to each other again can take quite a while.
- Robert Scoble
I guess they will have to adjust their power uptime claims to read 100% (except for strange circumstances) ;-)
- Brian Sullivan
no company anywhere should ever try to claim 100% uptime of anything. it's just not realistic. there's always some scenario that causes that to be off - if only by 0.0001%
- guruvan (Rob Nelson)
maybe time for a pv solar third-tier backup
- Matt Weeks
I've been with Hurricane Electric 5 years and never experienced significant downtime. Luck? Maybe. Maybe not.
- Jason Nunnelley
The good news is downtime seems an essential element of a social's success.
- Jason Nunnelley
Brian, the SVcolo page you linked to is funny. love the part where they "assure 100% reliability" for the electrical power, with a "100% facilities uptime SLA." guaranteed to never fail or you get your next pizza free? guessing their hull is also 100% impervious to icebergs. +1 Phil. though day of week is also suspect. am superstitious about planned maintenance windows on Friday afternoons -- like standing in open field with a lightning rod during a thunderstorm. ;-)
- Karim
matthew, i thought the Tier 1 data centers sucked? :-D with diversity, it sounds like you are in a Tier 4...
- Karim
I think it is bound to happen with any website, especially when it is growing faster than expected.
- Ashish
ashish: it's bound to happen to just about any website. until a site is the size of google, yahoo etc, it's not only possible, but probable that it will happen. -a definitely nice link there Phil :-)
- guruvan (Rob Nelson)
I think some people got frustrated as if they paid to use FriendFeed. I wonder what would happen if websites like this eventually start to charge the users, or would they be ad supported? I think the newness of social networking will eventually fade like that of email.
- Ashish
In a sense, a lot of us do pay to use friendfeed. We pay with our time, we pay with our words and we pay with our content.
- Scoble, Alex Scoble
lol Alex, I was going to mention, the centers that house banks come as close to never going down as possible. it's not unheard of, but it sure is the end of someone's world if they do go down
- guruvan (Rob Nelson)
you're gradually moving down the LiveJournal path - they had the same thing once, a big one, and it even became an annual tradition... yes, power losses became a regular once-a-year thing, no matter what datacenter they used. And then they sold themselves to the Russian company SUP :D
- A. T.