SearchSMB Blog - A blog for SMB IT professionals.


A blog for professionals at small and medium-sized businesses (SMBs), covering information technology (IT)-related news, features and advice.

First Windows takes down Skype, then eBay’s rep

If everyone on the planet jumped up and down at the same time, would the orbit of the earth shift? I doubt it. But I do know that if millions of people try to log in to their Skype accounts at the same time, it can knock the peer-to-peer Internet telephony service out of action for days. 

How do I know? Because that’s exactly what happened last week.  

For those unaware, Skype’s network crashed last Thursday, leaving its 220 million users without service for nearly two full days. The company, which is a division of the online auction giant eBay, initially blamed the outage on a “software issue,” while some bloggers speculated that a more sinister cause, namely a DoS attack, was behind the crash.

Service was finally restored on Saturday morning, and today Skype’s Villu Arak explained on his Heartbeat blog what really took the network down: 

The disruption was triggered by a massive restart of our users’ computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches through Windows Update. 

File this incident under “unintended consequences.” I can just picture the millions upon millions of Windows users dutifully downloading their monthly update, restarting their machines and expecting all to be right with the world. Instead, they unwittingly shuttered, at least temporarily, one of Web 2.0’s most celebrated progeny.

As for parent company eBay, its response (or lack thereof) seems to have done more damage to its reputation than the outage itself did to Skype’s network.

On GigaOM, blogger Om Malik writes: 

And in this moment of crisis, eBay’s senior management was AWOL. Ebay and Skype management are happy to talk to the press when delivering the good news, but in this crisis situation, the silence was deafening. 

Perhaps more ominous for Skype’s future advances into the SMB and enterprise markets are comments like these, posted on the All VoIP News blog:    

After the recent Skype outage I certainly would not use Skype for my business. If Windows updates cause millions of people to reboot their computers, and thus Skype is effected, there is something wrong. 

Ultimately, I don’t think you can blame Skype for failing to anticipate such a fluke occurrence as a network outage caused by millions of users rebooting their computers after a routine Windows Update. I know I certainly didn’t see it coming, and I doubt anyone else did either. Internet telephony is still a work in progress, and snafus like this are bound to happen from time to time.  

But Skype users certainly have a right to expect a better response and better customer service when such a drastic event does happen, especially from a major IT player like eBay. For its SMB users especially, those who depend on Skype to conduct daily business, anything less can prove costly. Maybe too costly to take a chance on a VoIP service like Skype in the first place.

So in addition to “unintended consequences,” here’s hoping eBay and Skype also file this incident under “lessons learned.”

12 Comments

  1. I feel the server problem happened throughout the world, but we have no reason to complain about Skype because it’s a free service, and if we are depending on a free service then we always have to have a second option.

    Comment by nicci — August 21, 2007 @ 11:18 am

  2. Seems “lesson learned” means they won’t have another two-day outage, therefore Skype can continue to be used for business.
    But free things do have a certain tolerance level, unlike my cell phone, where I do pay quite a bit.

    Comment by Tom — August 21, 2007 @ 12:29 pm

  3. Of course, every system fails at one time or another, whether it’s Skype or any other. That’s why we disaster managers teach both individuals and businesses to establish REDUNDANCY for mission-critical systems. Neither redundant system will provide 100% uptime for all time, but if they are selected properly (e.g., independent but complementary technology), they are unlikely ever to both be down at the same time.

    There’s no reason for this incident to cause anyone, including SMBs, to dump Skype. However, it might make the SMB ask whether Skype should be their primary VoIP platform, or might make a more appropriate redundancy platform. That is where I happen to believe Skype belongs in the SMB’s redundant VoIP configuration - not as the primary platform, but as backup.

    Comment by Brian — August 21, 2007 @ 2:14 pm

  4. If the upgrades are timed, that would necessarily mean that the restarts were spread across a 24-hour period. Not everyone lives in the US, and thus an 8am (say) upgrade takes place at 8am local. Unless, that is, the users are so inept as to not localise their time/date settings (God, I hope most are not that bad!)

    If the updates are MS sourced and happen at the time of release, then for many PCs it will be next day restart that triggers the upgrade anyway. Still no major problem. The difficulty will occur when corporate policy requires PCs to remain on overnight so local upgrades can be centrally managed. These may get the same time upgrades, causing a major restart.

    This problem may be bigger than just Skype. What about the sudden surge in power use as millions of PCs switch off and then back on in unison?

    Comment by John Attwood — August 21, 2007 @ 5:00 pm

  5. Read this…. http://heartbeat.skype.com/2007/08/what_happened_on_august_16.html

    The disruption was triggered by a massive restart of our users’ computers across the globe within a very short timeframe as they re-booted after receiving a routine set of patches through Windows Update.

    The high number of restarts affected Skype’s network resources. This caused a flood of log-in requests, which, combined with the lack of peer-to-peer network resources, prompted a chain reaction that had a critical impact.

    Normally Skype’s peer-to-peer network has an inbuilt ability to self-heal, however, this event revealed a previously unseen software bug within the network resource allocation algorithm which prevented the self-healing function from working quickly. Regrettably, as a result of this disruption, Skype was unavailable to the majority of its users for approximately two days.

    YOU WROTE BAD CODE! Do not blame your customers for performing responsible defensive measures (installing updates and rebooting as needed) for your bad code. You screwed up, take your punishment like a man, not like a whining adolescent (it’s not MY fault, Microsoft users rebooted).

    There will, I suspect, be a penalty in the migration of business users to more reliable and less self-defensive VoIP vendors. Badly done.

    Comment by Sino — August 21, 2007 @ 5:08 pm

  6. I wouldn’t trust Skype to my business functions anyway. Most of the Internet users use the free version, and for those who do pay, I don’t think the payment comes with an SLA.

    Anytime you use a technology such as VoIP, you don’t hand over the mission critical data without an SLA, Quality of Service architecture, and redundant systems.

    Skype was first on the market because no one else wanted to, or they weren’t quick enough. For casual conversations, it’s great. As a business tool, I wouldn’t have counted on it even before the network failure. There are too many other players that have been in the telecommunication industry that can provide much better service than Skype. Maybe in a few years, and after this network outage, if they want the business, they will find a way to make sure this doesn’t happen again.

    Oh, by the way, 365 Main - a very popular data center in San Francisco - was shut down for 45 minutes because the PG&E power hub had surged and caused a failure in the electricity. The backup generators - diesel powered - failed to turn on due to a Detroit Diesel controller chip failing to start. It was a timing issue that caused the generators to start and stop. Such a small issue that was missed on regular maintenance checks. This was the first time they have had a problem like that. So what did they do about it? Post the entire details of the outage and also a public statement. See their post at http://tinyurl.com/2j3347. Inside the post they also have the entire incident documented step by step. This is an example of how a company should respond to an outage, especially one that affects a lot of important people - like Ticketmaster! The difference is that 365Main gets paid to perform, whereas Skype users may or may not pay.

    Dino

    Comment by Dino Palladino — August 21, 2007 @ 7:08 pm

  7. Take out the period in the url. sorry

    Comment by Dino Palladino — August 21, 2007 @ 7:09 pm

  8. So how come previous updates didn’t affect Skype before? Just about every critical update requires a reboot, so there should have been a pattern of outages if this was the cause. The only reason this wouldn’t be the case is if Skype had recently altered the algorithm used to heal the network, and therefore the outage is purely down to Skype.

    Comment by Paul ANderson — August 22, 2007 @ 2:22 am

  9. We need to do the math: the Skype network uptime is still over 99.9 percent.

    Comment by ZOverLord — August 22, 2007 @ 4:23 am

  10. I think the statement “After the recent Skype outage I certainly would not use Skype for my business” is a bit harsh. Certainly you have experienced “outages” with regular telephone services with storms, hurricanes, tornadoes, floods.

    I use Skype infrequently and wasn’t aware of the outage last week until I got an email from Skype informing me of such and giving me one free week of usage.

    Maybe they should have sent something out earlier, especially to those users who rely on Skype… surely they know who those heavy users are.

    Just my 2 cents worth!

    Comment by Phil Lundeen — August 22, 2007 @ 4:36 pm

  11. Anyone running a business that doesn’t have a backup communication plan is really not planning ahead themselves. I have a few small businesses and have 3 high speed networks available to me. Cable at home, DSL at work and a Sprint Card. We have had zero downtime. Phone lines go off at times, and people use their cell phones.

    That Windows update was a pain. I spent half the morning trying to convince my daughter’s laptop her Rhapsody MP3 downloads were not viruses. It apparently did some kind of security update that forced things into a lockdown type mode. It was unlike any other Windows update I have seen.

    Comment by Marty — August 22, 2007 @ 6:42 pm

  12. – here’s hoping eBay and Skype also file this incident under “lessons learned.”

    You must not follow eBay’s reputation with its customers. They are the ones who have to constantly add space to hold the “lesson learned” files.

    Comment by concho — August 23, 2007 @ 9:19 am
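
For readers who want to "do the math" raised in comments 3 and 9, here is a rough sketch in Python. The function names are hypothetical, the two-day downtime figure comes from the post, and the measurement windows and the 99 percent figures in the note below are assumptions chosen for illustration, not Skype's actual record.

```python
def availability(downtime_days: float, window_days: float) -> float:
    """Fraction of the measurement window during which the service was up."""
    if window_days <= 0 or not 0 <= downtime_days <= window_days:
        raise ValueError("need 0 <= downtime_days <= window_days and window_days > 0")
    return 1.0 - downtime_days / window_days


def min_window_days(downtime_days: float, target: float) -> float:
    """Shortest window over which this much downtime still meets the target."""
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    return downtime_days / (1.0 - target)


def combined_availability(a: float, b: float) -> float:
    """Availability of two services used as mutual backups.

    Assumes the two fail independently, which is the hard part in practice.
    """
    return 1.0 - (1.0 - a) * (1.0 - b)


if __name__ == "__main__":
    # Assumed window: one year with a single two-day outage.
    print(f"{availability(2, 365):.4%}")
    # How long a record is needed for two days down to count as 99.9% uptime?
    print(f"{min_window_days(2, 0.999):.0f} days")
    # Two assumed 99% services backing each other up.
    print(f"{combined_availability(0.99, 0.99):.4%}")
```

Measured over a single year, two days of downtime works out to roughly 99.45 percent uptime; it takes a record of roughly 2,000 days, about five and a half years, for the same outage to stay at 99.9 percent. Two independent services at an assumed 99 percent each combine to about 99.99 percent, which is the argument for keeping a backup, provided the two really do fail independently.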
