Microsoft: Botched firmware update set off Outlook.com outage

The 16-hour partial Outlook.com outage on Tuesday and Wednesday morning was caused by a botched firmware update that triggered a temperature spike in a Microsoft data center. Automatic safeguards then kicked in and made a large number of servers inaccessible.

Because of these unspecified safeguards, the downed servers couldn't fail over on their own, so restoration work had to be done manually, which slowed the process down, according to a blog post by Arthur de Haan, Microsoft's vice president for Outlook.com.

De Haan apologized for the disruption of email access. "Outages are something we take very seriously and invest a significant amount of our time and energy in doing our best to prevent."

His description of what actually happened doesn't detail what firmware was being updated, what went wrong, what overheated, what safeguards kicked in or how many servers were involved: "On the afternoon of the 12th, in one physical region of one of our datacenters, we performed our regular process of updating the firmware on a core part of our physical plant. This is an update that had been done successfully previously, but failed in this specific instance in an unexpected way. This failure resulted in a rapid and substantial temperature spike in the datacenter. This spike was significant enough before it was mitigated that it caused our safeguards to come in to place for a large number of servers in this part of the datacenter," de Haan's blog says.

"These safeguards prevented access to mailboxes housed on these servers and also prevented any other pieces of our infrastructure to automatically failover and allow continued access. This area of the datacenter houses parts of the Hotmail.com, Outlook.com, and SkyDrive infrastructure, and so some people trying to access those services were impacted."

There was no way to restore the affected infrastructure without human intervention, which, he says, "added significant time to the restoration."

Microsoft is working on improvements to prevent the same scenario from playing out in the future. "Now that we're through the resolution, we're also hard at work on ensuring this doesn't happen again," he says.

Tim Greene covers Microsoft for Network World and writes the Mostly Microsoft blog. Reach him at [email protected] and follow him on Twitter @Tim_Greene.