Microsoft: Botched firmware update set off Outlook.com outage

The partial Outlook.com outage that lasted 16 hours, from Tuesday into Wednesday morning, was caused by a firmware update gone awry. The failed update triggered a temperature spike in a Microsoft data center, which tripped automatic safeguards that made a large number of servers inaccessible.

Because of the unspecified safeguards, downed servers couldn't fail over on their own, so restoration work had to be done manually, which slowed the process, according to a blog post by Microsoft Outlook.com Vice President Arthur de Haan.

De Haan apologized for the disruption of email access. "Outages are something we take very seriously and invest a significant amount of our time and energy in doing our best to prevent."

His description of what actually happened doesn't detail what software was being updated, what went wrong, what overheated, what safeguards kicked in or how many servers were involved: "On the afternoon of the 12th, in one physical region of one of our datacenters, we performed our regular process of updating the firmware on a core part of our physical plant. This is an update that had been done successfully previously, but failed in this specific instance in an unexpected way. This failure resulted in a rapid and substantial temperature spike in the datacenter. This spike was significant enough before it was mitigated that it caused our safeguards to come in to place for a large number of servers in this part of the datacenter," de Haan's blog says.

"These safeguards prevented access to mailboxes housed on these servers and also prevented any other pieces of our infrastructure to automatically failover and allow continued access. This area of the datacenter houses parts of the Hotmail.com, Outlook.com, and SkyDrive infrastructure, and so some people trying to access those services were impacted."

The affected infrastructure could not be restored without human intervention, which he says "added significant time to the restoration."

Microsoft is working on improvements to prevent the same scenario from playing out in the future. "Now that we're through the resolution, we're also hard at work on ensuring this doesn't happen again," he says.

Tim Greene covers Microsoft for Network World and writes the Mostly Microsoft blog. Reach him at [email protected] and follow him on Twitter @Tim_Greene.
