Overloaded servers knock Gmail offline

Google acts to prevent repeat performance

A worldwide outage of Google's Gmail webmail service yesterday was caused by a traffic jam on its servers, according to Google's official Gmail blog.

Request routers are servers that direct web queries to the appropriate Gmail server. Recent changes intended to improve traffic flow had increased the load on these routers, and when workers took some Gmail servers offline for routine upgrades, the remaining routers were overloaded.

"As we now know, we had slightly underestimated the load which some recent changes placed on the request routers," Ben Treynor, Google's site reliability czar, wrote on the Gmail blog. "At about 12:30pm Pacific a few of the request routers became overloaded and in effect told the rest of the system 'stop sending us traffic, we're too slow!'. This transferred the load onto the remaining request routers, causing a few more of them also to become overloaded, and within minutes nearly all of the request routers were overloaded."
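The cascade Treynor describes can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Google's routing logic: load is split evenly across active routers, a router drops out as soon as its share exceeds its capacity, and the function name `simulate_cascade` and the capacity figures are invented for illustration.

```python
def simulate_cascade(capacities, total_load):
    """Toy model of cascading overload (hypothetical, not Google's system).

    capacities: per-router capacity in arbitrary request units.
    total_load: total traffic, split evenly across active routers.
    Returns (routers still serving, rounds of failures).
    """
    active = list(capacities)
    rounds = 0
    while active:
        share = total_load / len(active)
        # A router whose share exceeds its capacity refuses traffic.
        survivors = [c for c in active if c >= share]
        if len(survivors) == len(active):
            break  # every remaining router can handle its share
        active = survivors
        rounds += 1
    return len(active), rounds


capacities = [100, 105, 110, 120, 130]
print(simulate_cascade(capacities, 500))  # all routers cope
print(simulate_cascade(capacities, 510))  # a 2% increase takes out every router
```

In this model a small increase in load is enough to remove one router, and each removal raises the share carried by the rest until none are left.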

The overload resulted in people around the world being unable to access Gmail for about 100 minutes, Treynor said, though he noted that IMAP/POP access and mail processing continued to work normally.

Gmail engineers were alerted to the problem within seconds of the failures and, after identifying the cause, brought additional request routers online. Gmail is now more than 99.9 percent available to users, he said.

"We've turned our full attention to helping ensure this kind of event doesn't happen again," he wrote.

One fix the company plans to make is to have request routers slow down when overloaded instead of refusing to accept traffic. Treynor said the request routers also need sufficient failure isolation so that a problem in one data center doesn't affect servers in another.
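The difference between refusing traffic and slowing down can be sketched with a similar toy model. The assumptions here are illustrative only: load is split evenly, an overloaded router keeps serving at its capacity and delays the excess, and the name `served_with_throttling` and the numbers are hypothetical rather than anything Google has described.

```python
def served_with_throttling(capacities, total_load):
    """Toy model of routers that slow down instead of refusing traffic.

    Each router receives an even share of the load and serves at most
    its capacity; the excess is delayed rather than pushed onto peers.
    Returns (load served promptly, load delayed).
    """
    share = total_load / len(capacities)
    served = sum(min(share, c) for c in capacities)
    return served, total_load - served


# Five routers, with total load slightly above what the weakest can take.
print(served_with_throttling([100, 105, 110, 120, 130], 510))
```

Under these assumptions only the small excess on the weakest router is delayed, and no load is transferred to the others, so the overload does not spread.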

The company will work over the next few weeks to make these changes and further improve reliability, he said.
