Tech Support EX

Friday, 9 September 2011

What Happened to Google Docs on Wednesday

Posted at 08:30
Posted by Alan Warren, Engineering Director

(Cross-posted from the Google Docs Blog.)

Not our best week. On Wednesday we had an outage that lasted one hour and meant that document lists, documents, drawings and Apps Scripts were inaccessible for the majority of our users. We use Google Docs ourselves every day, so we feel your pain and are very sorry.

So what happened? The outage was caused by a change designed to improve real-time collaboration within the document list. Unfortunately, this change exposed a memory management bug that was only evident under heavy usage.

Every time a Google Doc is modified, a machine looks up the servers that need to be updated. Because of the memory management bug, the lookup machines didn’t recycle their memory properly after each lookup, so they eventually ran out of memory and restarted. While they restarted, their load was picked up by the remaining lookup machines, making those machines run out of memory even faster. Eventually the servers couldn’t properly process a large fraction of the requests to access document lists, documents, drawings, and scripts, which led to the outage you saw on Wednesday.
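The cascade described above can be reproduced with a toy model. The sketch below is a minimal illustration under assumed conditions, not Google's lookup service: the machine count, per-lookup leak, memory limit and restart time are all hypothetical numbers, the machines start at staggered memory levels so that they fill up at different times, and each tick's lookups are split evenly across whichever machines are currently up. It records the tick at which each machine runs out of memory.

```python
from dataclasses import dataclass, field


@dataclass
class LookupMachine:
    used_mb: float               # leaked memory that has not been reclaimed
    restart_ticks_left: int = 0  # > 0 while the machine is restarting


@dataclass
class Result:
    crash_ticks: list[int] = field(default_factory=list)
    failed_requests: float = 0.0


def simulate(initial_used_mb, requests_per_tick, leak_mb_per_request,
             memory_limit_mb, restart_ticks, ticks):
    """Toy model of a lookup tier whose machines leak memory on every lookup.

    Each tick, lookups are split evenly across the machines that are up.
    A machine whose leaked memory reaches the limit crashes, drops its share
    of that tick's lookups, and is unavailable for `restart_ticks` ticks
    (counting the tick it crashed on). Hypothetical model, not Google's code.
    """
    machines = [LookupMachine(used) for used in initial_used_mb]
    result = Result()
    for tick in range(ticks):
        # Restarting machines make progress; they rejoin when the counter hits 0.
        for m in machines:
            if m.restart_ticks_left > 0:
                m.restart_ticks_left -= 1
        healthy = [m for m in machines if m.restart_ticks_left == 0]
        if not healthy:
            # No machine can serve lookups, so the whole tick's traffic fails.
            result.failed_requests += requests_per_tick
            continue
        # The survivors absorb the full load, so each one leaks faster.
        share = requests_per_tick / len(healthy)
        for m in healthy:
            m.used_mb += share * leak_mb_per_request
            if m.used_mb >= memory_limit_mb:
                result.crash_ticks.append(tick)
                result.failed_requests += share
                m.used_mb = 0.0  # a restart reclaims all leaked memory
                m.restart_ticks_left = restart_ticks
    return result


# Four machines at staggered memory levels (hypothetical numbers).
base = simulate([0, 250, 500, 750], requests_per_tick=100,
                leak_mb_per_request=1.0, memory_limit_mb=1000,
                restart_ticks=30, ticks=40)
print(base.crash_ticks)  # [9, 17, 22, 25]

# A stand-in for doubling capacity: the same four machines plus four fresh ones.
doubled = simulate([0, 250, 500, 750, 0, 0, 0, 0], requests_per_tick=100,
                   leak_mb_per_request=1.0, memory_limit_mb=1000,
                   restart_ticks=30, ticks=40)
print(doubled.crash_ticks[0])  # 19
```

In this run the gaps between crashes shrink from 8 to 5 to 3 ticks, because each surviving machine inherits a larger share of the lookups, mirroring the cascade described above. Adding four fresh machines halves each machine's share and pushes the first crash from tick 9 back to tick 19; in this model extra capacity buys time while the leak persists, and setting `leak_mb_per_request` to 0 removes the crashes entirely.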

Our automated monitoring noticed that attempts to access documents were failing at an increased rate, and alerted us 60 seconds after the failure rate increased sharply. The engineering teams diagnosed the problem, determined that it was correlated with the feature change, and started rolling the change back 23 minutes after the first alert. In parallel, we doubled the capacity of the lookup service to mitigate the impact of the memory management bug. The rollback completed 24 minutes later, and 5 minutes after that the outage was effectively over, as the additional capacity restored normal function.
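Adding up the intervals in the paragraph above gives a rough end-to-end figure. The sketch below takes the sharp rise in the failure rate as time zero; that reference point is an assumption, since the post gives no absolute timestamps.

```python
from datetime import timedelta

# Intervals stated in the post, each measured from the previous event;
# the first is measured from the sharp rise in the failure rate.
TIMELINE = [
    ("automated alert fires", timedelta(seconds=60)),
    ("rollback starts", timedelta(minutes=23)),
    ("rollback completes", timedelta(minutes=24)),
    ("normal function restored", timedelta(minutes=5)),
]


def cumulative_offsets(timeline):
    """Return (event, time elapsed since the sharp rise) pairs."""
    elapsed = timedelta(0)
    offsets = []
    for event, delta in timeline:
        elapsed += delta
        offsets.append((event, elapsed))
    return offsets


for event, elapsed in cumulative_offsets(TIMELINE):
    print(f"{event}: +{elapsed.total_seconds() / 60:.0f} min")
```

It prints offsets of +1, +24, +48 and +53 minutes, so roughly 53 minutes pass between the sharp rise in failures and recovery, close to the one-hour duration given at the start of the post.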

Since the outage was resolved, we have been reconstructing and scrutinizing the timeline of this event, and have drawn up a list of steps that will reduce the chance of a future event, decrease the time required to notice and resolve a problem, and limit the scope that any single problem can affect. We intend to take all of these steps; some are not easy, but we're committed to keeping Google's services exceptionally reliable. In the meantime, rest assured that we take every outage very seriously, and as always we'll publish a full incident report on the Apps Dashboard once our investigation is complete. Again, we apologize for the inconvenience and frustration that the outage has caused.