Tech Support EX

Friday, 9 September 2011

What Happened to Google Docs on Wednesday

Posted on 08:30 by Unknown
Posted by Alan Warren, Engineering Director

(Cross-posted from the Google Docs Blog.)

Not our best week. On Wednesday we had an outage that lasted one hour and meant that document lists, documents, drawings and Apps Scripts were inaccessible for the majority of our users. We use Google Docs ourselves every day, so we feel your pain and are very sorry.

So what happened? The outage was caused by a change designed to improve real-time collaboration within the document list. Unfortunately, this change exposed a memory management bug that was evident only under heavy usage.

Every time a Google Doc is modified, a machine looks up the servers that need to be updated. Because of the memory management bug, the lookup machines didn’t recycle their memory properly after each lookup, so they eventually ran out of memory and restarted. While they restarted, their load was picked up by the remaining lookup machines, which made those machines run out of memory even faster. Eventually the servers couldn’t properly process a large fraction of the requests to access document lists, documents, drawings, and scripts, which led to the outage you saw on Wednesday.
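To make the feedback loop concrete, here is a toy simulation of the failure mode described above. It is only a sketch under simplifying assumptions, not a model of the real lookup service: the function name, the even load split, the fixed leak per request, and every number in it are hypothetical and chosen purely for illustration.

```python
def simulate_cascade(initial_memory, total_load, leak_per_request,
                     memory_limit, restart_ticks, max_ticks):
    """Return the number of live lookup machines at the start of each tick.

    Toy assumptions (not from the post): load is split evenly across live
    machines, every request leaks a fixed amount of memory, and a machine
    that reaches memory_limit restarts with empty memory and stays down
    for restart_ticks ticks.
    """
    memory = list(initial_memory)
    down_until = [0] * len(memory)  # first tick at which each machine is live again
    history = []
    for tick in range(max_ticks):
        live = [i for i in range(len(memory)) if down_until[i] <= tick]
        history.append(len(live))
        if not live:
            continue  # nobody is serving; requests fail during this tick
        per_machine = total_load / len(live)
        for i in live:
            memory[i] += per_machine * leak_per_request
            if memory[i] >= memory_limit:
                # Out of memory: restart, shifting load onto the survivors.
                memory[i] = 0.0
                down_until[i] = tick + 1 + restart_ticks
    return history


# Three machines that have already leaked different amounts of memory.
history = simulate_cascade(
    initial_memory=[60.0, 30.0, 0.0],
    total_load=30.0,
    leak_per_request=1.0,
    memory_limit=100.0,
    restart_ticks=5,
    max_ticks=9,
)
print(history)  # live machines per tick: [3, 3, 3, 3, 2, 2, 1, 0, 0]
```

In this toy run the first machine takes four ticks to fail, the second fails two ticks after that, and the third fails one tick later: each restart pushes more load, and therefore more leaked memory per tick, onto the machines that remain.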

Our automated monitoring noticed that attempts to access documents were failing at an increased rate, and alerted us 60 seconds after the failure rate increased sharply. The engineering teams diagnosed the problem, determined that it was correlated with the feature change, and started rolling it back 23 minutes after the first alert. In parallel, we doubled the capacity of the lookup service to mitigate the impact of the memory management bug. The rollback completed 24 minutes later, and 5 minutes after that the outage was effectively over as the additional capacity restored normal function.
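The intervals in the paragraph above are each measured from the previous milestone, so it can help to add them up. The short sketch below does that arithmetic; the milestone labels and the helper name are illustrative, and only the three durations come from the post.

```python
from datetime import timedelta
from itertools import accumulate

# Each interval is measured from the previous milestone, starting at the first alert.
MILESTONES = [
    ("rollback started", timedelta(minutes=23)),
    ("rollback completed", timedelta(minutes=24)),
    ("outage effectively over", timedelta(minutes=5)),
]


def offsets_from_first_alert(milestones):
    """Turn milestone-to-milestone intervals into offsets from the first alert."""
    labels = [label for label, _ in milestones]
    elapsed = accumulate(delta for _, delta in milestones)
    return list(zip(labels, elapsed))


for label, elapsed in offsets_from_first_alert(MILESTONES):
    print(f"{label}: +{int(elapsed.total_seconds() // 60)} min")
```

Summed this way, the outage was effectively over 52 minutes after the first alert, which is in line with the roughly one-hour outage described at the top of the post.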

Since resolution, we have been scrutinizing the timeline of this event, and have assembled a list of steps that will reduce the chance of a future event, decrease the time required to notice and resolve a problem, and limit the scope that any single problem can affect. We intend to take all these steps; some are not easy, but we're committed to keeping Google's services exceptionally reliable. In the meantime, rest assured that we take every outage very seriously, and as always we'll post a full incident report of what happened to the Apps Dashboard once our investigation is complete. Again, we apologize for the inconvenience and frustration that the outage has caused.
Posted in google docs | No comments


