Date: November 15, 2016
An update to display the outage notification for our maintenance period left the web.config file open with a write lock. The file became inaccessible to the IIS processes, which made the GRC Cloud production sites inaccessible.
19:32 UTC - System unavailable - under investigation.
20:06 UTC - System online and monitoring.
20:13 UTC - System unavailable.
20:24 UTC - System online and stable.
Resolver's DevOps team set up a new folder and redirected the sites to it. This temporarily restored service but caused a secondary outage: the application recompiled shared .NET libraries, and the server ran out of disk space. Disk space was freed and all services were restored.
1. Our team is investigating the cause of the initial file locking issue. We have reproduced it in a test site and are now reviewing how to prevent it from happening again.
2. Update our process so that all changes to the codebase web.config are made only after business hours.
3. Reduced the disk space monitor threshold from 90% to 80% to receive earlier warning of low disk space.
4. Update Site24x7 disk space monitor to give capacity alerts on a per-drive basis rather than by total space.
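
To illustrate why items 3 and 4 matter together, the following is a minimal Python sketch of a per-drive check against the 80% threshold. It is not the Site24x7 configuration or Resolver's tooling; the function names, drive letters, and sizes are hypothetical and chosen only for illustration.

```python
import shutil

# Threshold from item 3 above (reduced from 90% to 80%).
THRESHOLD_PERCENT = 80.0


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total space."""
    return 100.0 * used_bytes / total_bytes


def drives_over_threshold(usage_by_drive, threshold=THRESHOLD_PERCENT):
    """Return a sorted list of drives whose own usage is at or above the threshold.

    usage_by_drive maps a drive name to a (total_bytes, used_bytes) tuple.
    Each drive is evaluated separately, never as a combined total.
    """
    return sorted(
        drive
        for drive, (total, used) in usage_by_drive.items()
        if total > 0 and percent_used(total, used) >= threshold
    )


def collect_usage(paths):
    """Read live usage for each path (hypothetical helper built on shutil.disk_usage)."""
    usage = {}
    for path in paths:
        stats = shutil.disk_usage(path)
        usage[path] = (stats.total, stats.used)
    return usage


if __name__ == "__main__":
    # Hypothetical sizes in GB: a small, nearly full drive and a large, mostly empty one.
    sample = {"C:": (100, 95), "D:": (900, 100)}
    combined_total = sum(total for total, _ in sample.values())
    combined_used = sum(used for _, used in sample.values())
    print("combined percent used:", percent_used(combined_total, combined_used))
    print("drives over threshold:", drives_over_threshold(sample))
```

With these hypothetical numbers, the combined usage is well under 80% while one drive is nearly full, so a monitor based on total space would stay silent and a per-drive check would alert. Passing real paths to collect_usage and feeding the result to drives_over_threshold would apply the same check to a live server.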