Updates During Outages

  • Comments posted to this topic are about the item Updates During Outages

  • My team had to engineer a system to be highly resilient to failure and secure in the cloud.

    The separation required made the solution far more expensive because of the number of machines needed. I found myself asking: who is paying for this? Do they need this level of resilience? What would be impacted if various facilities were unavailable?

    If a system goes down then people will moan. It's the daylight equivalent of snoring due to disturbed sleep. The real task is identifying the systems that are absolutely core and architecting those in a way that eliminates downtime, as Netflix does.

    For the rest, I agree that a communication process with appropriate infrastructure is the way to go. It's certainly cheaper.

    My team were also looking at setting up a Docker swarm to reduce the cost of having extra machines where a container would do.

  • We use a text messaging service which is totally disconnected from our IT infrastructure to communicate with our organization during different types of outages.

  • Making sure that DNS is resilient is also important so that it does not become your single point of failure (SPoF). If you host your own DNS, ensure it is hosted in at least two geographically dispersed locations, or use a resilient DNS provider (e.g. Dyn). If your DNS is down you may not even be able to send an email notification to your customers!

    As the post implies, make sure your status site, e.g. "status.domainname.com", is hosted on different infrastructure (even using an alternative ISP) so that it stays reachable when your main infrastructure is down.

    While working at my previous employer I hosted the company's status site on its corporate infrastructure (as it was a low-traffic site) and kept the clients' sites hosted in a Tier 1 data centre. DNS was with Dyn, so we had eliminated all the important SPoFs. This meant that communicating with clients during an outage was not (generally) an issue.
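
    The "at least two dispersed locations" check for self-hosted DNS can be roughed out in code. This is only a minimal sketch, assuming you already know each nameserver's IPv4 address: it treats "different /24 network" as a crude stand-in for "different location", which is not the same thing as real geographic separation. The function names and the example hostnames and addresses are made up for illustration.

```python
import ipaddress
from collections import defaultdict


def nameserver_networks(ns_addresses, prefix_len=24):
    """Group nameserver hostnames by the IPv4 network their address falls in.

    ns_addresses: dict mapping nameserver hostname -> IPv4 address string.
    prefix_len:   network size used as a rough proxy for "same location"
                  (an assumption, not a measure of physical distance).
    """
    groups = defaultdict(list)
    for host, addr in ns_addresses.items():
        # strict=False lets us pass a host address and get its enclosing network
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        groups[str(net)].append(host)
    return dict(groups)


def has_network_diversity(ns_addresses, minimum=2, prefix_len=24):
    """Return True if the nameservers span at least `minimum` distinct networks."""
    return len(nameserver_networks(ns_addresses, prefix_len)) >= minimum


# Hypothetical nameservers using reserved documentation addresses
same_site = {"ns1.example.com": "192.0.2.10", "ns2.example.com": "192.0.2.20"}
two_sites = {"ns1.example.com": "192.0.2.10", "ns2.example.com": "198.51.100.10"}

print(has_network_diversity(same_site))
print(has_network_diversity(two_sites))
```

    A check like this only flags the obvious case of every nameserver sitting on one network. Confirming that the sites really are in separate facilities, on separate upstream links, still has to be done by hand.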

  • It also isn't helpful if there is just a status update page without push notifications of outages.

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!
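
    The difference between a status page people have to poll and one that pushes outage notices can be shown with a small sketch. It assumes a hypothetical `StatusBoard` class and plain callback functions standing in for whatever channel is actually used (SMS, email and so on), which, per the earlier posts, would need to live off the main infrastructure. Subscribers are only called when a service's status actually changes.

```python
class StatusBoard:
    """Tracks the status of each service and pushes changes to subscribers."""

    def __init__(self):
        self._state = {}        # service name -> last known status
        self._subscribers = []  # callables taking (service, old_status, new_status)

    def subscribe(self, callback):
        """Register a callback to be called on every status change."""
        self._subscribers.append(callback)

    def update(self, service, status):
        """Record a status; notify subscribers only if it differs from the last one.

        Returns True if a notification was pushed, False if nothing changed.
        """
        previous = self._state.get(service)
        if previous == status:
            return False
        self._state[service] = status
        for callback in self._subscribers:
            callback(service, previous, status)
        return True


# Usage: the list stands in for a real delivery channel (hypothetical)
sent = []
board = StatusBoard()
board.subscribe(lambda service, old, new: sent.append((service, old, new)))

board.update("web", "up")
board.update("web", "up")    # unchanged, so nothing is pushed
board.update("web", "down")  # change, so subscribers are told
print(sent)
```

    Pushing only on transitions keeps subscribers from being flooded by repeated "still down" messages while an outage is ongoing.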
