GitHub Downtime

, 2018-12-03

I didn't notice any issues with GitHub, but others did. The majority of my interaction is just through the git protocol, so things tend to work fast, and I don't have any database access. I rarely use the Issues, and other parts of GitHub, which were affected when GitHub had a MySQL cluster fail over. There's a good write up of the post incident analysis that's worth reading, from a database perspective.

I'm not a big MySQL guy, only running an instance to power T-SQL Tuesday. The structure of a write primary and many read replicas that GitHub describes makes sense. It's similar to what I've done in SQL Server, and certainly the idea of some quorum management, handled at GitHub with the Orchestrator software, is something that needs to be configured properly. Allan Hirt has talked about the complexities of quorum in large installations, and it's not a simple thing to configure.

In reading about this, there are a couple things that strike me. First, the analysis talks about a degredation of service because East coast applications had to send writes to West Coast database servers. There were some problems with the way the database servers were working, but it seems to me that there should be some sort of application failover that's possible. If you can't have an application and database fail separately without customer impact, then there should be some way to fail applications over. Perhaps not, but if you're responsible for designing HA for the database, make sure you talk to the application people and test for issues.

The second thing for me is that somehow there was a period of time when writes were occurring to the East Coast system that weren't sent to the West Coast. My ignorance of how this HA stuff works in MySQL prevents me from making a big deal of this, but this isn't something that should happen. If the quorum moves data to another node, it must stop writes to the first node. This could happen in SQL Server, but for me, this is the level of data loss I'd need to accept in my RPO.





Related content


Will the next version of Windows be a "Mini-Me" version of Vista? Who knows, and it's too early to tell, but apparently there's a mini-kernel version of Windows 7, the one after Vista, which fits into 25MB on disk. That's a touch lower than the 4GB that Vista takes up. Granted it's not a full […]


60 reads

An Hour in Time

Daylight Savings time switches a little later this year. In fact it's November 4th this year, after having been in October for all of my life. In case you don't remember which way we move the clocks, here's a saying: Spring forward, fall back.

5 (1)


199 reads

Software is Like Building a House

One of the really classic analogies in software is that it's like building a house. You have a foundation, multiple teams, lots of contractors that specialize in something, etc. And it's an analogy that's debated as to its relevance over and over. I won't go into the correctness of this analogy, but I wanted to comment on it.

2012-10-08 (first published: )

293 reads