
The SQLServerCentral Outage


A lot of people have asked what happened over the Memorial Day weekend, when SQLServerCentral was down. All I can say is that Twitter is cool. It didn't save the day, but it helped me.

When I got up on Monday morning, Memorial Day, things on the site were slow, but I didn't think much of it. It's a US holiday, and since most of our traffic comes from the US, slow traffic wasn't surprising. I had planned on taking most of the day off and would have logged in briefly in the evening to set the newsletter for Monday. Actually, I would have done it Monday morning, but chores called.

So I ran a few errands and was replacing sprinkler heads in my yard when I stopped for water. I glanced through the tweets on my phone and saw a post saying SSC was down. A few minutes later I got a direct message saying the same thing.

So I washed up and went inside. As soon as I hit the home page, I knew what was wrong. That was the voice of experience: the same thing had blown up in our faces early in the life of SQLServerCentral. Our account at ZoneEdit had run out of credits and needed to be replenished.

The Short Story

The short version is that I purchased more credits, but ZoneEdit did not immediately turn SQLServerCentral.com back on, even though two other domains on that same account were still working. ZoneEdit has been very reliable over the years; it is simple, albeit clunky, and has usually worked well. However, it was not until 12:30am MST on Tuesday that they acknowledged my payment and turned the domain on, a good 8 or 9 hours after I'd paid them.

Also, because they didn't just stop serving DNS requests but actually redirected us to a hosting company's holding page (something they didn't use to do), many people continued to see the site as down until this evening. I have been fielding notes all day and letting people know they need to press CTRL+F5 to reload the page from the server.

If you don't want details, you can stop here or skip down to the "What we have done" section.

The Long Story

When Brian, Andy, and I sold the site, we sold only two domains, not our entire company. And no, I didn't sell Brian and Andy, in case you're wondering; the reverse is true. They sold me to Red Gate for a year of indentured servitude, which worked out quite well for all involved.

So we sold the domains and, in the process, transferred the registration to Red Gate. Due to the complexities involved, we continued to host the site, email, and DNS under our existing accounts. Over the next 6 months, we slowly migrated the email, then the hosting, and (I thought) the DNS hosting to Red Gate's accounts. Apparently I was mistaken.

DNS is one of those funny beasts that you don't think much about. You set up a series of hosts and IPs, link those to your web server software somehow, and all of a sudden you have a web site. I have used ZoneEdit for my various companies and personal sites, and it's always worked well. It's inexpensive: a "credit" costs US$11 and covers a lot of DNS requests. We've typically used a couple of credits a year, and when I checked our last invoice, we'd purchased 10 credits in 2005, which lasted until, well, Monday.

Years ago, when this happened because site traffic was growing quickly, it was during the week. I paid ZoneEdit, and within an hour things were working. That didn't happen this time.

For whatever reason, ZoneEdit did not credit our account and get things back up. I can only speculate, but my guess is that they have a deal with a hosting company (which I shall not name) that pays them well to redirect domains to it. I would hope the deal does not include any "slowness" in responding to payments, but given the way many people run their businesses, I would not be surprised if domains were restored with less than ASAP service.

I contacted people at Red Gate, and our IT manager, who joined the company last year and was on holiday Monday (sorry, Gareth), responded. He checked things, realized there wasn't anything he could do immediately, and set about arranging changes for Tuesday. A change away from ZoneEdit as the authoritative nameservers would take "up to 48 hours" to propagate around the Internet. In reality it usually takes about 24 hours, but they can't guarantee anything better.

I gave up trying to fix things and to contact ZoneEdit (3 tickets opened between 3pm MST and 11pm MST) and went to bed around 11:30pm MST. At 12:30am on Tuesday I got a note that service had been restored, and my tickets were answered at various times during Monday morning.

What We Have Done

So what have we done to prevent this from being an issue?

Well, the first thing was to move our authoritative DNS away from ZoneEdit to Rackspace, where all the other Red Gate servers and services are located. As mentioned previously, that change can take up to 48 hours to propagate to DNS servers around the world. Once it is complete, every DNS request for sqlservercentral.com will be answered by servers at Rackspace.

We have also cleaned up our zones a little, removing some old entries. They didn't matter, but it's good housekeeping.

There is documentation in place as well, so the IT staff at Red Gate now know how this is set up. The old IT manager never migrated things from ZoneEdit, probably because it was a low-priority item and it was working. I know we talked about it, but I didn't realize it hadn't been moved. We also had old contact information on our ZoneEdit account (from 2005), and honestly, I probably wouldn't have moved it either.

So, if it happens again, it's not my fault.

We missed two newsletters (sorry). Monday's miss was my fault, a lapse in attention over the holiday weekend. Tuesday's was a joint IT mistake that is unlikely to be repeated. I have moved some content around to republish it, so if you see a few things twice on the home page, that's the reason.

Any questions? Please feel free to comment below.
