Blog Post

Moneyball And The DBA


“Moneyball” has been a thing for a while now. It’s shorthand for the application of quantitative data analysis to the various aspects of a sport, from individual player stats to how the team’s members are arrayed on the field. Many – including many current and former players – lament that the heavy emphasis on numbers-crunching has robbed sports of much of its intangibles, and has made game play boring and uninspired.

I’m not here to argue one way or another about the pro’s and con’s of statistical analysis in the sporting world. I bring it up because a recent interview on the subject, specifically on how MLB teams used SA to determine many aspects of the game: defensive positioning, pitch selection, lineups, and so on, caught my eye, and made me think about how Moneyball works for the DBA.

The interview struck a chord with me because, as DBA’s, we rely on the analysis of many such statistics to help us determine how to optimize server and database settings. We collect data, establish benchmarks, compare snapshots, fret over deltas. And there are times, I think, when we can’t see the forest for the trees.

This happened to me recently. I had to refactor a table with a couple of billion rows in it because the IDENTITY column, declared at its inception as integer, was about to run out of values, and we needed to change it to bigint. Since this was going to involve a new table anyway, it seemed like an excellent time to cull old, under-performing indexes, check to see if there were any missing indexes that could be valuable, and change the clustering key, which I already knew was defective. The IDENTITY column had been made the PK, clustered, but the table was a child table with a FK to the parent, and is never queried except in conjunction with the parent. The obvious choice – or perhaps I should say, with greater honesty, the knee-jerk choice – was to make the parent key the lead in a compound key of ParentID/ChildID.

I ran a query to see if there were any indexes with very high writes and very low reads, and found one that was completely unused, and one that had just over two dozen reads in the two months since the server was last restarted (compared to several million writes), and made immediate plans to drop these. I also planned to drop a covering index that had a high number of seeks, but that I believed would be effectively replaced by the new clustered index, which would also make obsolete two other indexes on the parent key.

All went well with the conversion, until the following morning, when several jobs that query this table began bogging down, blocking and deadlocks went through the roof, and all proverbial hell broke loose. Within minutes I’d moved to restore two of the three indexes that I’d dropped, disabled several jobs until the processing could catch up, and begun to try and understand what had gone wrong.

It wasn’t until I’d spent some time talking to the developers that I realized my mistake. I had relied on the index usage statistics to make my decisions, and hadn’t bothered to consider the how’s and why’s of those original indexes.

The low read/high write index was indeed seldom used, but was a covering index for a critical procedure that ran once or twice a week, one that would drone on for hours as it walked all two billion rows and performed some calculations. The high seeks index that I thought would be supplanted by the new clustered index in fact should have been the new clustering index, since it contained every column in the table (a fact that didn’t show up in the usage stats analysis), and was a much better fit (as the optimizer already knew) than the clustering key I’d concocted.

All these things would have come to mind much sooner had I taken the time to talk to one or two of the process owners and developers. Instead, I relied purely on the stats to guide my decisions. Needless to say, that was a valuable (albeit embarrassing) lesson to learn.

Many, if not most, of us do not have the luxury of having built our servers and databases from the ground up – we have inherited them from others, and as the cliche goes, “they are what they are”. Gathering and analyzing statistics is valuable, to be sure, but we can’t ever forget that computers are here to help people, not the other way around. Take some time to learn as much as you can about the business behind the data and the applications. Your people will thank you.