Moneyball And The DBA

Roland-Alexander-STL, 2018-09-05 (first published: 2018-08-17)

“Moneyball” has been a thing for a while now. It’s shorthand for the application of quantitative data analysis to the various aspects of a sport, from individual player stats to how the team’s members are arrayed on the field. Many, including many current and former players, lament that the heavy emphasis on number-crunching has robbed sports of many of their intangibles and has made game play boring and uninspired.

I’m not here to argue one way or another about the pros and cons of statistical analysis in the sporting world. I bring it up because a recent interview on the subject caught my eye. It covered how MLB teams used statistical analysis to determine many aspects of the game (defensive positioning, pitch selection, lineups, and so on), and it made me think about how Moneyball works for the DBA.

The interview struck a chord with me because, as DBAs, we rely on the analysis of many such statistics to help us determine how to optimize server and database settings. We collect data, establish benchmarks, compare snapshots, fret over deltas. And there are times, I think, when we can’t see the forest for the trees.

This happened to me recently. I had to refactor a table with a couple of billion rows in it because the IDENTITY column, declared at its inception as integer, was about to run out of values, and we needed to change it to bigint. Since this was going to involve a new table anyway, it seemed like an excellent time to cull old, under-performing indexes, check to see if there were any missing indexes that could be valuable, and change the clustering key, which I already knew was defective. The IDENTITY column had been made the PK, clustered, but the table was a child table with a FK to the parent, and is never queried except in conjunction with the parent. The obvious choice – or perhaps I should say, with greater honesty, the knee-jerk choice – was to make the parent key the lead in a compound key of ParentID/ChildID.
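To put the IDENTITY problem in numbers: a SQL Server int is a 32-bit signed integer and a bigint is a 64-bit signed integer. The short Python sketch below estimates how much headroom is left. The current identity value and the daily insert rate are made-up figures, and the function assumes an increment of 1 and a steady insert rate, neither of which comes from the story above.

```python
# SQL Server integer ranges: int is 32-bit signed, bigint is 64-bit signed.
INT_MAX = 2**31 - 1       # 2,147,483,647
BIGINT_MAX = 2**63 - 1    # 9,223,372,036,854,775,807


def days_of_headroom(current_identity, inserts_per_day, max_value=INT_MAX):
    """Whole days until an IDENTITY column reaches max_value.

    Assumes an increment of 1 and a steady insert rate (both are
    simplifications chosen for this illustration).
    """
    if inserts_per_day <= 0:
        raise ValueError("inserts_per_day must be positive")
    remaining = max(max_value - current_identity, 0)
    return remaining // inserts_per_day


# Hypothetical figures for illustration only.
current_identity = 2_100_000_000
inserts_per_day = 1_500_000

print(days_of_headroom(current_identity, inserts_per_day))              # as int
print(days_of_headroom(current_identity, inserts_per_day, BIGINT_MAX))  # as bigint
```

With these invented numbers the int column has roughly a month left, while the same insert rate against a bigint puts exhaustion effectively out of reach.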

I ran a query to see if there were any indexes with very high writes and very low reads. It found one that was completely unused and one that had just over two dozen reads in the two months since the server was last restarted (compared to several million writes), and I made immediate plans to drop both. I also planned to drop a covering index that had a high number of seeks but that I believed would be effectively replaced by the new clustered index, which would also make two other indexes on the parent key obsolete.
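The query itself isn’t shown here. On SQL Server, counters of this kind are exposed by the sys.dm_db_index_usage_stats view (user_seeks, user_scans, user_lookups, user_updates), so the Python sketch below assumes rows shaped like that view. The index names, the counts, and both thresholds are hypothetical, chosen only to mirror the situation described.

```python
from dataclasses import dataclass


@dataclass
class IndexUsage:
    """One row of usage counters, shaped like sys.dm_db_index_usage_stats."""
    name: str
    user_seeks: int
    user_scans: int
    user_lookups: int
    user_updates: int

    @property
    def reads(self):
        # Reads are the sum of seeks, scans and lookups.
        return self.user_seeks + self.user_scans + self.user_lookups


def drop_candidates(stats, max_reads=50, min_writes=1_000_000):
    """Names of indexes that are written to heavily but read rarely.

    Both thresholds are arbitrary assumptions for this illustration.
    """
    return [
        s.name
        for s in stats
        if s.reads <= max_reads and s.user_updates >= min_writes
    ]


# Hypothetical counters for three indexes on the child table.
stats = [
    IndexUsage("IX_Child_Unused", 0, 0, 0, 3_000_000),
    IndexUsage("IX_Child_RarelyRead", 0, 26, 0, 4_000_000),
    IndexUsage("IX_Child_Covering", 9_000_000, 120, 0, 4_000_000),
]

flagged = drop_candidates(stats)
print(flagged)
```

These counters are cleared when the instance restarts, so the window being judged is only as long as the uptime, which in this case was about two months.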

All went well with the conversion until the following morning, when several jobs that query this table began bogging down, blocking and deadlocks went through the roof, and all proverbial hell broke loose. Within minutes I’d moved to restore two of the three indexes that I’d dropped, disabled several jobs until the processing could catch up, and begun trying to understand what had gone wrong.

It wasn’t until I’d spent some time talking to the developers that I realized my mistake. I had relied on the index usage statistics to make my decisions, and hadn’t bothered to consider the hows and whys of those original indexes.

The low-read/high-write index was indeed seldom used, but it was a covering index for a critical procedure that ran once or twice a week, one that would drone on for hours as it walked all two billion rows and performed some calculations. The high-seek index that I thought would be supplanted by the new clustered index in fact should have been the new clustering index, since it contained every column in the table (a fact that didn’t show up in the usage stats analysis) and was a much better fit (as the optimizer already knew) than the clustering key I’d concocted.

All these things would have come to mind much sooner had I taken the time to talk to one or two of the process owners and developers. Instead, I relied purely on the stats to guide my decisions. Needless to say, that was a valuable (albeit embarrassing) lesson to learn.

Many, if not most, of us do not have the luxury of having built our servers and databases from the ground up. We have inherited them from others, and as the cliché goes, “they are what they are.” Gathering and analyzing statistics is valuable, to be sure, but we can’t ever forget that computers are here to help people, not the other way around. Take some time to learn as much as you can about the business behind the data and the applications. Your people will thank you.

