The obesity epidemic is much in the news these days. The list of health issues related to being overweight is long, and admonitions to watch what we eat and to exercise regularly and stay active abound.
But it’s not just people who can be adversely affected by being overweight. Databases can suffer as well. Clogged with inactive data, burdened by over-indexing, gorging on disk space, crippled by poor architecture, they are at risk for many ailments, from mere poor performance to outright collapse.
Over-indexing is a problem I see quite often: multiple indexes with the same lead key, multiple covering indexes, indexes on columns with poor selectivity, indexes on columns that seldom appear in predicates, and so on. Indexes aren’t cheap: they cost time and money to maintain, they slow down write operations, and they can lead to frequent deadlocks. There are plenty of scripts out there to show index usage statistics; you should run them periodically and mark indexes with few or no seeks as candidates for elimination. Be chary, though: a high number of user scans doesn’t necessarily mean an inefficient index. There are full scans and range scans, and the stats don’t distinguish between the two; still, the stats can indicate which indexes bear watching. Querying for the key columns and includes is also useful: multiple indexes with the same lead key are good candidates for consolidation, even if the merged index ends up carrying a large number of includes.
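As a sketch of the kind of usage query I mean, here is one against SQL Server’s sys.dm_db_index_usage_stats DMV. The thresholds and ordering are illustrative only, and remember that these counters reset when the service restarts, so judge them against a long enough uptime:

```sql
-- Nonclustered indexes with writes but no seeks: candidates for
-- review, not for automatic deletion.
SELECT  OBJECT_NAME(i.object_id)   AS table_name,
        i.name                     AS index_name,
        us.user_seeks,
        us.user_scans,    -- full and range scans are lumped together here
        us.user_lookups,
        us.user_updates   -- maintenance cost paid on every write
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id   = i.object_id
      AND us.index_id    = i.index_id
      AND us.database_id = DB_ID()
WHERE   i.type_desc = 'NONCLUSTERED'
  AND   COALESCE(us.user_seeks, 0) = 0
ORDER BY us.user_updates DESC;
```

An index that shows zero seeks but a high user_updates count is paying its maintenance tax on every write while returning nothing; those are the ones to watch first.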
The other major contributor to database obesity is too much inactive data. Imagine if, every time you went to the grocery, you brought with you not just the current grocery list, but every grocery list you had ever written. That’s a lot of grocery lists, and more to the point, a lot of dead weight. The same is true for data that is no longer being actively processed. Certainly, you will retain data for various reasons: client obligations, regulatory requirements, and the like. But data that isn’t in active play can be moved to archives, still accessible and reportable, but no longer swelling the size of indexes and skewing statistics. And at some point, when retention requirements have been satisfied and the data has outlived its usefulness, it ought to be purged outright.
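An archive move along these lines can be done in small batches so the transaction log and lock footprint stay manageable. The schema here is hypothetical (dbo.Orders as the live table, archive.Orders as its mirror in an archive schema, a two-year retention window), a sketch rather than a recipe:

```sql
-- Move closed orders older than the retention window into the archive,
-- 5000 rows at a time, until none remain.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    BEGIN TRANSACTION;

    DELETE TOP (5000) o
    OUTPUT deleted.*              -- the deleted rows feed the archive table
    INTO   archive.Orders
    FROM   dbo.Orders AS o
    WHERE  o.OrderStatus = 'Closed'
      AND  o.ClosedDate  < DATEADD(year, -2, SYSDATETIME());

    SET @rows = @@ROWCOUNT;
    COMMIT TRANSACTION;
END;
```

The DELETE … OUTPUT INTO pattern makes the move atomic per batch: a row leaves the live table and lands in the archive in the same transaction, so a failure mid-run loses nothing.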
Both these problems can be exacerbated by poor design. Improper clustering on subordinate tables is the main culprit here. Lookups are expensive: if a subordinate table is queried primarily in a join with its parent table, it ought to be clustered on the key of the parent, in conjunction with its own key. Not following this principle leads to over-indexing, as developers try to work around the bottleneck and find a way to improve performance.
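A minimal sketch of that principle, with a hypothetical order-header/order-line pair: the child table is almost always read via a join on OrderID, so it is clustered on (OrderID, LineNumber) rather than on a surrogate identity of its own, and the rows for one order sit together on disk:

```sql
CREATE TABLE dbo.Orders
(
    OrderID   int IDENTITY PRIMARY KEY,
    OrderDate date NOT NULL
);

CREATE TABLE dbo.OrderLines
(
    OrderID    int NOT NULL REFERENCES dbo.Orders (OrderID),
    LineNumber int NOT NULL,
    Quantity   int NOT NULL,
    -- Cluster on the parent's key plus the line's own key, so a join
    -- from dbo.Orders becomes a single range scan per order.
    CONSTRAINT PK_OrderLines
        PRIMARY KEY CLUSTERED (OrderID, LineNumber)
);
```

With the child clustered this way, the join needs no extra covering indexes, which is exactly the over-indexing pressure the principle is meant to relieve.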
If your database is overweight and out of shape, be prepared to embrace the “no pain, no gain” mantra. Cut down on unneeded indexes, archive or purge unnecessary data, and refactor the architecture to promote healthy joins. It won’t be easy; losing weight seldom is. But in the long term you’ll have a healthy, vital database that will scale well and keep running smoothly for years to come.