Fragmentation Fear

It's always bothered me slightly that the main rationale behind the design of clustered indexes in SQL Server seems to be to proactively prevent…fragmentation, rather than organize the data based on the needs of the most critical queries, by storing together those values that we are likely to query together. It was refreshing to see Brent Ozar's recent take on the index fragmentation issue, and his advice to "stop worrying" so much about it, and instead try to tackle the real underlying problems.

As Brent points out, index defragmentation is easy to do. We can configure SQL Server Maintenance Plans to rebuild all our indexes, on a schedule, regardless of whether they need it or not. Unfortunately, most of the rebuilds will be a waste of time, if not actively harmful. Ola Hallengren's scripts, or any script that interrogates the DMVs to check the fragmentation level, before rebuilding, offers a much better option. However, if the rebuild threshold is some arbitrary value across all indexes, then you'll still do a lot of needless rebuilding.

Index fragmentation can, of course, cause performance problems, but is actually not a huge issue for many indexes. Logical fragmentation (index pages in the wrong order) thwarts SQL Server's read-ahead mechanism and makes it less IO-efficient at reading contiguous pages on disk. However, this only really affects large range scans from disk, with the emphasis on large and disk. Reduced page density (many gaps causes by page splits and deletes) is a knottier form of fragmentation. If pages are "half full", they will take up twice the space on disk, and in memory, and twice the IO bandwidth to transfer the data. Again, though, this won't affect infrequently modified indexes.

If our data is in memory anyway, as Brent suggests, or we're able to take hard look at our queries and reduce the number of large range scans, then we can stop worrying so much about fragmentation for those indexes. Once we do, maybe we can return to considering index design primarily in terms of establishing the "natural order" of the data, rather than fragmentation fear.

What do you think? What might make it easier to get over fragmentation fear? Would more/better indexing options help (I noticed Oracle has at least four index clustering structures to SQL Server's one)?

Cheers,

Tony.

Clustered Indexes? Sedimentary, my dear Watson

by Tony Davis

SQLServerCentral.com

Editorial

There is much sound advice suggesting that every table should have a clustered index, and that narrow, integer, ever-increasing columns, such as afforded by an IDENTITY column are the best choice. But is the sedimentary approach really the natural order of the day?