Even though SQL Server is very good at more or less automatically maintaining most statistics, physical data structuring and configurations etc (especially compared to some of it's competitors), there are still some situations where you need to perform a little manual work to help it. One of those areas is fragmentation.
Different types of fragmentation
Fragmentation occurs as data is modified. There are two major distinctions regarding fragmentation, disk fragmentation and SQL Server fragmentation.
Logical disk fragmentation
The database files can be logically fragmented in the file system just like any other operating system files. This occurs when the file system cannot create the file in one logically contigous space. The result of this is that the disk head must move back and forth over the disk when reading from the file. To remove this type of fragmentation you can use the defragmentation tools that are included in Windows or use some third-party application. Remember though that SQL Server must be stopped during the defragmentation process because when it is started the database files are in use by SQL Server, even if no user is actually using that specific database, and files in use can not be defragged.
Avoiding fragmentation of database files
To keep fragmentation in the database files at a minimum without defragging them you should avoid using the autogrow setting in SQL Server for them. Preallocate enough size for them from the start and when growth still is needed try to increase the size in large chunks at once. Also make sure you keep the database files on separate disks away from other files to make sure logical contigous space is available for them. Separating the data files from the log file is helpful and also important for performance in other ways, since the log file is written to sequentially.
SQL Server Index Fragmentation
The other major type of fragmentation, and the one that is the most interesting to us, is SQL Server fragmentation. This means that the data stored inside the database files is fragmented on an index level basis, regardless of whether the database file is logically fragmented or not. The easiest way to describe it in a couple of words it to say that it prevents SQL Server from being able to use indexes in an optimal way. Index fragmentation can be either internal or external. Note that external fragmentation is not the same thing as logical disk fragmentation, though the name might imply it. Each of these types are described in detail below.
Pages that have a lot of free space are said to be internally fragmented. This is caused by rows that are removed by DELETE statements or when pages are split and only filled to about half. Empty space on pages means there are less rows per page, which in turn means more page reads. Systems that are read-intensive can see a significant degrade in performance from these extra reads. Even for applications that are defined as OLTP (OnLine Transaction Processing) systems, i.e. write-intensive systems, the ratio between reads and writes is usually more than 80/20.
Sometimes necessary and wanted
However, page splits are very expensive operations. Sometimes the tradeoff of extra page reads during read operations for fewer page splits during write operations is correct. For systems that really are write-intensive, or perhaps specific operations that generate many writes, this might be necessary. There is really no correct advice regarding this, as it is completely different between different applications and systems.
Note: Even though heap table data pages can also become internally fragmented, we do not discuss those here since there is no way to defrag them. The only way to remove this is to copy the data from the table to a temporary storage, drop the table and then reinsert the data.
When tables grow as new data is inserted into them the ideal situation is that pages get allocated to an extent, then when that extent is filled up a new extent is allocated, then filled up with pages and so on. This way all the pages will always be located adjacent to eachother and readingeight pages will in a best case scenario only require reading one extent from disk.
Reality, however, is seldom ideal. New rows, when inserted, are bound to cause page splits sooner or later, as will updates that cause variable sized rows to grow too large to fit on the original page. When a new page is created from a page split it will be allocated in the same extent as the original page if there is room for it. If the extent is already full, which will normally be the case in a production system, a new extent will be allocated to the index and the new page will be placed there. The new page will not be contigous to the original page, so when following the page chain using the previous and next pointers in the page headers SQL Server will need to perform an extra extent switch. In a worst case scenario reading eight pages in order might require seven extent switches, and eight extents must be read from disk.
Ordered scans are affected
When reading individual rows external fragmentation will not affect the performance at all, and the same goes for unordered scans. The reason for the latter is that unordered scans can use the IAM pages to find which extents need to be fetched in a very efficient way. It is only for scans that need to fetch the pages in logical order that the page chain is used and external fragmentation might become a degrading factor for performance.
Viewing fragmentation information
To view how much fragmentation an index has you use DBCC SHOWCONTIG. It can show information for all indexes on a table or just a single index at a time. Using the option TABLERESULTS you get extra columns in the output that describe statistics about the index. By default DBCC SHOWCONTIG scans the page chain at the leaf level, but it is possible to scan all levels of an index.
When reviewing the output from DBCC SHOWCONTIG you should pay special attention to the following metrics:
- Avg. Page Density (full): Shows the average level of how filled the pages are. A percentage means the pages are almost full, and a low indicates much free space. This value should be compared to the fill factor setting specified when the index was created to decide whether or not the index is internally fragmented.
- Scan Density: Shows the ratio between the Best Count of extents that should be necessary to read when scanning all the pages of the index, and the Actual Count of extents that was read. This percentage should be as close to 100% as possible. Defining an acceptable level is difficult, but anything under 75% definitely indicates external fragmentation.
- Logical Scan Fragmentation: Shows the ratio of pages that are out of logical order. The value should be as close to 0% as possible and anything over 10% indicates external fragmentation.
Note: The value for Extent Scan Fragmentation is not really important. This is because the algorithm that calculates it does not work for indexes that span over multiple files. According to some Microsoft sources, in the next version of SQL Server, DBCC SHOWCONTIG will not even include this metric.
There are four different ways to defragment an index.
Drop and recreate index
Pros: Completely rebuilds the index. Reorders and compacts the index pages, and removes unnecessary pages.
Cons: Index is not available during process which means that queries will suffer. The process will block all queries accessing the index and also be blocked itself by other processes using the index, unable to do it’s work until those processes are finished. This option is especially bad for clustered indexes as dropping a clustered index means that all non-clustered indexes must be rebuilt (to use RID as bookmark), and then recreating the clustered index will once again force a rebuild of all non-clustered indexes.
CREATE INDEX ... WITH DROP_EXISTING
Pros: Rebuilds index in one step which is good for clustered indexes, as it means that non-clustered indexes do not need to be rebuilt. Can be used to recreate (and thereby defragment) indexes created by constraints, if the index definition matches the requirements of the constraint.
Cons: Potential blocking problems with other processes in the same way as for drop and recreate index above.
Pros: Does the best job of removing internal and external fragmentation since it physically assigns new pages to the index and moves data to them. Has the possibility to rebuild all indexes on a table using one statement. Recreates indexes for constraints without forcing you to know their requirements.
Cons: The same blocking issues as for the two above. Runs in a transaction so all work that has been done is rolled back if the command is stopped.
Pros: Reorders the leaf pages in index order – enhancing performance for scans especially – and compacts them using the setting specified for fill factor when the index was created. Empty pages are removed. This is an online operation, meaning it does not take long-term locks and thereby do not block other processes and do not get blocked itself by them. Work that has already been done will still be done if the command is cancelled.
Cons: The fact that this is an online operation can also have a negative impact, as this means that DBCC INDEXDEFRAG will simply skip pages that are used by other processes. Very fragmented indexes will take a long time to defragment, so they should probably be rebuilt instead if possible. Generates a lot of log activity.
As seen above, fragmentation can sometimes cause a lot of performance problems in your databases. It is therefore a good recommendation to set up a job that at least executes and stores the result of DBCC SHOWCONTIG at regular intervals. If you want to you can automate defragging as well, but then I would suggest it is scheduled for periods of little database usage, and that DBCC INDEXDEFRAG is used. One drawback there though is that the resultset of DBCC INDEXDEFRAG can not simply be stored in a table using syntax such as INSERT INTO foo (...) EXEC ('DBCC ...'), which is normally possible for other commands (such as DBCC SHOWCONTIG). But if it is executed from an 'outside' app using ADO or ADO.NET it is possible to catch the result set it returns and then store that result in a table.