Just for the record, this happens to be one of my favorite interview questions to ask candidates.
At some point in time, there will be a database containing tables without clustered indexes (a heap) that you will be responsible for maintaining. I personally believe that every table should have a clustered index, but sometimes my advice is not always followed. Additionally there can be databases from a 3rd party vendor that have this same design. Depending on the what those heap tables are used for, over time it's possible they can become highly fragmented and degrade query performance. A fragmented heap is just as bad as a fragmented index. To resolve this issue, I'd like to cover four ways we can defragment a heap.
To start with, we will need a sample database with a highly fragmented heap table. You can download the FRAG database (SQL2012) from here
. Let's use the sys.dm_db_index_physical_stats DMV to check the fragmentation level.
,avg_fragmentation_in_percent FROM sys.dm_db_index_physical_stats(
As you can see, the heap is 93% fragmented, and both non-clustered indexes are 99% fragmented. So now we know what we're dealing with.
Repair options in order of my preference:
- ALTER TABLE...REBUILD (SQL 2008+).
- CREATE CLUSTERED INDEX, then DROP INDEX.
- CREATE TABLE temp, INSERT INTO temp, DROP original table, sp_rename temp to original, recreate the non-clustered indexes.
- BCP out all data, drop the table, recreate the table, bcp data in, recreate the non-clustered indexes.
Option 1 is the easiest and the most optimal way to remove heap fragmentation; however, this option was only introduced in SQL Server 2008, so it's not available for all versions. This is a single command that will rebuild the table and any associated indexes; yes, even clustered indexes. Keep in mind, this command will rebuild the heap as well as all of the non-clustered indexes.
ALTER TABLE dbo.MyTable REBUILD;
Option 2 is almost as quick, but involves a little bit of planning. You will need to select a column to create the clustered index on, keeping in mind this will reorder the entire table by that key. Once the clustered index has been created, immediately drop it.
CREATE CLUSTERED INDEX cluIdx1 ON dbo.MyTable(col1);
DROP INDEX cluIdx1 ON dbo.MYTable;
Option 3 requires manually moving all data to a new temporary table. This option is an offline operation and should be done during off-hours. First you will need to create a new temporary table with the same structure as the heap, and then copy all rows to the new temporary table.
CREATE TABLE dbo.MyTable_Temp(col1 INT,col2 INT); SELECT * FROM dbo.MyTable;
Next, drop the old table, rename the temporary table to the original name, and then create the original non-clustered indexes.
EXEC sp_rename 'MyTable_Temp','MyTable'; CREATE NONCLUSTERED INDEX idx1 ON dbo.MyTable(col1); CREATE NONCLUSTERED INDEX idx2 ON dbo.MyTable(col2);
Option 4 is by far to the least efficient way to complete this task. Just like option 3, this option is an offline operation and should be done during off-hours. First we need to use the BCP utility to bulk copy out all of the data to a data file. Using BCP will require a format file to define the structure of what we're bulk copying. In this example, I am using an XML format file. More information on format files can be found here.
BCP FRAG.dbo.MyTable OUT D:\MyTable.dat -T -STRON\TEST1 -fD:\MyTableFormat.xml
Once that is complete, we need to drop and recreate the table.
CREATE TABLE dbo.MyTable(col1 INT,col2 INT);
Next, we need to use the BCP utility to bulk copy all of the data back into the table.
BCP FRAG.dbo.MyTable IN D:\MyTable.dat -T -STRON\TEST1 -fD:\MyTableFormat.xml
Finally, we can create the original non-clustered indexes.
CREATE NONCLUSTERED INDEX idx1 ON dbo.MyTable(col1); CREATE NONCLUSTERED INDEX idx2 ON dbo.MyTable(col2);
Options 1 and 2 do not require any downtime for the table; however, they will cause blocking during the rebuild stage. You can use the WITH ONLINE option but that will require enough free space in tempdb for the entire table. Both options 3 and 4 will require downtime and will potentially impact any foreign key constraints or other dependent objects. If you're running SQL Server 2008 or higher, I highly recommend using option 1.
As you've seen, there are multiple ways of dealing with heap fragmentation. However, the best way is to avoid heaps altogether in your database design.