Blog Post

Index Fragmentation

,

So you’ve setup a bunch of indexes in your database to tune the queries coming in. They’re all running much faster than before and you’re leaning back in your chair feeling pretty pleased with yourself. You’re done right?

Well…no. As data gets inserted into and deleted from tables within a database, the indexes on the tables become fragmented. Fragmentation is defined as:-

“Fragmentation exists when indexes have pages in which the logical ordering, based on the key value, does not match the physical ordering inside the data file.”

But does it matter that the logical ordering does not match the physical ordering? Microsoft published a white paper here which showed that with increased fragmentation, the time taken for each read from disk increased whilst the average size of each read decreased. I know it was performed using SQL Server 2000 but it’s still relevant. They summarise the results as:-

“The workload performance increase realized in the small-scale environment ranged from 60 percent at the low level of fragmentation to more than 460 percent at the highest level of fragmentation. The workload performance increased realized for the large-scale environment ranged from 13 percent at the low fragmentation level to 40 percent at the medium fragmentation level”

Note – This is the summary of the DSS workload test, complex queries against multiple levels of fragmentation

To summarise:-
Small-scale environment

After defragmenting the indexes an increase in performance of 60% was gained when there was a low level of fragmentation and there was a gain of 460% at a high level of fragmentation
Large-scale environment

After defragmenting the indexes an increase in performance of 13% was gained when there was a low level of fragmentation and there was a gain of 40% at a high level of fragmentation

Wow, removing the fragmentation in the indexes gave a performance increase between 13% and 460% (across the two environments).

The paper also discusses why the performance in the large-scale environment was less affected by fragmentation…basically the environment had a better IO subsystem…which means that throwing money at index fragmentation can appear to solve the problem but you are really just papering over the cracks.

But, what DBA doesn’t like improving the performance of a system without having to spend any money? “Look, I’ve improved the performance by X%!”. (cue smug face)

So a plan to reduce the fragmentation of the indexes needs to be implemented. But how do you approach this? Rebuild every index nightly? Will your system allow you to do that? What if you are running Standard Edition and you cannot rebuild indexes online?

The way I approach index maintenance is to customise for each database I manage. To do this, the indexes that require either rebuilds or reorganisation need identifying. Microsoft have provided the DMV sys.dm_db_index_physical_stats which will allow us to do so:-

SELECT
[schemas].name AS [SchemaName],
[objects].name AS [TableName], 
[indexes].name AS [IndexName],
[dm_db_index_physical_stats].index_depth,
[dm_db_index_physical_stats].avg_fragmentation_in_percent,
[dm_db_index_physical_stats].fragment_count,
[dm_db_index_physical_stats].avg_fragment_size_in_pages,
[dm_db_index_physical_stats].page_count,
GETUTCDATE() AS [CaptureDate]
FROM 
[sys].[dm_db_index_physical_stats](DB_ID(),NULL,NULL,NULL,'LIMITED' ) AS [dm_db_index_physical_stats]
INNER JOIN 
[sys].[objects] AS [objects] ON ([dm_db_index_physical_stats].object_id = [objects].object_id)
INNER JOIN 
[sys].[schemas] AS [schemas] ON ([objects].[schema_id] = [schemas].[schema_id])
INNER JOIN 
[sys].[indexes] AS [indexes] ON ([dm_db_index_physical_stats].object_id = [indexes].object_id
AND  [dm_db_index_physical_stats].index_id = [indexes].index_id )
WHERE 
[dm_db_index_physical_stats].index_type_desc <> 'HEAP'
AND 
[dm_db_index_physical_stats].avg_fragmentation_in_percent > 20;
GO

Note – this will pick up data on all indexes that are over 20% fragmented.

By adding GETUTCDATE() we can monitor and track the fragmentation over time, allowing us to identify which tables’ indexes are being fragmented the quickest. Nice and simple to setup, first create a table to hold the data:-

CREATE TABLE dbo.[IndexAnalysis]
(AnalysisIDINT IDENTITY(1,1) NOT NULL PRIMARY KEY,
SchemaNameSYSNAME,
TableNameSYSNAME,
IndexNameSYSNAME,
IndexDepthINT,
AvgFragmentationInPercentFLOAT,
FragmentCountBIGINT,
AvgFragmentSizeInPagesFLOAT,
PageCountBIGINT,
CaptureDateDATETIME);
GO

Then create a stored procedure to capture the data and insert it into the table:-

CREATE PROCEDURE dbo.[CaptureIndexFragmentation] AS
INSERT INTO dbo.[IndexAnalysis]
SELECT
[schemas].name AS [SchemaName],
[objects].name AS [TableName], 
[indexes].name AS [IndexName],
[dm_db_index_physical_stats].index_depth,
[dm_db_index_physical_stats].avg_fragmentation_in_percent,
[dm_db_index_physical_stats].fragment_count,
[dm_db_index_physical_stats].avg_fragment_size_in_pages,
[dm_db_index_physical_stats].page_count,
GETUTCDATE() AS [CaptureDate]
FROM 
[sys].[dm_db_index_physical_stats](DB_ID(),NULL,NULL,NULL,'LIMITED' ) AS [dm_db_index_physical_stats]
INNER JOIN 
[sys].[objects] AS [objects] ON ([dm_db_index_physical_stats].object_id = [objects].object_id)
INNER JOIN 
[sys].[schemas] AS [schemas] ON ([objects].[schema_id] = [schemas].[schema_id])
INNER JOIN 
[sys].[indexes] AS [indexes] ON ([dm_db_index_physical_stats].object_id = [indexes].object_id
AND  [dm_db_index_physical_stats].index_id = [indexes].index_id )
WHERE 
[dm_db_index_physical_stats].index_type_desc <> 'HEAP'
AND 
[dm_db_index_physical_stats].avg_fragmentation_in_percent > 20;
GO

This can then be run as a nightly job so that you can track the fragmentation of the indexes over time.

SELECT 
SchemaName AS [Schema Name],
TableName AS [Table Name],
IndexName AS [Index Name],
AvgFragmentationInPercent AS [Current % Avg Fragmentation],
LAG(AvgFragmentationInPercent,1) OVER (PARTITION BY TableName, IndexName ORDER BY CaptureDate) AS [Previous % Avg Fragmentation],
AvgFragmentationInPercent -
LAG(AvgFragmentationInPercent,1) OVER (PARTITION BY TableName, IndexName ORDER BY CaptureDate) AS [Difference],
CaptureDate AS [Capture Date]
FROM 
dbo.IndexAnalysis

Note – This SELECT statement uses the LAG() function which is only available in SQL Server 2012 onwards.

Once you have the data you can then decide how to approach the maintenance. You could use something like:-

SELECT    
'ALTER INDEX [' + IndexName + '] ON [' + SchemaName + '].[' + TableName +'] ' +
    (CASE
     WHEN (avgfragmentationinpercent >= 20 AND avgfragmentationinpercent < 40) THEN 'REORGANIZE'
     WHEN avgfragmentationinpercent > = 40 THEN 'REBUILD' END) AS [SQL Statement]
FROM 
dbo.[IndexAnalysis] 

This will generate the required ALTER INDEX statements, you could then use sp_executesql to perform the operation. Once you have the data, you can proceed with the best approach that suits your environment. One thing that I have found is that, with large indexes in a busy system, catching fragmentation early and performing frequent REORGANIZE operations instead of less frequent REBUILDs is preferable as REORGANIZE statements are less resource intensive.

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating