
Index Utilization Trending


Problem

Introduced in SQL Server 2005, sys.dm_db_index_usage_stats can be queried to view aggregate read/write information for all indexes in a database.  This query is a basic look at what this DMV has to offer:

SELECT
       GETDATE() AS current_date_time,
       'AdventureWorks' AS [database_name], -- Replace with DB_NAME() or the name of the current database
       DB_TABLES.name AS [table_name],
       DB_INDEXES.name AS [index_name],
       DB_INDEX_USAGE_STATS.user_seeks,
       DB_INDEX_USAGE_STATS.user_scans,
       DB_INDEX_USAGE_STATS.user_lookups,
       DB_INDEX_USAGE_STATS.user_updates,
       DB_INDEX_USAGE_STATS.last_user_seek,
       DB_INDEX_USAGE_STATS.last_user_scan,
       DB_INDEX_USAGE_STATS.last_user_lookup,
       DB_INDEX_USAGE_STATS.last_user_update
FROM sys.dm_db_index_usage_stats DB_INDEX_USAGE_STATS
INNER JOIN sys.indexes DB_INDEXES
ON DB_INDEXES.object_id = DB_INDEX_USAGE_STATS.object_id
AND DB_INDEXES.index_id = DB_INDEX_USAGE_STATS.index_id
LEFT JOIN sys.tables DB_TABLES
ON DB_TABLES.object_id = DB_INDEXES.object_id
WHERE DB_INDEX_USAGE_STATS.database_id = (SELECT DB_ID('AdventureWorks')) -- Replace with DB_ID() or ID of current database

In addition, we can take a look at the missing index data that SQL Server collects.  This data requires careful review, as most of the indexes SQL Server recommends are probably not good indexes to add as-is.  Despite that, this information gives us valuable insight into columns where we may lack necessary indexing, queries that result in frequent table scans, or existing indexes that could be altered to improve performance:

SELECT
              ROW_NUMBER() OVER (ORDER BY INDEX_GROUP_STATS.avg_user_impact) AS id,
              INDEX_GROUP_STATS.avg_total_user_cost * ( INDEX_GROUP_STATS.avg_user_impact / 100.0 )
        * ( INDEX_GROUP_STATS.user_seeks + INDEX_GROUP_STATS.user_scans ) AS improvement_measure,
              INDEX_DETAILS.statement
        + ' (' + ISNULL(INDEX_DETAILS.equality_columns, '')
        + CASE WHEN INDEX_DETAILS.equality_columns IS NOT NULL
                    AND INDEX_DETAILS.inequality_columns IS NOT NULL THEN ','
               ELSE ''
          END + ISNULL(INDEX_DETAILS.inequality_columns, '') + ')' + ISNULL(' INCLUDE (' + INDEX_DETAILS.included_columns + ')', '') AS index_create_statement,
        INDEX_GROUP_STATS.unique_compiles AS benefiting_compiles,
        INDEX_GROUP_STATS.user_seeks AS user_seeks,
        INDEX_GROUP_STATS.last_user_seek AS last_user_seek,
        INDEX_GROUP_STATS.avg_total_user_cost AS average_total_user_cost,
        INDEX_GROUP_STATS.avg_user_impact AS average_user_impact,
        SD.name AS database_name,
        REVERSE(SUBSTRING(REVERSE(INDEX_DETAILS.statement), 2, (CHARINDEX('[', REVERSE(INDEX_DETAILS.statement), 2)) - 2)) AS table_name,
        ISNULL((LEN(INDEX_DETAILS.equality_columns) -  LEN(REPLACE(REPLACE(INDEX_DETAILS.equality_columns, '[', ''), ']', ''))) / 2, 0) AS equality_columns,
        ISNULL((LEN(INDEX_DETAILS.inequality_columns) -  LEN(REPLACE(REPLACE(INDEX_DETAILS.inequality_columns, '[', ''), ']', ''))) / 2, 0) AS inequality_columns,
        ISNULL((LEN(INDEX_DETAILS.included_columns) -  LEN(REPLACE(REPLACE(INDEX_DETAILS.included_columns, '[', ''), ']', ''))) / 2, 0) AS included_columns
FROM    sys.dm_db_missing_index_groups INDEX_GROUPS
        INNER JOIN sys.dm_db_missing_index_group_stats INDEX_GROUP_STATS
        ON INDEX_GROUP_STATS.group_handle = INDEX_GROUPS.index_group_handle
        INNER JOIN sys.dm_db_missing_index_details INDEX_DETAILS
        ON INDEX_GROUPS.index_handle = INDEX_DETAILS.index_handle
        INNER JOIN sys.databases SD
        ON SD.database_id = INDEX_DETAILS.database_id
WHERE   INDEX_GROUP_STATS.avg_total_user_cost * ( INDEX_GROUP_STATS.avg_user_impact / 100.0 )
        * ( INDEX_GROUP_STATS.user_seeks + INDEX_GROUP_STATS.user_scans ) > 10
ORDER BY INDEX_GROUP_STATS.avg_total_user_cost * INDEX_GROUP_STATS.avg_user_impact * (INDEX_GROUP_STATS.user_seeks + INDEX_GROUP_STATS.user_scans) DESC

The trouble with this data is that it accumulates from the time SQL Server last restarted.  If that was last week, then the data may be useful, but if the last restart was a year ago, then it is mostly meaningless.  The best use of this data is to track index utilization regularly in order to identify changes.  New reports, software releases, customers, or internal hardware/software changes can all affect usage, and knowing immediately how those changes affect our environment provides a clear path to optimization.
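
Before interpreting the raw numbers, it helps to know how long the counters have been accumulating.  A quick check of the instance start time looks like this (a small sketch; sqlserver_start_time exists only in newer versions of SQL Server, so tempdb's creation date is shown as a fallback that works on older releases):

SELECT sqlserver_start_time FROM sys.dm_os_sys_info -- Instance start time on newer versions of SQL Server

SELECT create_date FROM sys.databases WHERE name = 'tempdb' -- tempdb is rebuilt at startup, so this also reflects the last restart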

Solution

In order to truly take control of this data, we will need to persist it somewhere on our SQL Server.  Weekly data collection would be the most common use case, though daily could also be handy when frequent and significant change is occurring (or is the norm).

Let’s start by creating a table to store this data:

CREATE TABLE dbo.db_index_usage_stats
(
       sample_date DATETIME,
       database_name VARCHAR(250),
       table_name VARCHAR(250),
       index_name VARCHAR(250),
       user_seeks BIGINT,
       user_scans BIGINT,
       user_lookups BIGINT,
       user_updates BIGINT,
       last_user_seek DATETIME,
       last_user_scan DATETIME,
       last_user_lookup DATETIME,
       last_user_update DATETIME
)

If you plan on saving this data for a long period of time, then you’ll want to consider indexes on sample_date, database_name, table_name, or index_name to speed up read access.  Disk usage by this data is relatively small, but can add up over a long period of time as well.
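
For example (a minimal sketch; adjust the key columns to match how you actually query the history), a clustered index on sample_date keeps the table ordered by collection date, which suits date-range reporting:

CREATE CLUSTERED INDEX CI_db_index_usage_stats_sample_date
ON dbo.db_index_usage_stats (sample_date)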

Now we need to create a SQL script that can be run regularly to collect our index usage data and insert it into this table.  If only a single database is involved, then the query is relatively simple:

INSERT INTO dbo.db_index_usage_stats
              (   sample_date ,
                  database_name ,
                  table_name ,
                  index_name ,
                  user_seeks ,
                  user_scans ,
                  user_lookups ,
                  user_updates ,
                  last_user_seek ,
                  last_user_scan ,
                  last_user_lookup ,
                  last_user_update
              )
SELECT
       GETDATE(),
       'AdventureWorks' AS [database_name], -- Replace with DB_NAME() or the name of the current database
       ST.name AS [table_name],
       SI.name AS [index_name],
       IUS.user_seeks,
       IUS.user_scans,
       IUS.user_lookups,
       IUS.user_updates,
       IUS.last_user_seek,
       IUS.last_user_scan,
       IUS.last_user_lookup,
       IUS.last_user_update
FROM sys.dm_db_index_usage_stats IUS
INNER JOIN sys.indexes SI
ON SI.object_id = IUS.object_id
AND SI.index_id = IUS.index_id
LEFT JOIN sys.tables ST
ON ST.object_id = SI.object_id
WHERE IUS.database_id = (SELECT DB_ID('AdventureWorks')) -- Replace with DB_ID() or ID of current database

Now we have a table with data on any given index for different collection dates.  Because the values are totals accumulated since SQL Server last restarted, we can find the change over time by calculating the difference between two samples.  For example, if we want to know the number of scans on an index during the month of April, we would subtract the scan count at the start of the month from the count at the end of the month.  (If the instance restarted between the two samples, the counters will have reset, which shows up as a decrease from one sample to the next.)  If you do not want to maintain data forever, then add a DELETE such as this at the end of your job:

DELETE FROM dbo.db_index_usage_stats
WHERE DATEADD(DAY, 180, sample_date) < GETDATE() -- Only keep 180 days of data
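
To make the difference calculation described above concrete, here is one way to write it (a sketch; the two date filters are placeholders and assume a collection ran on each of those days):

SELECT
       start_sample.database_name,
       start_sample.table_name,
       start_sample.index_name,
       end_sample.user_scans - start_sample.user_scans AS user_scans_for_period
FROM dbo.db_index_usage_stats start_sample
INNER JOIN dbo.db_index_usage_stats end_sample
ON end_sample.database_name = start_sample.database_name
AND end_sample.table_name = start_sample.table_name
AND end_sample.index_name = start_sample.index_name
WHERE start_sample.sample_date >= '20160401' AND start_sample.sample_date < '20160402' -- Placeholder: the sample taken at the start of the period
AND end_sample.sample_date >= '20160430' AND end_sample.sample_date < '20160501' -- Placeholder: the sample taken at the end of the period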

The previous example only collected data on a single database, but what if we want index stats on all user databases?  The table created above will stay the same, but we will need to change how we collect the data to iterate through each database one at a time using dynamic SQL:

DECLARE @CMD VARCHAR(MAX) -- Will be used for dynamic SQL
DECLARE @database VARCHAR(100) -- Will hold name of current database being read
DECLARE @current_time DATETIME = GETDATE() -- Cache this date/time now so that we have the same value entered for all databases
DECLARE @database_count INT -- Stores total number of user databases on this instance
DECLARE @count INT = 1 -- Used in loop to keep track of current database
SELECT
       ROW_NUMBER() OVER (ORDER BY name) AS id,
       SD.name
INTO #databases
FROM sys.databases SD
WHERE SD.name NOT IN
(
       'master',
       'tempdb',
       'model',
       'msdb'
)
SELECT @database_count = COUNT(*)
FROM #databases
WHILE (@database_count > 0) AND (@count <= @database_count) -- Loop until we hit all databases
BEGIN
       SELECT @database = name
       FROM #databases WHERE id = @count
       SET @CMD = 'USE ' + @database + '
       INSERT INTO AdventureWorks.dbo.db_index_usage_stats /* Replace AdventureWorks with the name of the database where your index data resides */
                   (   sample_date ,
                      database_name ,
                      table_name ,
                      index_name ,
                      user_seeks ,
                      user_scans ,
                      user_lookups ,
                      user_updates ,
                      last_user_seek ,
                      last_user_scan ,
                      last_user_lookup ,
                      last_user_update
                  )
       SELECT
              ''' + CAST(@current_time AS VARCHAR(25)) + ''' AS sample_date,
              '''+ @database  + ''' AS database_name,
              DB_TABLES.name AS table_name,
              DB_INDEXES.name AS index_name,
              INDEX_USAGE_STATS.user_seeks,
              INDEX_USAGE_STATS.user_scans,
              INDEX_USAGE_STATS.user_lookups,
              INDEX_USAGE_STATS.user_updates,
              INDEX_USAGE_STATS.last_user_seek,
              INDEX_USAGE_STATS.last_user_scan,
              INDEX_USAGE_STATS.last_user_lookup,
              INDEX_USAGE_STATS.last_user_update
       FROM ' + @database + '.sys.dm_db_index_usage_stats INDEX_USAGE_STATS
       INNER JOIN ' + @database + '.sys.indexes DB_INDEXES
       ON DB_INDEXES.object_id = INDEX_USAGE_STATS.object_id
       AND DB_INDEXES.index_id = INDEX_USAGE_STATS.index_id
       LEFT JOIN ' + @database + '.sys.tables DB_TABLES
       ON DB_TABLES.object_id = DB_INDEXES.object_id
       WHERE INDEX_USAGE_STATS.database_id = (SELECT DB_ID(''' + @database + '''))
       '
       EXEC (@CMD)
       SET @count = @count + 1 -- increment our count so that we go on to the next database
END
DROP TABLE #databases

While index usage statistics for all databases are exposed through a single view, index and table metadata are maintained per database, so in order to collect all of the data we want, we need to loop through each database to complete our data set.  With this data, we can compare index usage between different databases.  Seeing how real usage differs between development and production, or between databases with like schemas, shows how each database is actually used and can help us prepare for changes before they reach production.
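
For example, a side-by-side comparison of two databases that share a schema is a simple self-join on the history table (a sketch; 'ProductionDB' and 'DevelopmentDB' are placeholder names):

SELECT
       prod.table_name,
       prod.index_name,
       prod.user_seeks AS production_seeks,
       dev.user_seeks AS development_seeks
FROM dbo.db_index_usage_stats prod
INNER JOIN dbo.db_index_usage_stats dev
ON dev.table_name = prod.table_name
AND dev.index_name = prod.index_name
AND dev.sample_date = prod.sample_date -- The loop caches one @current_time, so samples from the same run share a date
WHERE prod.database_name = 'ProductionDB' -- Placeholder: substitute your own database names
AND dev.database_name = 'DevelopmentDB'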

This process can be repeated for missing index statistics by replacing the SELECT portion of our INSERT statement with our missing index query from above:

IF NOT EXISTS (SELECT * FROM AdventureWorks.sys.objects WHERE type = 'U' AND name = 'db_missing_index_usage_stats') -- Check the same database in which the table will be created
BEGIN
       CREATE TABLE AdventureWorks.dbo.db_missing_index_usage_stats
       (   -- Create new table for our missing index data
              sample_date DATETIME,
              database_name VARCHAR(250),
              table_name NVARCHAR(4000),
              improvement_measure FLOAT,
              index_name NVARCHAR(4000),
              benefiting_compiles BIGINT,
              user_seeks BIGINT,
              last_user_seek DATETIME,
              average_total_user_cost FLOAT,
              average_user_impact FLOAT,
              equality_columns NVARCHAR(4000),
              inequality_columns NVARCHAR(4000),
              included_columns NVARCHAR(4000)
       )
END
DECLARE @CMD VARCHAR(MAX) -- Will be used for dynamic SQL
DECLARE @database VARCHAR(100) -- Will hold name of current database being read
DECLARE @current_time DATETIME = GETDATE() -- Cache this date/time now so that we have the same value entered for all databases
DECLARE @database_count INT -- Stores total number of user databases on this instance
DECLARE @count INT = 1 -- Used in loop to keep track of current database
SELECT
       ROW_NUMBER() OVER (ORDER BY name) AS id,
       SD.name
INTO #databases
FROM sys.databases SD
WHERE SD.name NOT IN
(
       'master',
       'tempdb',
       'model',
       'msdb'
)
SELECT @database_count = COUNT(*)
FROM #databases
WHILE (@database_count > 0) AND (@count <= @database_count) -- Loop until we hit all databases
BEGIN
       SELECT @database = name
       FROM #databases WHERE id = @count
       SET @CMD = 'USE ' + @database + '
       INSERT INTO AdventureWorks.dbo.db_missing_index_usage_stats -- Replace AdventureWorks with the name of the database where your index data resides
                     (     sample_date ,
                           database_name ,
                           table_name ,
                           improvement_measure ,
                           index_name ,
                           benefiting_compiles ,
                           user_seeks ,
                           last_user_seek ,
                           average_total_user_cost ,
                           average_user_impact ,
                           equality_columns ,
                           inequality_columns ,
                           included_columns
                  )
       SELECT
              ''' + CAST(@current_time AS VARCHAR(25)) + ''' AS sample_date,
              '''+ @database  + ''' AS database_name,
              REVERSE(SUBSTRING(REVERSE(INDEX_DETAILS.statement), 2, (CHARINDEX(''['', REVERSE(INDEX_DETAILS.statement), 2)) - 2)) AS table_name,
              INDEX_GROUP_STATS.avg_total_user_cost * ( INDEX_GROUP_STATS.avg_user_impact / 100.0 )
              * ( INDEX_GROUP_STATS.user_seeks + INDEX_GROUP_STATS.user_scans ) AS improvement_measure,
              INDEX_DETAILS.statement
              + '' ('' + ISNULL(INDEX_DETAILS.equality_columns, '''')
              + CASE WHEN INDEX_DETAILS.equality_columns IS NOT NULL
                                  AND INDEX_DETAILS.inequality_columns IS NOT NULL THEN '',''
                           ELSE ''''
                END + ISNULL(INDEX_DETAILS.inequality_columns, '''') + '')'' + ISNULL('' INCLUDE ('' + INDEX_DETAILS.included_columns + '')'', '''') AS index_name,
              INDEX_GROUP_STATS.unique_compiles AS benefiting_compiles,
              INDEX_GROUP_STATS.user_seeks AS user_seeks,
              INDEX_GROUP_STATS.last_user_seek AS last_user_seek,
              INDEX_GROUP_STATS.avg_total_user_cost AS average_total_user_cost,
              INDEX_GROUP_STATS.avg_user_impact AS average_user_impact,
              ISNULL((LEN(INDEX_DETAILS.equality_columns) -  LEN(REPLACE(REPLACE(INDEX_DETAILS.equality_columns, ''['', ''''), '']'', ''''))) / 2, 0) AS equality_columns,
              ISNULL((LEN(INDEX_DETAILS.inequality_columns) -  LEN(REPLACE(REPLACE(INDEX_DETAILS.inequality_columns, ''['', ''''), '']'', ''''))) / 2, 0) AS inequality_columns,
              ISNULL((LEN(INDEX_DETAILS.included_columns) -  LEN(REPLACE(REPLACE(INDEX_DETAILS.included_columns, ''['', ''''), '']'', ''''))) / 2, 0) AS included_columns
              FROM    sys.dm_db_missing_index_groups INDEX_GROUPS
              INNER JOIN sys.dm_db_missing_index_group_stats INDEX_GROUP_STATS
              ON INDEX_GROUP_STATS.group_handle = INDEX_GROUPS.index_group_handle
              INNER JOIN sys.dm_db_missing_index_details INDEX_DETAILS
              ON INDEX_GROUPS.index_handle = INDEX_DETAILS.index_handle
              INNER JOIN sys.databases SD
              ON SD.database_id = INDEX_DETAILS.database_id
       WHERE   INDEX_GROUP_STATS.avg_total_user_cost * ( INDEX_GROUP_STATS.avg_user_impact / 100.0 )
              * ( INDEX_GROUP_STATS.user_seeks + INDEX_GROUP_STATS.user_scans ) > 10
       AND SD.name = ''' + @database + '''
       ORDER BY INDEX_GROUP_STATS.avg_total_user_cost * INDEX_GROUP_STATS.avg_user_impact * (INDEX_GROUP_STATS.user_seeks + INDEX_GROUP_STATS.user_scans) DESC
       '
       EXEC (@CMD)
       SET @count = @count + 1 -- increment our count so that we go on to the next database
END
DROP TABLE #databases

Conclusion

Collecting metrics for a database environment on a regular basis can greatly improve our ability to troubleshoot problems and, more importantly, prevent performance issues before they become emergencies.  Having reliable data on your index usage provides a deeper understanding of how your application works and of the ways in which usage changes over time.  This can easily be accomplished by creating a table to store the information and writing a short script to capture it on a regular basis.

The same procedure used here for index data can easily be applied to other SQL Server metrics of interest, such as CPU, wait stats, deadlocks, memory consumption, and more.
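
As a brief illustration (a sketch only; the dbo.wait_stats_history table is a made-up name for this example), wait statistics can be captured with exactly the same pattern:

CREATE TABLE dbo.wait_stats_history
(
       sample_date DATETIME,
       wait_type NVARCHAR(60),
       waiting_tasks_count BIGINT,
       wait_time_ms BIGINT,
       signal_wait_time_ms BIGINT
)

INSERT INTO dbo.wait_stats_history
              (   sample_date ,
                  wait_type ,
                  waiting_tasks_count ,
                  wait_time_ms ,
                  signal_wait_time_ms
              )
SELECT
       GETDATE(),
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       signal_wait_time_ms
FROM sys.dm_os_wait_stats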
