So I'm working with a company who recently decided to upgrade from their volunteer DBA (one could argue that it was a point upgrade, but let's not be haters). Although I wouldn't characterize the database situation as terrible there were / are some serious issues including the most common volunteer DBA mistake - ineffective indexing. (For a background discussion about indexing I refer the reader to my paper Understanding Indexes and the many other excellent papers available on the subject.)
This paper will discuss the process I use to identify and track index utilization statistics.
We have 15 databases with about 1,800 tables upon which 392 indexes are defined totaling about 375 GB of data supporting OLTP Internet applications. Of the 392 indexes 255 have 0 'user seeks' and 0 'user scans' (not a misprint - ZERO - Note to the conscientious reader: the servers had been up for months).
Ignorance is Bliss or A little Knowledge is a Dangerous Thing
Imagine my horror and dismay when on my first day I generated an Index Utilization Report reflecting the index usage statistics described above. Suppressing the urge to run from the building I decided to collect usage statistics for all indexes across all of our production servers for two weeks. At the conclusion of the data collection period I created twenty-eight color graphs with circles and arrows and a paragraph on the bottom of each one explaining the index statistics along with a list of indexes to be dropped.
I presented the collection of multi-color graphs and the list of proposed indexes to be dropped; amazingly, the responses were uniform: "we can't delete indexes it'll slow everything down." And, "just because it's zero now doesn't mean that it isn't going to be used in the future." So, I'm told not to drop the indexes but to continue to collect usage statistics for six months at which time we can "re-evaluate" the situation.
I like MDW and I've used other commercial tools; however, I inevitably return to PerfStats. PerfStats is a simple database that stores collected statistics from all of the servers I manage. PerfStats provides storage and analysis of SQL Profiler data, forecasting using collected server statistics, performance analysis using collected DMV data (please refer to my paper Performance Monitoring with Dynamic Management Views), and index usage analysis.
The index usage statistics are stored in the following table:
SET ANSI_NULLS ON
SET QUOTED_IDENTIFIER ON
CREATE TABLE [dbo].[ServerIndexStats](
[OBJECT NAME] [nvarchar](128),
[INDEX NAME] [nvarchar](128),
) ON [PRIMARY]
The PerfStats "client system" installed on each managed server consists of a set of stored procedures and jobs each locally collecting some interesting metric. The PerfStats "server system" resides on my administration server also consisting of a set of stored procedures and jobs; additionally, the server system contains several SSIS packages executed by SQL Server Agent (SSA) which consolidate data from all of the PerfStats client systems.
Throughout the day each client server's SSA executes jobs to collect various metrics on the client server. Among the jobs the following query using the DM_DB_INDEX_USAGE_STATS DMV collects index usage statistics every evening:
Insert into perfstats.dbo.indexstats
select OBJECT_NAME(S.[OBJECT_ID]) AS [OBJECT NAME],
I.[NAME] AS [INDEX NAME],
FROM SYS.DM_DB_INDEX_USAGE_STATS AS S
INNER JOIN SYS.INDEXES AS I
ON I.[OBJECT_ID] = S.[OBJECT_ID]
AND I.INDEX_ID = S.INDEX_ID
The index statistics are consolidated on the admin server by the following SSIS package which also consolidates the database file size and server hard-drive statistics:
I don't like to keep performance data on the managed servers; therefore, after the data is pulled the tables are truncated.
Once the index usage statistics are collected from the client systems the data may be analyzed using any of a number of methods. One of the queries I use is:
Select distinct[Object Name],[Index Name], min(user_seeks) AS
[Min Seeks],avg(user_seeks)AS [Average Seeks],max(user_seeks) AS
[Max Seeks],min(user_scans) AS [Min Scans],avg(user_scans) AS
[Average Scans],max(user_scans) AS [Max Scans],min(user_lookups) AS
[Min Lookups],avg(user_lookups) AS [Average Lookups],
max(user_lookups) AS [Max Lookups],min(user_updates) AS
[Min Updates],avg(user_updates) AS [Average Updates],
max(user_updates) AS [Max Updates], DateDiff(dd,min(insert_date),getDate()) AS
[Days of Data]
WHERE [Object Name] not like 'sys%' AND [Object Name] not like 'MS%'
GROUP BY [object name],[index name],[server]
User Seeks: the number of times the index has been used by a user query in a seek operation.
User Scans: the number of times the index has been used by scanning the leaf pages of the index for data.
User Lookup: for clustered indexes only, this is the number of times the index has been used in a "bookmark lookup" to fetch the full row.
The table below is an abbreviated listing of the collected index utilization statistics (I've removed the identifiable names to protect the guilty) the data speaks for itself:
I believe that the primary responsibilities of the DBA include ensuring the availability, safety, and consistency of the data in the database. In addition, the DBA must ensure that the database is performing optimally at all times this is especially true for OLTP systems where providing adequate response times to Internet users is of paramount importance. I would argue that, all things being equal, the most effective method of maximizing performance of a database is efficient index definition and proper index maintenance. The metrics collection method described in this paper allows me to proactively react to the ever-changing data in the database ensuring that the indexes support maximum performance.