Copy this script into a Query window and run on any SQL database to produce results of the Clustered Index Analyser. A bit of a run down is provided regarding analysising the results:
In my opinion as well as many other DBA's, the best practice for clustered indexes requires these 4 main points:
- Narrowness - Number of columns composing the index should be bare minimal with one column being ideal
- Uniqueness - Completely identifiable on its own
- Unchanging - Never requires modification
- Ever increasing - avoiding fragmentation
As also mentioned by Gail Shaw here http://www.sqlservercentral.com/articles/Indexing/68563/
Therefore at our work place we have generated a template that suggests every table have an integer identity column which is used for the Clustered index. It's usually named after the 'table name' + 'ID'. This helps enforce our standards within the development team.
Currently we are working on a database performance project where we're reviewing instances of our data model which are contributing to bad performance; badly designed clustered indexes are high on the list of things to be reviewed.
I have generated a script which predominately looks at the best practices of having narrow and unique clustered indexes. This script identifies each clustered index within a database, then displays each column that comprises that clustered index along with its data type and potential size in bytes (calculated by the data type size in bytes * maximum length of the column + additional cost e.g. variable column length has additional cost of 2 bytes). This calculation produces a potential size in bytes of each row of the clustered index. Ideally this size would be 4, which is what the clustered index per row size would be if it's an Integer identity column.
This script also selects the number of rows per table. The total number of rows value is multiplied by the index size in bytes per row to calculate the cost of the clustered index for the entire table in Kilobytes. This is a simplistic calculation regarding disk space as it doesn't take into account the several levels of nodes that comprise indexes, with a cost on disk space being on each level. Regardless, it is quick to see the overhead of having a clustered index on a large table that potentially uses many bytes per composition of the clustered index. It's also quick to see the benefits of modifying these clustered indexes which can be relatively quick fixes for massive gains in performance and disk space in the overall scheme of things.
Furthermore, a count of non-clustered indexes is also produced per table. This is important because each non-clustered index at its leaf node contains the clustered index, which is used as the pointer to the actual data. Therefore another simplistic calculation can be made of the (total clustered index size per table * the number of non-clustered indexes). Once again, without it being specific detail, it is easy to see how quickly a large clustered index can quickly blow out disk space on a large table with many non-clustered indexes.
Another best practice for a clustered index is uniqueness. If the clustered index isn't unique SQL makes it unique by adding a hidden 4 byte integer column to the clustered index; therefore also contributing to extra disk space and overheads. This script identifies which clustered indexes aren't unique, but does not contribute that additional cost in bytes that is created by the hidden column, as that cost is only generated per duplicate fields. As mentioned at this following URL http://sqlfool.com/2009/05/overhead-i-non-unique-clustered-indexes/
This script orders the clustered indexes in descending order after the calculation of (total clustered index size per table) * (number of non-clustered indexes if there is 1 or more non-clustered indexes on the table). But manipulate the script in whatever way you deem fit to get the output you're looking for.
Disk space is not the priority improvement we are looking for, but is a nice outcome. Database performance is greatly improved by keeping the clustered index narrow and unique which is what this script is analysing, while also listing indexes in order of most benefit gained if improving the clustered index.
Any User Table that does not have a clustered index is also returned as a separate selection. Every table should have a clustered index!!!
Depending on how well you're databases clustered indexes currently are, correcting the top 10 clustered indexes returned by this script should greatly improve overall performance for your database environment. Hopefully this can assist you in analysing your clustered indexes as much as it has helped us.
A special thanks goes out to Ben Cruickshank for continually reviewing my scripts. Thanks Mate!
************************************************ UPDATE *****************************************************
- Included schema
- Improved Table Row Count Query (Dramatically Improved performance of script) Thanks Richard Doering
- Using SYSNAME Data Type for Object names (Columns, Schema, Tables, Index) which allows for special characters i.e. Chinese characters because it's an NVARCHAR type
- Using QUOTENAME() around object names to handle spaces, dots and other special characters in object names