Best choice for clustered index based on usage stats from DMV query?

Question

Best choice for clustered index based on usage stats from DMV query?

Marios Philippopoulos

SSC Guru

Points: 57190
More actions
March 5, 2009 at 12:24 pm

#201990

I have queried the sys.dm_db_index_usage_stats DMV (and related views) and have found that the clustered index on a 1,000,000-row table is subject to a lot of updates. Here are the specifics on the 3 existing indexes on that table:
IX_tblName
type: nonclustered
seeks: 189
scans: 2
lookups: 0
updates: 142
PK_tblName
type: clustered (PRIMARY KEY)
seeks: 0
scans: 27
lookups: 21
updates: 236
UC_tblName_col1
type: nonclustered (unique constraint)
seeks: 31
scans: 131
lookups: 21
updates: 48
Based on the above info, should I be making the last index, UC_tblName_col1, clustered?
It looks like the 2nd one, PK_tblName, is not a good choice for a clustered index.
__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 1

Marios Philippopoulos (3/5/2009)
Based on the above info, should I be making the last index, UC_tblName_col1, clustered?

I would not look at usage to decide on a clustered index. There are other considerations, more important than how many scans happened.

I've created clusters before that would never be queried so that I could have a narrow, increasing clustering key to organise the table well and to keep the NC indexes as small as possible.

That aside, I certainly would not use the number of scans as an indicator for converting a NC index to a cluster. Scans of a nonclustered index can be fairly efficient. A NC index is usually much, much smaller than the table (the index pages only contain the index key and the clustering key). A clustered index, however is the table and a scan of the clustered index is a table scan.

The clustered index will always be subject to a lot of updates. It is the table, the data pages are the leaf pages of the clustered index so every single update done to a table updates the clustered index.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 2

GilaMonster (3/5/2009)
Marios Philippopoulos (3/5/2009)
Based on the above info, should I be making the last index, UC_tblName_col1, clustered?
I would not look at usage to decide on a clustered index. There are other considerations, more important than how many scans happened.
I've created clusters before that would never be queried so that I could have a narrow, increasing clustering key to organise the table well and to keep the NC indexes as small as possible.
That aside, I certainly would not use the number of scans as an indicator for converting a NC index to a cluster. Scans of a nonclustered index can be fairly efficient. A NC index is usually much, much smaller than the table (the index pages only contain the index key and the clustering key). A clustered index, however is the table and a scan of the clustered index is a table scan.
The clustered index will always be subject to a lot of updates. It is the table, the data pages are the leaf pages of the clustered index so every single update done to a table updates the clustered index.

I think usage is an important consideration, especially when the table is large.

In this particular case, the PK is actually a composite of the ID column of the table and a FK ID column from another table.

The UC index (3rd index), on the other hand, consists of just the ID column, so it seems it would be a better choice for the clustered index, because it is narrower and monotonically increasing.

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Bruce W Cassidy SSChampion Points: 12668 More actions · Answer 3

[font="Verdana"]One of the tricks of clustered indexes is that they define the physical allocation of the table data. So your clustered index should be the best way you want to organise the data, and not necessarily a good query index.

For that reason, an ID field is normally the best choice for a clustering index.

[/font]

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 4

Bruce W Cassidy (3/5/2009)
[font="Verdana"]One of the tricks of clustered indexes is that they define the physical allocation of the table data. So your clustered index should be the best way you want to organise the data, and not necessarily a good query index.[/font]

True, but the two do not necessarily need to be contradictory. A clustered index configured in such a way as to "best" organize the table data would also result in most range queries, retrieving large numbers of rows, being best optimized.

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Bruce W Cassidy SSChampion Points: 12668 More actions · Answer 5

Marios Philippopoulos (3/5/2009)
Bruce W Cassidy (3/5/2009)
[font="Verdana"]One of the tricks of clustered indexes is that they define the physical allocation of the table data. So your clustered index should be the best way you want to organise the data, and not necessarily a good query index.[/font]
True, but the two do not necessarily need to be contradictory. A clustered index configured in such a way as to "best" organize the table data would also result in most range queries, retrieving large numbers of rows, being best optimized.

[font="Verdana"]Agreed. How often do you do range queries on IDs though? Generally (in my experience) range queries result from dates and (also from my experience) dates aren't good candidates for clustered indexes as they require a "uniquifier".[/font]

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 6

Marios Philippopoulos (3/5/2009)
I think usage is an important consideration, especially when the table is large.

Then we'll have to agree to disagree on that. I prefer, especially on larger tables, to use the cluster to organise and NC indexes for queries. If the cluster happens to be useful, so much the better.

The UC index (3rd index), on the other hand, consists of just the ID column, so it seems it would be a better choice for the clustered index, because it is narrower and monotonically increasing.

In this case, yes it sounds like it, but that's because of the setup of the column, not the usage of it. The wider one would still be monotonically increasing, since the leading column is (I presume) and identity.

If that index is on an identity, why is it getting updated at all?

A clustered index configured in such a way as to "best" organize the table data would also result in most range queries, retrieving large numbers of rows, being best optimized.

Not necessarily. A common query on customer records is surname LIKE @Param +'%', which is a range query, but that doesn't mean that ca cluster on surname is a good idea.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 7

Bruce W Cassidy (3/5/2009)[font="Verdana"]Agreed. How often do you do range queries on IDs though? Generally (in my experience) range queries result from dates and (also from my experience) dates aren't good candidates for clustered indexes as they require a "uniquifier".[/font]

Actually, ID columns can be used a lot in queries. Not in the WHERE clause of course, but, certainly when they are in join clauses, if they are defined as part of PK/FK constraints between tables.

I always thought dates were good candidates for a clustered index.

What is this about a uniquifier, can you pls elaborate?

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 8

Bruce W Cassidy (3/5/2009)
(also from my experience) dates aren't good candidates for clustered indexes as they require a "uniquifier".

They can be very good candidates for clusters, especially a datetime that defaults to GETDATE() and is never changed. No, not unique, but close enough. You can add an identity second column to force unique, or just let SQL create the uniquifier when there are duplicate values.

That said, they can be very bad candidates for clusters too, in the case of historical dates or dates that get updated whenever the row changes.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 9

GilaMonster (3/5/2009) Not necessarily. A common query on customer records is surname LIKE @Param +'%', which is a range query, but that doesn't mean that a cluster on surname is a good idea.

Good point.

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 10

Marios Philippopoulos (3/5/2009)
What is this about a uniquifier, can you pls elaborate?

http://sqlinthewild.co.za/index.php/2009/02/09/all-indexes-are-unique/

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Bruce W Cassidy SSChampion Points: 12668 More actions · Answer 11

[font="Verdana"]Okay, going from memory here. You can check Books Online for a fuller discussion, and someone please correct me if I'm wrong.

My understanding is that clustered indexes need to be unique as they are used for row storage and hence have to allocate a unique space for each row.

So if you have a non-unique clustered index (such as on a date field), SQL Server adds a "uniquifier" to the row to guarantee that the storage is unique. You don't see this value, but it's there as part of the storage allocation.

That's why I prefer identities for clustered indexes, because they are narrower than dates, and are guaranteed to be unique within themselves.

[/font]

Bruce W Cassidy SSChampion Points: 12668 More actions · Answer 12

Marios Philippopoulos (3/5/2009)
Actually, ID columns can be used a lot in queries. Not in the WHERE clause of course, but, certainly when they are in join clauses.

[font="Verdana"]Er... really? Can you show me an example of SQL Server generating a range query as part of a join? I would have expected a merge or nested loop or the like.[/font]

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 13

Bruce W Cassidy (3/5/2009)
So if you have a non-unique clustered index (such as on a date field), SQL Server adds a "uniquifier" to the row to guarantee that the storage is unique. You don't see this value, but it's there as part of the storage allocation.

Not every row. Only the second/third/forth/... instance of a value

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Bruce W Cassidy SSChampion Points: 12668 More actions · Answer 14

GilaMonster (3/5/2009)
Bruce W Cassidy (3/5/2009)
(also from my experience) dates aren't good candidates for clustered indexes as they require a "uniquifier".
They can be very good candidates for clusters, especially a datetime that defaults to GETDATE() and is never changed. No, not unique, but close enough. You can add an identity second column to force unique, or just let SQL create the uniquifier when there are duplicate values.
That said, they can be very bad candidates for clusters too, in the case of historical dates or dates that get updated whenever the row changes.

[font="Verdana"]The first case would be true for a logging table (as an example), and yes, I would consider a date field for clustering in that case. But I don't see any advantage for using the date over an ID field, so I still tend to use an ID field. Perhaps if you were combining the "range scan by date" index with the clustered index?

As for the second case... yeah, I have seen lots of those! Hence my aversion to using date fields for clustered index candidates.

[/font]