Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Clustered Indexes? Sedimentary, my dear Watson

By Tony Davis,

Surely one would base the choice of the clustering key for a clustered index on the needs of the most critical queries; the goal being to cluster on the key(s) that will gather together data that is most likely to be queried together, and to define the natural order of the data. However, in SQL Server, this advice seems to be heavily tempered by the need to consider index implementation issues, such as minimizing page splits, fragmentation and so on.

There is much sound advice on MSDN and elsewhere suggesting that every table should have a clustered index, and that narrow, integer, ever-increasing columns, such as afforded by an IDENTITY column or some other naturally increasing integer, make the best candidates for the clustering key.

If you use an ever-increasing value in clustered key column, each INSERT will be stored in the logical order dictated by the clustering key. Rows will therefore always be added to the end, making the scanning of the index faster and more efficient. Otherwise, data will be inserted randomly over the pages, leading to increased page splitting and fragmentation; and consequentially to higher IO. You will also help to minimize the cost of using the clustered key as the row identifier in any secondary, non-clustered indexes, by adding a clustered index on an IDENTITY column.

This advice comes from the viewpoint that the clustered index is there primarily to 'organize the table' from the point of view of the index itself, and its health, rather than the queries that run against it; the latter being best served by the non-clustered, covering, indexes. It's a viewpoint I understand, but find a little troubling.

If you create a clustering key on an IDENTITY column, all you're doing is ensuring that the next row to arrive is logically stored next to the previous one; in other words you're organising your data by time of arrival. This is nicknamed the 'sedimentary' approach. In the absence of deletes and updates, this is the same behaviour you'll get from a heap. If organizing your data by time-of-arrival is optimal for your critical queries, then all is well. However, if it is not, then one could argue that it hasn't really organized the data at all.

If you do have a lot of deletions, it's true that clustering on an IDENTITY-style key will minimize fragmentation as compared to using a more random key, but the fact is that you will still get it, because data goes in a specific order and so gaps don't get reused, so you will still need to rebuild the index on a regular basis.

Ultimately, the primary objective of a clustered index is, or should be, to organize the data in such a way that the values that are likely to be queried together are stored together. It imposes an order on the data that corresponds to the natural way that we sequence and arrange it. Some of the advice that I read leads me towards the conclusion that we are introducing a best practice for clustered indexes in SQL Server that merely compensates for an implementation problem, rather than reflecting the requirements of the data within the application.

I'd love to hear from the Database Weekly community on this issue. How do you create your clustered indexes? How often do you rely on an IDENTITY column for the clustering key?

Cheers,

Tony.

Total article views: 402 | Views in the last 30 days: 1
 
Related Articles
FORUM

Cluster Index to a New Column?

Should I add a cluster index to a new identity column on a table that already has a clustered index?...

FORUM

What column to use for Cluster index

What column to use for Cluster index

FORUM

clustered index

clustered index

FORUM

unique clustered index on 2 columns

unique clustered index on 2 columns

FORUM

Insert on a clustered index - is column ordering (ASC or DESC) important ?

insert clustered index column order asc desc

Tags
database weekly    
editorial    
indexing    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones