Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Clustered Indexes? Sedimentary, my dear Watson


Clustered Indexes? Sedimentary, my dear Watson

Author
Message
Tony Davis
Tony Davis
SSChasing Mays
SSChasing Mays (637 reputation)SSChasing Mays (637 reputation)SSChasing Mays (637 reputation)SSChasing Mays (637 reputation)SSChasing Mays (637 reputation)SSChasing Mays (637 reputation)SSChasing Mays (637 reputation)SSChasing Mays (637 reputation)

Group: Administrators
Points: 637 Visits: 1151
Comments posted to this topic are about the item Clustered Indexes? Sedimentary, my dear Watson
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)

Group: General Forum Members
Points: 52001 Visits: 40313
Heh... It certainly "depends" but try rebuilding indexes without a clustered index and see what happens. Keep in mind that indexes also "split" except they frequently do so by extent rather than by page.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
dhamilton-905368
dhamilton-905368
SSC-Enthusiastic
SSC-Enthusiastic (155 reputation)SSC-Enthusiastic (155 reputation)SSC-Enthusiastic (155 reputation)SSC-Enthusiastic (155 reputation)SSC-Enthusiastic (155 reputation)SSC-Enthusiastic (155 reputation)SSC-Enthusiastic (155 reputation)SSC-Enthusiastic (155 reputation)

Group: General Forum Members
Points: 155 Visits: 148
Tony,

Awesome point that I think too few people consider (or even know to consider). I always think about this when creating a new table, but I don't always know exactly how that table will be used in the "wild". I've even re-clustered tables upon seeing how they're typically queried [sp?].

Even when I recluster, I recognize that there will be a price to pay with inserts. I personally haven't dug deeply enough to know when that price is worse than the SELECT improvements. I generally just consider the situation -- more INSERTs or more SELECTS?

Unfortunately, I work with a crew that feels that every table should have a clustered IDENTITY index that "resolves any abiguity" -- even though 90% of the time, the tail end of a relation will NEVER query that relational ID!

In general, our DB's are small enough that that is an OK approach. HOWEVER, we have one MASSIVE DB that has benefitted from some reclustering, and could benefit more, but there is reistance due to the above beliefs. We'll see how that sorts out over time, as we get to the 5-6 million-plus records (which is fast approaching).

-----------

Wish I had a clever tagline here... D.

Jsut thinking -- on the tagline -- what do you all think of "Inspiriation / Perspiration -- smell is in the nose of the beholder" or something like that? Thanks! D.
GilaMonster
GilaMonster
SSC Guru
SSC Guru (54K reputation)SSC Guru (54K reputation)SSC Guru (54K reputation)SSC Guru (54K reputation)SSC Guru (54K reputation)SSC Guru (54K reputation)SSC Guru (54K reputation)SSC Guru (54K reputation)

Group: General Forum Members
Points: 54700 Visits: 44660
Well I'm strongly in the 'clustered index to organise the table' camp, and do have reasons.

Firstly, note that these reasons are for OLTP-type systems. Lots of changes, lots of fast, small queries. The indexing rules for decision support/OLAP can be very different because of the radically different usage pattern.

There can only be one clustered index, and it's going to be the largest index that exists for that table. Hence if it needs rebuilding, it will take the longest and have the greatest log impact so I want to minimise the need to rebuild
Because there can only be one clustered index, I personally prefer to use the cluster mostly to organise the table (hence the ever-increasing and non-changing attributes) and use the nonclustered indexes for queries.
If I can get the cluster so it's quite narrow, unique (or nearly so), ever increasing, non-changing and have it useful for queries as well, that's absolute first prize. My favourite here is cluster on a date inserted column where the table is often queried by date. Many transaction-type tables fit in this category.
If the system has lots and lots of inserts (OLTP), I need those inserts as fast as possible. Hence I do not want frequent page splits on the cluster, nor do I want data moving around in the table if the clustering key gets updated. Both of those, if occurring enough, can have nasty effects on insert performance.

Heaps have some interesting downsides. Until 2008, they couldn't be rebuilt, and there's occasionally odd behaviour around reusing of space within a heap. Plus those forwarding pointers.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


pk_nirmal
pk_nirmal
SSC Rookie
SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)SSC Rookie (28 reputation)

Group: General Forum Members
Points: 28 Visits: 52
Hi,
I think we should not generalize; it is purely on case to case basis, if entity poses ordered attribute and that attribute make sense to the underline domain then we don’t need Identity Column for Cluster Index (assume unique key length is not too large). But at the same time if entity poses time based attribute, then better choice is Identity column (Sedimentary) for Cluster Index.

E.g. – In Banking domain Savings Account Number attribute in Saving entity, here most of the cases data need to be fetch in Sing Account No order. So Saving Account No can use for Cluster index. Here again new account number not creating in sequence, but need to apply some prefix/suffix based on Saving Type or other attribute then we can again go back to Identity column fro Cluster Index.
At the same time Saving Transaction case data normally need to arrange based on time of transaction we can have Identity Column and same can use for Cluster Index.

thanks
Nirmal
Sam Jumper
Sam Jumper
Forum Newbie
Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)Forum Newbie (4 reputation)

Group: General Forum Members
Points: 4 Visits: 78
Timely post for me. Two days ago, I was troubleshooting a deadlock issue and found that deadlocks were centered around updates to a table which had a 3 column clustered index and all 3 columns were being updated.

In a case where the clustered key is likely to change, I think it's a bad idea to cluster.

That led me to the conclusion that a particular unique composite key is ok to cluster if it rarely changes. In the absence of a suitable candidate, adding an Identity column or a uniqueidentifier with a default of newsequentialid() is my next choice for clustered index.

- Sam
stephen jones-415753
stephen jones-415753
Forum Newbie
Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)Forum Newbie (1 reputation)

Group: General Forum Members
Points: 1 Visits: 33
I come from a data analyst background but this has got me into dba and database design work. My area of work is around demographic statistics so I've built up a database to store large census datasets and other administrative datasets. Each dataset usually has its own table however many tables, particularly the population table have multiple datasets.

Anyway, awhile ago I realised that my clustered identity key did and didn't make logical sense i.e. as per your article. It didn't sit well with me because to me the data should be grouped according to the dataset and then most by the various demographic fields i.e. age, gender etc...

But the reality is that I only upload datasets one at a time and always in a sort ordered manner. Therefore by coincidence my data was ordered logically anyway, however not by design but more by good fortune.

I've found that my queries are always by dataset so the first thing the query engine would do is pull out the range of ids that fit the dataset (I have non-clustered indexes to which I presume the range gets easily identified). After that it would do table joins by ID and then or possibly before the column filters get applied.

I realised then that the clustered identity column works for me in this instance however if my database wasn't so OLAP like it could be less than ideal. I love non-composite primary keys, they are just so easy for writing queries against but I couldn't put it over server performance.

Thanks for the article it is good to read about the pros and cons especially with regards to page splitting, and insert, updates and deletes.
Ken Lee-263418
Ken Lee-263418
SSC Rookie
SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)SSC Rookie (34 reputation)

Group: General Forum Members
Points: 34 Visits: 59
What? NOBODY mentions the major reason why identity fields were frowned on? Has this reason disappeared? Each insert is going to the same page, at least until the page is filled and even in 2005 that will cause a page lock until the insert finishes. (No more inserts allowed until the transaction commits.) It caused the term "Hot Spot" to be invented. Also the problem with the old datetime format when trying to insert by date when you have 20 records waiting to insert in the same 3 millisecond gap. (When your key was just a datetime field)
Since when does sequential mean organizing data together? We've had two uniqueindex, clustered, primary key fields in one table. The first identifies the client (the value found in another table) The second is definitely unique in the table because it is always generated with a newid() value when inserted. This definitely organizes the data together because everything is retrieved for the client when a client shows activity. Now that's organizing data together using a completely random identification process.
I feel for the person with the deadlocking problem. That's an argument among a host of reasons why artificial keys like identity fields are better than natural keys. (Keys, whose values have intrinsic meanings.)
Manie Verster
Manie Verster
Ten Centuries
Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)Ten Centuries (1.4K reputation)

Group: General Forum Members
Points: 1403 Visits: 1022
I have a small 12 GB database to look after plus I am not a DBA since I am actually one of the developers that developed this database and I know by now how popular developers are amongst DBA's but I base my indexing on the following.
1. A primary key is the clustered index
2. All other indexes are nonclustered since you can anyway only have one clustered index.

The reason why is, as someone has said already, that with clustered indexes the data gets ordered according to the columns and therefore the data gets added that way in the index. Less scanning.

I use identity columns only when absolutely necessary.:-D:-D:-D:-D:-D:-D

:-PManie Verster
Developer
Johannesburg
South Africa

I can do all things through Christ who strengthens me. - Holy Bible
I am a man of fixed and unbending principles, the first of which is to be flexible at all times. - Everett Mckinley Dirkson (Well, I am trying. - Manie Verster)
Richard Williams-416852
Richard Williams-416852
SSC Rookie
SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)

Group: General Forum Members
Points: 32 Visits: 54
We manage a system where there are thousands of reads for every write operation so we use clustered indexes on the keys used to do the majority of joins (not always the primary key). Performance is woeful if we rely on normal indexes for these key join relationships. We also have the added issue that the majority of key values in the system carry additional meaning i.e. the range of numbers (INT) also implies business grouping ownership which means clustering also has a side effect of storing all the groups in an order - rather than being a burden, this happenstance appears to offer many benefits
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search