Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

Clustered Indexes? Sedimentary, my dear Watson Expand / Collapse
Author
Message
Posted Saturday, May 22, 2010 11:08 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: Administrators
Last Login: Monday, October 20, 2014 1:22 PM
Points: 569, Visits: 1,033
Comments posted to this topic are about the item Clustered Indexes? Sedimentary, my dear Watson
Post #926440
Posted Saturday, May 22, 2010 11:29 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 8:38 PM
Points: 35,371, Visits: 31,912
Heh... It certainly "depends" but try rebuilding indexes without a clustered index and see what happens. Keep in mind that indexes also "split" except they frequently do so by extent rather than by page.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #926446
Posted Saturday, May 22, 2010 11:55 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, October 5, 2012 1:41 AM
Points: 155, Visits: 146
Tony,

Awesome point that I think too few people consider (or even know to consider). I always think about this when creating a new table, but I don't always know exactly how that table will be used in the "wild". I've even re-clustered tables upon seeing how they're typically queried [sp?].

Even when I recluster, I recognize that there will be a price to pay with inserts. I personally haven't dug deeply enough to know when that price is worse than the SELECT improvements. I generally just consider the situation -- more INSERTs or more SELECTS?

Unfortunately, I work with a crew that feels that every table should have a clustered IDENTITY index that "resolves any abiguity" -- even though 90% of the time, the tail end of a relation will NEVER query that relational ID!

In general, our DB's are small enough that that is an OK approach. HOWEVER, we have one MASSIVE DB that has benefitted from some reclustering, and could benefit more, but there is reistance due to the above beliefs. We'll see how that sorts out over time, as we get to the 5-6 million-plus records (which is fast approaching).

-----------

Wish I had a clever tagline here... D.

Jsut thinking -- on the tagline -- what do you all think of "Inspiriation / Perspiration -- smell is in the nose of the beholder" or something like that? Thanks! D.
Post #926488
Posted Sunday, May 23, 2010 2:38 AM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 8:19 AM
Points: 40,208, Visits: 36,617
Well I'm strongly in the 'clustered index to organise the table' camp, and do have reasons.

Firstly, note that these reasons are for OLTP-type systems. Lots of changes, lots of fast, small queries. The indexing rules for decision support/OLAP can be very different because of the radically different usage pattern.

There can only be one clustered index, and it's going to be the largest index that exists for that table. Hence if it needs rebuilding, it will take the longest and have the greatest log impact so I want to minimise the need to rebuild
Because there can only be one clustered index, I personally prefer to use the cluster mostly to organise the table (hence the ever-increasing and non-changing attributes) and use the nonclustered indexes for queries.
If I can get the cluster so it's quite narrow, unique (or nearly so), ever increasing, non-changing and have it useful for queries as well, that's absolute first prize. My favourite here is cluster on a date inserted column where the table is often queried by date. Many transaction-type tables fit in this category.
If the system has lots and lots of inserts (OLTP), I need those inserts as fast as possible. Hence I do not want frequent page splits on the cluster, nor do I want data moving around in the table if the clustering key gets updated. Both of those, if occurring enough, can have nasty effects on insert performance.

Heaps have some interesting downsides. Until 2008, they couldn't be rebuilt, and there's occasionally odd behaviour around reusing of space within a heap. Plus those forwarding pointers.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #926492
Posted Sunday, May 23, 2010 2:41 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, August 1, 2013 3:16 AM
Points: 28, Visits: 52
Hi,
I think we should not generalize; it is purely on case to case basis, if entity poses ordered attribute and that attribute make sense to the underline domain then we don’t need Identity Column for Cluster Index (assume unique key length is not too large). But at the same time if entity poses time based attribute, then better choice is Identity column (Sedimentary) for Cluster Index.

E.g. – In Banking domain Savings Account Number attribute in Saving entity, here most of the cases data need to be fetch in Sing Account No order. So Saving Account No can use for Cluster index. Here again new account number not creating in sequence, but need to apply some prefix/suffix based on Saving Type or other attribute then we can again go back to Identity column fro Cluster Index.
At the same time Saving Transaction case data normally need to arrange based on time of transaction we can have Identity Column and same can use for Cluster Index.

thanks
Nirmal
Post #926493
Posted Sunday, May 23, 2010 2:45 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, May 14, 2014 10:20 AM
Points: 4, Visits: 69
Timely post for me. Two days ago, I was troubleshooting a deadlock issue and found that deadlocks were centered around updates to a table which had a 3 column clustered index and all 3 columns were being updated.

In a case where the clustered key is likely to change, I think it's a bad idea to cluster.

That led me to the conclusion that a particular unique composite key is ok to cluster if it rarely changes. In the absence of a suitable candidate, adding an Identity column or a uniqueidentifier with a default of newsequentialid() is my next choice for clustered index.

- Sam

Post #926560
Posted Sunday, May 23, 2010 7:07 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, December 4, 2012 2:00 AM
Points: 1, Visits: 33
I come from a data analyst background but this has got me into dba and database design work. My area of work is around demographic statistics so I've built up a database to store large census datasets and other administrative datasets. Each dataset usually has its own table however many tables, particularly the population table have multiple datasets.

Anyway, awhile ago I realised that my clustered identity key did and didn't make logical sense i.e. as per your article. It didn't sit well with me because to me the data should be grouped according to the dataset and then most by the various demographic fields i.e. age, gender etc...

But the reality is that I only upload datasets one at a time and always in a sort ordered manner. Therefore by coincidence my data was ordered logically anyway, however not by design but more by good fortune.

I've found that my queries are always by dataset so the first thing the query engine would do is pull out the range of ids that fit the dataset (I have non-clustered indexes to which I presume the range gets easily identified). After that it would do table joins by ID and then or possibly before the column filters get applied.

I realised then that the clustered identity column works for me in this instance however if my database wasn't so OLAP like it could be less than ideal. I love non-composite primary keys, they are just so easy for writing queries against but I couldn't put it over server performance.

Thanks for the article it is good to read about the pros and cons especially with regards to page splitting, and insert, updates and deletes.
Post #926583
Posted Sunday, May 23, 2010 11:45 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, December 21, 2012 1:46 AM
Points: 28, Visits: 59
What? NOBODY mentions the major reason why identity fields were frowned on? Has this reason disappeared? Each insert is going to the same page, at least until the page is filled and even in 2005 that will cause a page lock until the insert finishes. (No more inserts allowed until the transaction commits.) It caused the term "Hot Spot" to be invented. Also the problem with the old datetime format when trying to insert by date when you have 20 records waiting to insert in the same 3 millisecond gap. (When your key was just a datetime field)
Since when does sequential mean organizing data together? We've had two uniqueindex, clustered, primary key fields in one table. The first identifies the client (the value found in another table) The second is definitely unique in the table because it is always generated with a newid() value when inserted. This definitely organizes the data together because everything is retrieved for the client when a client shows activity. Now that's organizing data together using a completely random identification process.
I feel for the person with the deadlocking problem. That's an argument among a host of reasons why artificial keys like identity fields are better than natural keys. (Keys, whose values have intrinsic meanings.)
Post #926619
Posted Monday, May 24, 2010 12:29 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Friday, July 25, 2014 5:21 AM
Points: 1,205, Visits: 923
I have a small 12 GB database to look after plus I am not a DBA since I am actually one of the developers that developed this database and I know by now how popular developers are amongst DBA's but I base my indexing on the following.
1. A primary key is the clustered index
2. All other indexes are nonclustered since you can anyway only have one clustered index.

The reason why is, as someone has said already, that with clustered indexes the data gets ordered according to the columns and therefore the data gets added that way in the index. Less scanning.

I use identity columns only when absolutely necessary.


Manie Verster
Developer
Johannesburg
South Africa

I can do all things through Christ who strengthens me. - Holy Bible
I am a man of fixed and unbending principles, the first of which is to be flexible at all times. - Everett Mckinley Dirkson (Well, I am trying. - Manie Verster)
Post #926629
Posted Monday, May 24, 2010 1:02 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, September 22, 2014 7:51 AM
Points: 24, Visits: 54
We manage a system where there are thousands of reads for every write operation so we use clustered indexes on the keys used to do the majority of joins (not always the primary key). Performance is woeful if we rely on normal indexes for these key join relationships. We also have the added issue that the majority of key values in the system carry additional meaning i.e. the range of numbers (INT) also implies business grouping ownership which means clustering also has a side effect of storing all the groups in an order - rather than being a burden, this happenstance appears to offer many benefits
Post #926642
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse