Log in
::
Register
::
Not logged in
Home
Tags
Articles
Editorials
Stairways
Forums
Scripts
Videos
Blogs
QotD
Books
Ask SSC
SQL Jobs
Training
Authors
About us
Contact us
Newsletters
Write for us
Recent Posts
Recent Posts
Popular Topics
Popular Topics
Home
Search
Members
Calendar
Who's On
Home
»
SQLServerCentral.com
»
Editorials
»
Clustered Indexes? Sedimentary, my dear...
50 posts, Page 1 of 5
1
2
3
4
5
»
»»
Clustered Indexes? Sedimentary, my dear Watson
Rate Topic
Display Mode
Topic Options
Author
Message
Tony Davis
Tony Davis
Posted Saturday, May 22, 2010 11:08 AM
Mr or Mrs. 500
Group: Administrators
Last Login: Wednesday, May 08, 2013 10:31 AM
Points: 511,
Visits: 945
Comments posted to this topic are about the item
Clustered Indexes? Sedimentary, my dear Watson
Post #926440
Jeff Moden
Jeff Moden
Posted Saturday, May 22, 2010 11:29 AM
SSC-Dedicated
Group: General Forum Members
Last Login: Today @ 1:24 PM
Points: 32,913,
Visits: 26,806
Heh... It certainly "depends" but try rebuilding indexes without a clustered index and see what happens. Keep in mind that indexes also "split" except they frequently do so by extent rather than by page.
--Jeff Moden
"
RBAR
is pronounced "ree-bar" and is a "Modenism" for "
R
ow-
B
y-
A
gonizing-
R
ow".
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Post #926446
dhamilton-905368
dhamilton-905368
Posted Saturday, May 22, 2010 11:55 PM
SSC-Enthusiastic
Group: General Forum Members
Last Login: Friday, October 05, 2012 1:41 AM
Points: 155,
Visits: 146
Tony,
Awesome point that I think too few people consider (or even know to consider). I
always
think about this when creating a new table, but I don't
always
know exactly how that table will be used in the "wild". I've even re-clustered tables upon seeing how they're typically queried [sp?].
Even when I recluster, I recognize that there will be a price to pay with inserts. I personally haven't dug deeply enough to know when that price is worse than the SELECT improvements. I generally just consider the situation -- more INSERTs or more SELECTS?
Unfortunately, I work with a crew that feels that every table should have a clustered IDENTITY index that "resolves any abiguity" -- even though 90% of the time, the tail end of a relation will NEVER query that relational ID!
In general, our DB's are small enough that that is an OK approach. HOWEVER, we have one MASSIVE DB that has benefitted from
some
reclustering, and could benefit more, but there is reistance due to the above beliefs. We'll see how that sorts out over time, as we get to the 5-6 million-plus records (which is fast approaching).
-----------
Wish I had a clever tagline here... D.
Jsut thinking -- on the tagline -- what do you all think of "Inspiriation / Perspiration -- smell is in the nose of the beholder" or something like that? Thanks! D.
Post #926488
GilaMonster
GilaMonster
Posted Sunday, May 23, 2010 2:38 AM
SSC-Dedicated
Group: General Forum Members
Last Login: Today @ 2:33 PM
Points: 37,741,
Visits: 30,019
Well I'm strongly in the 'clustered index to organise the table' camp, and do have reasons.
Firstly, note that these reasons are for OLTP-type systems. Lots of changes, lots of fast, small queries. The indexing rules for decision support/OLAP can be very different because of the radically different usage pattern.
There can only be one clustered index, and it's going to be the largest index that exists for that table. Hence if it needs rebuilding, it will take the longest and have the greatest log impact so I want to minimise the need to rebuild
Because there can only be one clustered index, I personally prefer to use the cluster mostly to organise the table (hence the ever-increasing and non-changing attributes) and use the nonclustered indexes for queries.
If I can get the cluster so it's quite narrow, unique (or nearly so), ever increasing, non-changing and have it useful for queries as well, that's absolute first prize. My favourite here is cluster on a date inserted column where the table is often queried by date. Many transaction-type tables fit in this category.
If the system has lots and lots of inserts (OLTP), I need those inserts as fast as possible. Hence I do not want frequent page splits on the cluster, nor do I want data moving around in the table if the clustering key gets updated. Both of those, if occurring enough, can have nasty effects on insert performance.
Heaps have some interesting downsides. Until 2008, they couldn't be rebuilt, and there's occasionally odd behaviour around reusing of space within a heap. Plus those forwarding pointers.
Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild
: Discussions on DB performance with occasional diversions into recoverability
We walk in the dark places no others will enter
We stand on the bridge and no one may pass
Post #926492
pk_nirmal
pk_nirmal
Posted Sunday, May 23, 2010 2:41 AM
SSC Rookie
Group: General Forum Members
Last Login: Friday, February 08, 2013 10:42 PM
Points: 27,
Visits: 45
Hi,
I think we should not generalize; it is purely on case to case basis, if entity poses ordered attribute and that attribute make sense to the underline domain then we don’t need Identity Column for Cluster Index (assume unique key length is not too large). But at the same time if entity poses time based attribute, then better choice is Identity column (Sedimentary) for Cluster Index.
E.g. – In Banking domain Savings Account Number attribute in Saving entity, here most of the cases data need to be fetch in Sing Account No order. So Saving Account No can use for Cluster index. Here again new account number not creating in sequence, but need to apply some prefix/suffix based on Saving Type or other attribute then we can again go back to Identity column fro Cluster Index.
At the same time Saving Transaction case data normally need to arrange based on time of transaction we can have Identity Column and same can use for Cluster Index.
thanks
Nirmal
Post #926493
Sam Jumper
Sam Jumper
Posted Sunday, May 23, 2010 2:45 PM
Forum Newbie
Group: General Forum Members
Last Login: Wednesday, May 01, 2013 12:49 PM
Points: 4,
Visits: 53
Timely post for me. Two days ago, I was troubleshooting a deadlock issue and found that deadlocks were centered around updates to a table which had a 3 column clustered index and all 3 columns were being updated.
In a case where the clustered key is likely to change, I think it's a bad idea to cluster.
That led me to the conclusion that a particular unique composite key is ok to cluster if it rarely changes. In the absence of a suitable candidate, adding an Identity column or a uniqueidentifier with a default of newsequentialid() is my next choice for clustered index.
- Sam
Post #926560
stephen jones-415753
stephen jones-415753
Posted Sunday, May 23, 2010 7:07 PM
Forum Newbie
Group: General Forum Members
Last Login: Tuesday, December 04, 2012 2:00 AM
Points: 1,
Visits: 33
I come from a data analyst background but this has got me into dba and database design work. My area of work is around demographic statistics so I've built up a database to store large census datasets and other administrative datasets. Each dataset usually has its own table however many tables, particularly the population table have multiple datasets.
Anyway, awhile ago I realised that my clustered identity key did and didn't make logical sense i.e. as per your article. It didn't sit well with me because to me the data should be grouped according to the dataset and then most by the various demographic fields i.e. age, gender etc...
But the reality is that I only upload datasets one at a time and always in a sort ordered manner. Therefore by coincidence my data was ordered logically anyway, however not by design but more by good fortune.
I've found that my queries are always by dataset so the first thing the query engine would do is pull out the range of ids that fit the dataset (I have non-clustered indexes to which I presume the range gets easily identified). After that it would do table joins by ID and then or possibly before the column filters get applied.
I realised then that the clustered identity column works for me in this instance however if my database wasn't so OLAP like it could be less than ideal. I love non-composite primary keys, they are just so easy for writing queries against but I couldn't put it over server performance.
Thanks for the article it is good to read about the pros and cons especially with regards to page splitting, and insert, updates and deletes.
Post #926583
Ken Lee-263418
Ken Lee-263418
Posted Sunday, May 23, 2010 11:45 PM
SSC Rookie
Group: General Forum Members
Last Login: Friday, December 21, 2012 1:46 AM
Points: 28,
Visits: 59
What? NOBODY mentions the major reason why identity fields were frowned on? Has this reason disappeared? Each insert is going to the same page, at least until the page is filled and even in 2005 that will cause a page lock until the insert finishes. (No more inserts allowed until the transaction commits.) It caused the term "Hot Spot" to be invented. Also the problem with the old datetime format when trying to insert by date when you have 20 records waiting to insert in the same 3 millisecond gap. (When your key was just a datetime field)
Since when does sequential mean organizing data together? We've had two uniqueindex, clustered, primary key fields in one table. The first identifies the client (the value found in another table) The second is definitely unique in the table because it is always generated with a newid() value when inserted. This definitely organizes the data together because everything is retrieved for the client when a client shows activity. Now that's organizing data together using a completely random identification process.
I feel for the person with the deadlocking problem. That's an argument among a host of reasons why artificial keys like identity fields are better than natural keys. (Keys, whose values have intrinsic meanings.)
Post #926619
Manie Verster
Manie Verster
Posted Monday, May 24, 2010 12:29 AM
Ten Centuries
Group: General Forum Members
Last Login: Monday, May 20, 2013 3:00 AM
Points: 1,151,
Visits: 879
I have a small 12 GB database to look after plus I am not a DBA since I am actually one of the developers that developed this database and I know by now how popular developers are amongst DBA's but I base my indexing on the following.
1. A primary key is the clustered index
2. All other indexes are nonclustered since you can anyway only have one clustered index.
The reason why is, as someone has said already, that with clustered indexes the data gets ordered according to the columns and therefore the data gets added that way in the index. Less scanning.
I use identity columns only when absolutely necessary.
Manie Verster
Developer
Johannesburg
South Africa
Life is about choices.... I choose to be happy today
Post #926629
Richard Williams-416852
Richard Williams-416852
Posted Monday, May 24, 2010 1:02 AM
Grasshopper
Group: General Forum Members
Last Login: Friday, April 26, 2013 4:04 AM
Points: 24,
Visits: 48
We manage a system where there are thousands of reads for every write operation so we use clustered indexes on the keys used to do the majority of joins (not always the primary key). Performance is woeful if we rely on normal indexes for these key join relationships. We also have the added issue that the majority of key values in the system carry additional meaning i.e. the range of numbers (INT) also implies business grouping ownership which means clustering also has a side effect of storing all the groups in an order - rather than being a burden, this happenstance appears to offer many benefits
Post #926642
« Prev Topic
|
Next Topic »
50 posts, Page 1 of 5
1
2
3
4
5
»
»»
Permissions
You
cannot
post new topics.
You
cannot
post topic replies.
You
cannot
post new polls.
You
cannot
post replies to polls.
You
cannot
edit your own topics.
You
cannot
delete your own topics.
You
cannot
edit other topics.
You
cannot
delete other topics.
You
cannot
edit your own posts.
You
cannot
edit other posts.
You
cannot
delete your own posts.
You
cannot
delete other posts.
You
cannot
post events.
You
cannot
edit your own events.
You
cannot
edit other events.
You
cannot
delete your own events.
You
cannot
delete other events.
You
cannot
send private messages.
You
cannot
send emails.
You
may
read topics.
You
cannot
rate topics.
You
cannot
vote within polls.
You
cannot
upload attachments.
You
may
download attachments.
You
cannot
post HTML code.
You
cannot
edit HTML code.
You
cannot
post IFCode.
You
cannot
post JavaScript.
You
cannot
post EmotIcons.
You
cannot
post or upload images.
Copyright © 2002-2013 Simple Talk Publishing. All Rights Reserved.
Privacy Policy.
Terms of Use.
Report Abuse.