Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

Introduction to Indexes: Part 2 – The clustered index Expand / Collapse
Author
Message
Posted Tuesday, November 10, 2009 11:46 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 6:44 AM
Points: 39,898, Visits: 36,243
Comments posted to this topic are about the item Introduction to Indexes: Part 2 – The clustered index


Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #816989
Posted Wednesday, November 11, 2009 5:43 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, November 23, 2009 1:40 AM
Points: 13, Visits: 18
Hi,

I find it difficult to believe this article, especially the reference that all non clustered indexes use the clustered index as their lowest level address (hence a large clustered index key increases the size of all other indexes)!

If this is true, how is underlying data addressed (by an index) when no clustered index has been specified? Different rules for different scenarios (I doubt it)? What if a clustered index is Dropped? Are the 'different' rules re-applied retrospectively?

Some time ago I had some involevement back in DBMS design and we used a Table Row Address (TRA) which was effectively the Page number of where a row was stored etc (6 bytes). Pages were added to a table (on demand) in physically adjacent 'blocks' and were numbered sequentially from 1.

Each index at leaf level would consist of the Key value and its TRA (for the parent row) hence each 'Index entry' consisted of rows that were the length of the Key + 6 bytes in length. (Index rows in the tree followed the same construction with the highest key in the page at the lower level and its resident page number as a TRA).

True, the leaf level would be the data pages for a clustered index however, the clustered data was supported by a standard B tree index in the same way a non clustered index is implemented.

Where several rows are required and the query references the clustered key, the index tree would be used to locate the start point (seek) and once determined, successive reads would be at the data page (leaf) level whereas the article suggests that a full table read would be necessary, even to locate the first row in the series.

Incidentally, "never choose a clustered key that can change" - agree however on many DBMS systems this is a standard constraint and is a good rule to follow for the sake of transportability.

Choose a key that minimises page splitting is always a good rule to adopt however there are occassions where this can be an overall benefit.
If interactive data accessing and updating is more likely to occur on the most recently added rows and the insertion rate is relatively high with many system users, then selecting a cluster key that 'physically scatters' data throughout the table at the data level can help to reduce the liklihood of deadlocks.
Where this can't be avoided, it's also useful to dump and reload the table on a regular basis to optimise the structure.

Jim


Trainee Novice
Post #817093
Posted Wednesday, November 11, 2009 6:15 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 6:44 AM
Points: 39,898, Visits: 36,243
Esalter (11/11/2009)
I find it difficult to believe this article, especially the reference that all non clustered indexes use the clustered index as their lowest level address

Why do you find that hard to believe? The nonclustered indexes must have some way to refer to the actual data row, much as an index in a book as the page number, to reference the actual entry.

If you want an authoritative source to confirm what I've said, see the MSDN article on Nonclustered index architecture: http://msdn.microsoft.com/en-us/library/ms177484.aspx

If this is true, how is underlying data addressed (by an index) when no clustered index has been specified? Different rules for different scenarios (I doubt it)?

The rules are indeed different, though only slightly. If you refer to part 1 of the indexing series (or the aforementioned MSDN article)
Nonclustered indexes are separate from the table. The leaf level of a nonclustered index has a pointer as part of each index row. That pointer is either the clustered index key in the cases where the base table has a clustered index or the RID (Row Identifier) in the cases where the table is a heap. The RID is an 8-byte structure comprised of File ID, Page Number and Slot Index and will uniquely identify a row in the underlying heap. Either way, the each row of a nonclustered index has a reference to the complete data row.


What if a clustered index is Dropped? Are the 'different' rules re-applied retrospectively?

Yes. If the clustered index is dropped all nonclustered indexes on the table are rebuilt with the RID as the row locator. If a clustered index is recreated, all the nonclustered indexes are again rebuilt, replacing the RID with the new clustering key.
That's why it's often suggested to build the clustered index first and then the nonclustered indexes. Doing it the other way around is inefficient as the nonclustered indexes would be built and then later rebuilt.

True, the leaf level would be the data pages for a clustered index however, the clustered data was supported by a standard B tree index in the same way a non clustered index is implemented.

SQL's clustered index is a b-tree as well, just like for a nonclustered index. The main difference is that the clustered index has the entire data row at the leaf level.

whereas the article suggests that a full table read would be necessary, even to locate the first row in the series.

Where is that stated/implied? If you can read that into something I said, I'll edit the article to make it clearer.

If SQL needs a range of rows from a clustered index it'll seek to the beginning of the range (or sometimes the end of the range) and then read along the leaf pages until it finds all the rows that it needs.

If interactive data accessing and updating is more likely to occur on the most recently added rows and the insertion rate is relatively high with many system users, then selecting a cluster key that 'physically scatters' data throughout the table at the data level can help to reduce the liklihood of deadlocks.

It can reduce contention, I would argue that if there are deadlocks there's something else wrong. The 'hotspot' reason for not choosing an ascending key kinda went away in SQL 7 with the change to the locking mechanisms. I recall someone doing a test a while back, can't recall who or specific numbers, but thousands of inserts/sec were required to cause enough contention to slow the inserts down. YMMV.

Where this can't be avoided, it's also useful to dump and reload the table on a regular basis to optimise the structure.

Dump and reload is not necessary. An index rebuild does the job.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #817104
Posted Wednesday, November 11, 2009 6:29 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, September 16, 2014 2:03 PM
Points: 1,334, Visits: 3,069
Very good series on the basics of indexes. Unfortunately, I was not able to read Part 3 as of yet since there seems to be an error on the web page. One comment on Part 2 however regarding the three ways a Clustered Index is used. In particular, the Bookmark Lookup. As Gail states in her article it is a very expensive operation and should be avoided if at all possible, With the advent of SQL 2005 and 2008 there is a very easy way to avoid the classic bookmark lookup and still maintain the narrow selectivity of your original non-clustered index. For example, lets take Gail's example and put a little twist on it and then run the execution plan once again and see the difference. First lets drop her current non-clustered index on ProductNumber:

DROP INDEX AK_PRODUCT_PRODUCTNUMBER
ON PRODUCTION.PRODUCT
GO

Ok, piece of cake. Now, let's recreate it again, this time including the NAME column in her query thus making this a truly COVERED INDEX, Notice that we have not changed the original narrow selectivity of this index.

CREATE UNIQUE NONCLUSTERED INDEX AK_PRODUCT_PRODUCTNUMBER
ON PRODUCTION.PRODUCT (PRODUCTNUMBER)
INCLUDE (NAME)
GO

SELECT ProductID, ProductNumber, Name
FROM production.Product
WHERE ProductNumber = 'HN-1224'


Now, run Gail's query and execution plan again and notice the Bookmark Lookup is gone! Replaced with an Index Seek operation. Now notice how fast that row is returned? Use the INCLUDE columns in your NC indexes whenever you can to fix those expensive Bookmark Lookups using the Clustered Index. The Query Optimizer will thank you for it, trust me :) HTH Travis.


"Technology is a weird thing. It brings you great gifts with one hand, and it stabs you in the back with the other. ..."
Post #817111
Posted Wednesday, November 11, 2009 6:48 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 6:44 AM
Points: 39,898, Visits: 36,243
talltop-969015 (11/11/2009)
Unfortunately, I was not able to read Part 3 as of yet since there seems to be an error on the web page.


That's because it hasn't been published yet.

Use the INCLUDE columns in your NC indexes whenever you can to fix those expensive Bookmark Lookups using the Clustered Index.


Maybe. Just because a query can be covered doesn't necessarily mean it should be covered. As with everything, there are downsides. Covered in part 3 of the series.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #817120
Posted Wednesday, November 11, 2009 6:53 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, September 16, 2014 2:03 PM
Points: 1,334, Visits: 3,069
Hmmm, the link is there at the bottom of Part 2 though. Anyway, sorry about that.. Also, I never meant to say that every column should be covered in every query. I was merely pointing out the vast difference it makes in performance and the execution plan in your particular query.. :) I agree that every query needs to be analyzed first on a case by case basis,,,

"Technology is a weird thing. It brings you great gifts with one hand, and it stabs you in the back with the other. ..."
Post #817129
Posted Wednesday, November 11, 2009 7:01 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 6:44 AM
Points: 39,898, Visits: 36,243
talltop-969015 (11/11/2009)
Hmmm, the link is there at the bottom of Part 2 though.


Yes it is. Steve added the links, probably so that he wouldn't have to repeatedly edit the articles as the subsequent ones were published.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #817135
Posted Wednesday, November 11, 2009 7:35 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Yesterday @ 11:10 AM
Points: 1,326, Visits: 19,276
Excellent article Gail, as always, thank you for that!

---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Post #817159
Posted Wednesday, November 11, 2009 8:01 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 2:44 PM
Points: 4,359, Visits: 6,197
Jim, first off, welcome to the wonderful world of SQL Server! Also, what is your primary RDBMS - i.e. from whence do you come to us?

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #817176
Posted Wednesday, November 11, 2009 8:02 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 2:44 PM
Points: 4,359, Visits: 6,197
I am often-times disappointed that I cannot have more than one clustered index. Do you think they will ever put this most useful feature in SQL Server Gail??

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #817179
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse