RE: Introduction to Indexes: Part 2 – The clustered index

SSC Guru

Points: 1004504

November 11, 2009 at 6:15 am

Esalter (11/11/2009)
I find it difficult to believe this article, especially the reference that all non clustered indexes use the clustered index as their lowest level address

Why do you find that hard to believe? The nonclustered indexes must have some way to refer to the actual data row, much as an index in a book as the page number, to reference the actual entry.

If you want an authoritative source to confirm what I've said, see the MSDN article on Nonclustered index architecture: http://msdn.microsoft.com/en-us/library/ms177484.aspx

If this is true, how is underlying data addressed (by an index) when no clustered index has been specified? Different rules for different scenarios (I doubt it)?

The rules are indeed different, though only slightly. If you refer to part 1[/url] of the indexing series (or the aforementioned MSDN article)

Nonclustered indexes are separate from the table. The leaf level of a nonclustered index has a pointer as part of each index row. That pointer is either the clustered index key in the cases where the base table has a clustered index or the RID (Row Identifier) in the cases where the table is a heap. The RID is an 8-byte structure comprised of File ID, Page Number and Slot Index and will uniquely identify a row in the underlying heap. Either way, the each row of a nonclustered index has a reference to the complete data row.

What if a clustered index is Dropped? Are the 'different' rules re-applied retrospectively?

Yes. If the clustered index is dropped all nonclustered indexes on the table are rebuilt with the RID as the row locator. If a clustered index is recreated, all the nonclustered indexes are again rebuilt, replacing the RID with the new clustering key.

That's why it's often suggested to build the clustered index first and then the nonclustered indexes. Doing it the other way around is inefficient as the nonclustered indexes would be built and then later rebuilt.

True, the leaf level would be the data pages for a clustered index however, the clustered data was supported by a standard B tree index in the same way a non clustered index is implemented.

SQL's clustered index is a b-tree as well, just like for a nonclustered index. The main difference is that the clustered index has the entire data row at the leaf level.

whereas the article suggests that a full table read would be necessary, even to locate the first row in the series.

Where is that stated/implied? If you can read that into something I said, I'll edit the article to make it clearer.

If SQL needs a range of rows from a clustered index it'll seek to the beginning of the range (or sometimes the end of the range) and then read along the leaf pages until it finds all the rows that it needs.

If interactive data accessing and updating is more likely to occur on the most recently added rows and the insertion rate is relatively high with many system users, then selecting a cluster key that 'physically scatters' data throughout the table at the data level can help to reduce the liklihood of deadlocks.

It can reduce contention, I would argue that if there are deadlocks there's something else wrong. The 'hotspot' reason for not choosing an ascending key kinda went away in SQL 7 with the change to the locking mechanisms. I recall someone doing a test a while back, can't recall who or specific numbers, but thousands of inserts/sec were required to cause enough contention to slow the inserts down. YMMV.

Where this can't be avoided, it's also useful to dump and reload the table on a regular basis to optimise the structure.

Dump and reload is not necessary. An index rebuild does the job.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass