RE: Introduction to Indexes: Part 2 – The clustered index

SSC Veteran

Points: 233

November 11, 2009 at 5:43 am

Hi,

I find it difficult to believe this article, especially the reference that all non clustered indexes use the clustered index as their lowest level address (hence a large clustered index key increases the size of all other indexes)!

If this is true, how is underlying data addressed (by an index) when no clustered index has been specified? Different rules for different scenarios (I doubt it)? What if a clustered index is Dropped? Are the 'different' rules re-applied retrospectively?

Some time ago I had some involevement back in DBMS design and we used a Table Row Address (TRA) which was effectively the Page number of where a row was stored etc (6 bytes). Pages were added to a table (on demand) in physically adjacent 'blocks' and were numbered sequentially from 1.

Each index at leaf level would consist of the Key value and its TRA (for the parent row) hence each 'Index entry' consisted of rows that were the length of the Key + 6 bytes in length. (Index rows in the tree followed the same construction with the highest key in the page at the lower level and its resident page number as a TRA).

True, the leaf level would be the data pages for a clustered index however, the clustered data was supported by a standard B tree index in the same way a non clustered index is implemented.

Where several rows are required and the query references the clustered key, the index tree would be used to locate the start point (seek) and once determined, successive reads would be at the data page (leaf) level whereas the article suggests that a full table read would be necessary, even to locate the first row in the series.

Incidentally, "never choose a clustered key that can change" - agree however on many DBMS systems this is a standard constraint and is a good rule to follow for the sake of transportability.

Choose a key that minimises page splitting is always a good rule to adopt however there are occassions where this can be an overall benefit.

If interactive data accessing and updating is more likely to occur on the most recently added rows and the insertion rate is relatively high with many system users, then selecting a cluster key that 'physically scatters' data throughout the table at the data level can help to reduce the liklihood of deadlocks.

Where this can't be avoided, it's also useful to dump and reload the table on a regular basis to optimise the structure.

Jim

Trainee Novice:w00t: