RE: cluster index or non cluster index

SSC Eights!

Points: 826

May 26, 2010 at 11:23 am

I'm pleased to find it stated repeatedly here that the index design should always be specific to the situation, and tested for correctness.

I always object to the commonly held mantras that;

"In general every table should have a clustered index", "in general, the clustered index should be on the primary key" and "in general, clustered indexes perform better".

None of these things seem true to me. Or at least, the performance differences being discussed only apply in specific applications and not at all as a general rule.

In my experience, in the real world,

1. Most data (by volume) is entered and referenced sequentially - it's naturally sorted in the way it is entered in a heap.

2. Most data is not updated very much. The whole forwarding pointers thing is not a significant issue, especially if you're routinely maintaining your indexes.

3. By design, primary keys are less and less likely to be useful in semantic queries - there's a general design movement towards abstracting physical keys from the data so that we're using integers or GUID's as a physical key where a larger, logical semantic key could be built from the data also but would be i/o expensive to use. Primary keys are also essential in joins, or should be. A primary key should ideally be as small as possible (actually that may not always be true either, depends on the application).

4. Clustered indexes are most appropriate where the data is routinely manipulated in sorted batches rather than working with single rows. This is most commonly true of date ranged data, and most commonly in warehousing / olap / reporting applications.

5. I've just been going through the SQL 2008 70-432 exam course material. It states that

"A clustered index does not physically store the data on disk in a sorted order because doing so creates a large amount of disk input/

output (I/O) for page split operations. Instead, a clustered index ensures that the page chain of the index is sorted logically, allowing SQL Server to traverse directly down the page chain to locate data"

... I'm afraid I don't understand what sorting a page chain "logically" means.

Anyone?