• Eugene Elutin (10/4/2012)


    Yes, you are right! There is no one-size-fits-all answer to that.

    However...

    I guess it was some long discussions between SQL Server (and back to old days Sybase) design engineers/architects about making PK clustered by default. They decided to do so, and not only for the reason "not to create heaps". PK is not always the best choice for clustered, but quite often it is. Use of Surrogate keys for PK is quite popular in modern RDBMS's (at least in SQL Server and Oracle), so may be that is another statistical manifestation example of 80-20 rule...

    I think that in more than 20% of cases a large enough proportion of the work required of the dbms involves searches for a range of values in some initial columns of the natural key. A surrogate key may of course still be useful for reducing the storage required in other tables for references to this table (and consequently reducing the number of in those other tables pages handled by joins with this table), but while it does no harm to make the surrogate the primary key it is not always sensible in these cases to make it the cluster key - the natural key should have a unique constraint on it and non-null constraints on each of its columns (ie the table should be normalised at least to 1NF, to keep the code manipulating tha table simple - actually I generally can't see any point in normalisation to lower levels than EKNF, but only 1NF is relevant to this discussion) and often either that index (if it's narrow enough) or an index on the relevant intial part (perhaps just one column) of the natural key should be used for clustering.

    Every case needs careful analysis, and I don't believe there's an 80 to 20 rule in favour of clustering on a surrogate.

    Tom