• I come from a data analyst background but this has got me into dba and database design work. My area of work is around demographic statistics so I've built up a database to store large census datasets and other administrative datasets. Each dataset usually has its own table however many tables, particularly the population table have multiple datasets.

    Anyway, awhile ago I realised that my clustered identity key did and didn't make logical sense i.e. as per your article. It didn't sit well with me because to me the data should be grouped according to the dataset and then most by the various demographic fields i.e. age, gender etc...

    But the reality is that I only upload datasets one at a time and always in a sort ordered manner. Therefore by coincidence my data was ordered logically anyway, however not by design but more by good fortune.

    I've found that my queries are always by dataset so the first thing the query engine would do is pull out the range of ids that fit the dataset (I have non-clustered indexes to which I presume the range gets easily identified). After that it would do table joins by ID and then or possibly before the column filters get applied.

    I realised then that the clustered identity column works for me in this instance however if my database wasn't so OLAP like it could be less than ideal. I love non-composite primary keys, they are just so easy for writing queries against but I couldn't put it over server performance.

    Thanks for the article it is good to read about the pros and cons especially with regards to page splitting, and insert, updates and deletes.