• Snargables (8/22/2014)


    Recently I got into a discussion with a coworker about the primary key of a fact table. He wanted to put the primary key on the identity column; I suggested putting it on the columns that uniquely identify a row, which in this case are two int columns.

    Here’s his logic.

    Cluster on the surrogate key (the PK), then add nonclustered indexes to the table. A smaller primary key (which here is also the clustering key) leads to smaller nonclustered indexes and faster performance.
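    The size half of that argument can be checked with rough arithmetic. The sketch below assumes a simplified SQL Server storage model: on a clustered table, each nonclustered index leaf row stores the clustering key as its row locator, so a wider clustering key widens every nonclustered index. The row count, the number of nonclustered indexes and the helper names are hypothetical, and row headers, page overhead and any overlap between a nonclustered index's own key and the clustering key are ignored.

```python
# Back-of-envelope sketch of the "smaller clustering key" argument.
# Assumption (simplified SQL Server storage model): on a clustered table, each
# nonclustered index leaf row stores the clustering key as its row locator.
# Row headers, null bitmaps, page overhead and any overlap between a
# nonclustered index's own key and the clustering key are ignored.

INT_BYTES = 4  # a SQL Server int is 4 bytes


def locator_bytes_per_row(clustering_key_widths):
    """Bytes the clustering key adds to every nonclustered index leaf row."""
    return sum(clustering_key_widths)


def total_locator_bytes(rows, nonclustered_indexes, clustering_key_widths):
    """Locator bytes summed over the leaf level of all nonclustered indexes."""
    return rows * nonclustered_indexes * locator_bytes_per_row(clustering_key_widths)


identity_key = [INT_BYTES]            # cluster on the surrogate identity (int)
natural_key = [INT_BYTES, INT_BYTES]  # cluster on the two unique int columns

rows = 100_000_000  # hypothetical fact-table row count
indexes = 3         # hypothetical number of nonclustered indexes

surrogate = total_locator_bytes(rows, indexes, identity_key)
natural = total_locator_bytes(rows, indexes, natural_key)
print(f"identity key: {surrogate / 1e9:.1f} GB")
print(f"natural key:  {natural / 1e9:.1f} GB")
print(f"difference:   {(natural - surrogate) / 1e9:.1f} GB")
```

    With these made-up numbers the two-int key adds roughly 1.2 GB of locator data across three nonclustered indexes. Whether that turns into measurably faster queries depends on the workload and is something to test rather than assume.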

    To me this doesn’t seem right. Why would you cluster on an ID unless you were going to use it in your queries?

    Expanding on all the good points Sean has made, consider this: what if you needed six or even ten columns to uniquely identify a row? And, to make it even more awkward, what if one or more of those columns were nullable? 😛

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw
