• N_Muller (2/26/2015)


    I have tables in database where a VARCHAR(50) string is unique identifier. The database currently has an integer identity column as clustered primary key, and there's a non-clustered index on string column. The customer always queries based on a defined set of the identifier (string) column.

    I wonder if someone sees an advantage of adding a persisted computed column to the table as the checksum of the string column, and then create a non-clustered index on the checksum and the string. When a customer requests data, we would compute the checksum of the customer provided identifier and add to the where clause or join, that the checksum and string must match.

    Will SQL Server perform checksum check (integer) and only if it succeeds, perform the string check, in which case I see an advantage of added the checksum column? Or will SQL Server always check for both the checksum and string, in which case the additional column only adds unnecessary overhead? To note is the fact that the table(s) will have millions of rows, but the customer will request data for at at most, 100 or so identifiers.

    Let's back the performance cart up a bit here. You say the data in the string column is a "unique identifier". What precisely does that mean? Is it a GUID? If so, then it should absolutely NOT be the clustered index. Is the string ever increasing in value? If not, this it should probably not be the clustered index here, either.

    What kind of data does this string have in it?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)