Multi-column primary keys

  • I have a table of with tens of thousands of entries (SSN primary keys), that gets refreshed biweekly. To keep a history archive, I add those records to a history table, adding a pay-period (yypp) identifier to each. (The table currently contains records preserved over about 75 pay-periods.)

    Initially, I defined the the pay-period as the primary key (clustered), and had a second index on ssn. That seemed to work, and the performance, while not great, was not horrible.

    I recently needed to redefine that table, and in the process I decided that I should define, as a primary key, both the pay-period and SSN columns.

    But in that configuration, when only the pay-period is in the where clause, the performance is bad. I had to also add a pay-period index by itself to restore the old performance.

    So I guess I simply don't understand the way the SQL engine makes use of a compound primary key. When queries are expected which reference either of the key columns independently, is it recommended to also add individual indexes for each of the compound primary key columns. If so, what is gained by the compound primary key in the first place?

  • The primary key is the unique identifier in the table. That's it's primary purpose, a constraint on data so that each row is uniquely identified. However, it can also help performance.

    When you mention that there are two columns and that you need a seperate index for each, that's not true. The leading edge of your index can always be used by a query if it needs that index. The only time you need a second index is when queries hit the second column in the index independent of the first.

    However, to help you most directly, can you post the execution plans of the queries that are giving you trouble? Also post the query itself and some sample data and a structure with expected output. Then you'll get very explicit answers.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thank you Grant!

    Since I can't attach .sql or .sqlplan files here, I have attached a .doc file which attempts to capture my situation. And in the process of doing that

    (after first deleting the index on PAYPERIOD which I had since created to recreate the situation I first noticed) and looking at the estimated execution plans (which I should have done before I posted my original question) I thought I could see what might be going on.

    The new (slow) table is T102ALL; the original is T102ALLOLD. My source table (T102) used SSNO as a primary key; since I expected duplicate SSNOs in my original history table (T102ALLOLD) I had not defined a primary key, but instead build a clustered index on PAYPERIOD, and added a non-unique index on SSNO (and a couple of other fields.)

    My theory is that by adding the PAYPERIOD+SSNO primary key index in the new version of the table (T102ALL) I now have a primary key index of some 5 million PAYPERIOD-SSNO combinations, as opposed to the 75 or so distinct PAYPERIODs in the clustered index.

    But looking more closely at the execution plans, I see that one of the "couple of other field" indexes is being called into play. I have no idea why, but I'll add those same indexes to the new table and see if anything changes. Perhaps that is an index with the fewest distinct values, so it is used to sum the number of rows in the entire table.

    I first saw the slowdown on queries which were grouping by PAYPERIOD, but then noticed that queries using only select count(*) were exhibiting the same performance differences.

  • If you want help, you should post your sql in the message in between code tags.

    Most people (including me) will not open a file posted on the forum because it might contain unwelcome code.

  • I appreciate your concern Michael.

    I used .doc so I could include screen captures, showing the queries, parts of the Object Explorer (to show keys and indexes), and the output of the estimated execution plan(s). Do you think we could convince Steve Jones to allow .sqlplan attachments?

  • I just zip them and attach them.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Jim Russell (12/22/2008)


    Thank you Grant!

    Since I can't attach .sql or .sqlplan files here, I have attached a .doc file which attempts to capture my situation. And in the process of doing that

    (after first deleting the index on PAYPERIOD which I had since created to recreate the situation I first noticed) and looking at the estimated execution plans (which I should have done before I posted my original question) I thought I could see what might be going on.

    The new (slow) table is T102ALL; the original is T102ALLOLD. My source table (T102) used SSNO as a primary key; since I expected duplicate SSNOs in my original history table (T102ALLOLD) I had not defined a primary key, but instead build a clustered index on PAYPERIOD, and added a non-unique index on SSNO (and a couple of other fields.)

    My theory is that by adding the PAYPERIOD+SSNO primary key index in the new version of the table (T102ALL) I now have a primary key index of some 5 million PAYPERIOD-SSNO combinations, as opposed to the 75 or so distinct PAYPERIODs in the clustered index.

    But looking more closely at the execution plans, I see that one of the "couple of other field" indexes is being called into play. I have no idea why, but I'll add those same indexes to the new table and see if anything changes. Perhaps that is an index with the fewest distinct values, so it is used to sum the number of rows in the entire table.

    I first saw the slowdown on queries which were grouping by PAYPERIOD, but then noticed that queries using only select count(*) were exhibiting the same performance differences.

    It's the query you're using. SELECT COUNT(*) will use the index with the smallest number of pages to satisify the query. So it's going after that index that will show what it needs for the least cost. When you're simply processing 5.5 million rows all at once, you're lucky any index comes into play at all.

    The use of indexes is all about selectivity. When you select everything, the concept of selectivity is out the window.

    Gail Shaw had some interesting sample code around COUNT methods in her SQL Pass Presentation. It might be on her blog.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Now it makes sense Grant, thank you very much!

    A good example of how one (ok, me) can be led astray -- because I first noticed the problem with a GROUP BY PAYPERIOD clause, and since I had just changed my PAYPERIOD index, I jumped to a bad conclusion and didn't step back and think about what I was seeing. And I posted the question without doing enough investigation on my own -- my bad!

    And I will surely look for Gail's paper -- I learned long ago that it Gail says it, it must be so!

    (In fact, one of Gail's postings helped me this weekend -- in that same multi-million row table, I needed to change an int to char. Looking at the script that SSMS would have generated, it was about to clone the entire file to accomplish it. A search here led me to Gail's reply to a similar question and the realization all I needed was an ALTER COLUMN; it is too easy to forget the basics when you rely too much on the GUI interface. Thanks Gail, if you read this!)

    (Re the .zip attachment -- I expect Michael would be just as reluctant to open a zipped .doc file: it pays to be paranoid.)

  • Glad I was a bit helfpul.

    Yeah, I learn something new from Gail almost daily. On the rare occasions she doesn't teach me something new, she asks a question or makes a suggestion that leads me down a new path. I suspect that's why she's an MVP.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Not directly related to the mechanics on how to do it, but:

    By SSN, do you mean Social Security Number ? As a primary key in a database ?

    SSN is a sensitive piece of information and unintended disclosure can lead to identity theft and major hassles for the victims. So I hope this is not the case otherwise the question pops up: what were they thinking ?

  • Grant Fritchey (12/22/2008)


    Gail Shaw had some interesting sample code around COUNT methods in her SQL Pass Presentation. It might be on her blog.

    Or it might not. At least right at this moment, that is.

    (In fact, one of Gail's postings helped me this weekend ... Thanks Gail, if you read this!)

    Any time. And thank you for researching problems yourself before posting.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • To jump on the Non SSN PK bandwagon....

    While I don't have the specific reference, (I think the Privacy Act of 1974 had some provisions for it) but it's my understanding that in certain parts of the US, it's illegal to use the SSN as a primary key. Not illegal to store and index it, just illegal to use as a primary identifier...

    For what it's worth...

    -Luke.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • This is interesting. Though I am a bit surprised a law would cover such a specific technical detail (primary key).

    The OP has not confirmed if this is indeed the case, but if it is indeed covered by law in "some states" eventually, the system might spill over in one of such states. (Mergers and Acquisitions, anyone? who knows?).

    As far as i am converned, since there are several ways of generating unique primary keys, there is no valid reason to use SSN's for this purpose. Its sole use should be for communicating with the government (income tax statemnts, etc.). And it should reside in ONE table, not scattered throughout the database as a foreign key.

    Copnsidering all the recent thefts of commercial and credit records, the SSN should be in ONE column in ONE table and if anything should be stored encrypted, this would be it.

    The major hassles to victims of identity-theft far outweight the small convenience of having a ready-made primary key instead of using GUID's or identity columns. A very poor design.

  • Copnsidering all the recent thefts of commercial and credit records, the SSN should be in ONE column in ONE table and if anything should be stored encrypted, this would be it.

    Exactly. If you aren't passing my info to a tax bureau or something, you don't need to SSN period. That's not even getting into the whole bit of re-used SSN's (yes it has happened int he past) and various other oddities with SSNs in general. I believe that was the gist of the laws. It's not a guaranteed unique ID, plus privacy concerns plus the Social Security Commission (government agency and all the ills that comes with it) equals a bad identifier.

    -Luke.

    To help us help you read this[/url]For better help with performance problems please read this[/url]

  • Yes, yes, and I don't know what they were thinking either. The source is a very old mainframe system, and the use of the SSN as the identifying key is the least of my problems. It is a real goat screw!

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply