Add unique valules to a heap

  • I have a large table with duplicate values in all columns. I need to add a column, populate it with unique values (INT with values 1,2,3,etc is fine) so I can create a unique index.

    What's the best way to go about this? This is SQL 2008.

  • Why do you want to create a column with unique values to create a unique index on it? What do you expect to achieve?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • We have a new process to run updates on it that takes forever, so it needs a primary key or at least a unique index. One column already has a non-unique index if that's any help.

  • There is still something missing here but why not just add an identity column? Then make it the primary key since it doesn't seem to matter what the clustered index might be.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • This smells like RBAR. That would certainly be slow.

    If you post DDL for your table and indexes, along with the UPDATE process we could give better advice on what to change to improve performance.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • dan-572483 (9/1/2016)


    We have a new process to run updates on it that takes forever, so it needs a primary key or at least a unique index. One column already has a non-unique index if that's any help.

    Unless your process is going to filter on that new column, adding it is a complete waste.

    And updates don't need a primary key or unique index. An index on whatever the update filters on will be useful, doesn't matter if it's not unique.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Look at the missing index stats and the index usage stats to help determine a useful clustered index. Only if no useful clustering key exists, and remember to consider something like added datetime, then you can fall back to using identity.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • This was the quick & easy solution. I assumed that an identity column had to be specified before you start adding records. No, it will create them on existing records when you create the column! Thanks!

  • Yes, its always a best practice to have a primary key on table to avoid the heap. It can be either simple identity column or any business key.

    Also as our experts suggested, its always suggested to have unique index on filtering column.

    Thanks,

    Durga Prasad.

  • A clustering index is enough to avoid having a heap, you don't have to have a primary key. In particular, don't slap an identity column on the table just to use it as the "primary key" if you have something like a datetime you can cluster on more usefully and naturally for that data.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (9/2/2016)


    A clustering index is enough to avoid having a heap, you don't have to have a primary key. In particular, don't slap an identity column on the table just to use it as the "primary key" if you have something like a datetime you can cluster on more usefully and naturally for that data.

    I totally agree with Scott on this. An identity just so you have a primary key is not a good approach in most situations. However, since in this case the OP stated they have lots of duplicates rows across all columns it seems that an identity is about the only way to make a primary key since none of the other columns would suffice.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (9/2/2016)


    ScottPletcher (9/2/2016)


    A clustering index is enough to avoid having a heap, you don't have to have a primary key. In particular, don't slap an identity column on the table just to use it as the "primary key" if you have something like a datetime you can cluster on more usefully and naturally for that data.

    I totally agree with Scott on this. An identity just so you have a primary key is not a good approach in most situations. However, since in this case the OP stated they have lots of duplicates rows across all columns it seems that an identity is about the only way to make a primary key since none of the other columns would suffice.

    True. But, again, you don't need a primary key necessarily, just a clustered index. And a CI doesn't have to be unique on its own (SQL forces it to be unique "under the covers", of course, but that's irrelevant to the index create itself). Often a datetime is vastly more useful for lookup than an identity, and the fact that multiple key values can occur doesn't matter.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (9/2/2016)


    Sean Lange (9/2/2016)


    ScottPletcher (9/2/2016)


    A clustering index is enough to avoid having a heap, you don't have to have a primary key. In particular, don't slap an identity column on the table just to use it as the "primary key" if you have something like a datetime you can cluster on more usefully and naturally for that data.

    I totally agree with Scott on this. An identity just so you have a primary key is not a good approach in most situations. However, since in this case the OP stated they have lots of duplicates rows across all columns it seems that an identity is about the only way to make a primary key since none of the other columns would suffice.

    True. But, again, you don't need a primary key necessarily, just a clustered index. And a CI doesn't have to be unique on its own (SQL forces it to be unique "under the covers", of course, but that's irrelevant to the index create itself). Often a datetime is vastly more useful for lookup than an identity, and the fact that multiple key values can occur doesn't matter.

    I of course understand that part. My point is that the OP is talking about some new process they have which does updates to this and they wanted unique values. At this point we are speculating on what may or may not be the process the OP actually needs.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • durga.palepu (9/1/2016)


    Yes, its always a best practice to have a primary key on table to avoid the heap. It can be either simple identity column or any business key.

    Also as our experts suggested, its always suggested to have unique index on filtering column.

    Thanks,

    Durga Prasad.

    Ah... you said "always". Are you a betting man? 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply