Add unique values to a heap

Question

dan-572483

SSCoach

Points: 15870
September 1, 2016 at 12:29 pm

#313095

I have a large table with duplicate values in all columns. I need to add a column and populate it with unique values (an INT with values 1, 2, 3, etc. is fine) so I can create a unique index.
What's the best way to go about this? This is SQL Server 2008.

Luis Cazares SSC Guru Points: 183698 · Answer 1

Why do you want to create a column with unique values to create a unique index on it? What do you expect to achieve?

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

dan-572483 SSCoach Points: 15870 · Answer 2

We have a new process to run updates on it that takes forever, so it needs a primary key or at least a unique index. One column already has a non-unique index if that's any help.

Sean Lange SSC Guru Points: 286573 · Answer 3

There is still something missing here, but why not just add an identity column? Then make it the primary key, since it doesn't seem to matter what the clustered index might be.
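
A minimal sketch of this suggestion, assuming a hypothetical table dbo.BigTable and a hypothetical new column RowID (neither name comes from the thread). Adding an IDENTITY column to an existing table also assigns values to the rows already in it. On a large table this touches every row, so it is worth trying on a copy first. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical names: dbo.BigTable and RowID are placeholders, not from the thread.
-- Adding the IDENTITY column assigns a value to every existing row.
ALTER TABLE dbo.BigTable
    ADD RowID INT IDENTITY(1, 1) NOT NULL;
GO

-- Make the new column the clustered primary key.
-- This fails if the table already has a clustered index or a primary key.
ALTER TABLE dbo.BigTable
    ADD CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (RowID);
GO
```

The order in which existing rows receive their numbers is not guaranteed, which is fine when the values only need to be unique.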

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Luis Cazares SSC Guru Points: 183698 · Answer 4

This smells like RBAR. That would certainly be slow.

If you post the DDL for your table and indexes, along with the UPDATE process, we could give better advice on what to change to improve performance.

Gail Shaw SSC Guru Points: 1004504 · Answer 5

dan-572483 (9/1/2016)
We have a new process to run updates on it that takes forever, so it needs a primary key or at least a unique index. One column already has a non-unique index if that's any help.

Unless your process is going to filter on that new column, adding it is a complete waste.

And updates don't need a primary key or unique index. An index on whatever the update filters on will be useful; it doesn't matter if it's not unique.
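
As a rough illustration of that point, here is a sketch that assumes the update filters on a hypothetical CustomerID column of a hypothetical dbo.BigTable (the thread never shows the real table or the real UPDATE). The index is deliberately not declared UNIQUE.

```sql
-- Hypothetical names: dbo.BigTable, CustomerID and StatusCode are placeholders.
-- CustomerID stands in for whatever column the UPDATE's WHERE clause uses.
CREATE NONCLUSTERED INDEX IX_BigTable_CustomerID
    ON dbo.BigTable (CustomerID);
GO

-- An UPDATE shaped like this can then seek on the index
-- instead of scanning the whole heap to find its rows.
UPDATE dbo.BigTable
SET    StatusCode = 'P'
WHERE  CustomerID = 42;
GO
```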

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ScottPletcher SSC Guru Points: 100949 · Answer 6

Look at the missing index stats and the index usage stats to help determine a useful clustered index. Only if no useful clustering key exists (and remember to consider something like an added-datetime column) should you fall back to using an identity.
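
One way to look at those stats is through the dynamic management views, sketched below for a hypothetical table dbo.BigTable (the table name is an assumption). The numbers are reset when the instance restarts, so they only reflect activity since then, and the missing-index suggestions are hints to evaluate rather than instructions.

```sql
-- Missing-index suggestions recorded for one table (dbo.BigTable is a placeholder).
SELECT  d.statement AS table_name,
        d.equality_columns,
        d.inequality_columns,
        d.included_columns,
        s.user_seeks,
        s.avg_user_impact
FROM    sys.dm_db_missing_index_details AS d
JOIN    sys.dm_db_missing_index_groups AS g
        ON g.index_handle = d.index_handle
JOIN    sys.dm_db_missing_index_group_stats AS s
        ON s.group_handle = g.index_group_handle
WHERE   d.database_id = DB_ID()
  AND   d.object_id = OBJECT_ID(N'dbo.BigTable')
ORDER BY s.avg_user_impact * s.user_seeks DESC;

-- How the heap (index_id 0) and any existing indexes are actually being used.
SELECT  i.name AS index_name,
        i.type_desc,
        u.user_seeks,
        u.user_scans,
        u.user_lookups,
        u.user_updates
FROM    sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS u
        ON  u.object_id = i.object_id
        AND u.index_id = i.index_id
        AND u.database_id = DB_ID()
WHERE   i.object_id = OBJECT_ID(N'dbo.BigTable');
```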

SQL DBA, SQL Server MVP (07, 08, 09)
A socialist is someone who will give you the shirt off *someone else's* back.

dan-572483 SSCoach Points: 15870 · Answer 7

This was the quick & easy solution. I had assumed that an identity column had to be specified before you start adding records. No, SQL Server generates the values for existing records when you create the column! Thanks!

durga.palepu Hall of Fame Points: 3530 · Answer 8

Yes, it's always a best practice to have a primary key on a table to avoid a heap. It can be either a simple identity column or any business key.

Also, as our experts suggested, it's always advisable to have a unique index on the filtering column.

Thanks,

Durga Prasad.

ScottPletcher SSC Guru Points: 100949 · Answer 9

A clustering index is enough to avoid having a heap, you don't have to have a primary key. In particular, don't slap an identity column on the table just to use it as the "primary key" if you have something like a datetime you can cluster on more usefully and naturally for that data.

Sean Lange SSC Guru Points: 286573 · Answer 10

ScottPletcher (9/2/2016)
A clustering index is enough to avoid having a heap, you don't have to have a primary key. In particular, don't slap an identity column on the table just to use it as the "primary key" if you have something like a datetime you can cluster on more usefully and naturally for that data.

I totally agree with Scott on this. An identity just so you have a primary key is not a good approach in most situations. However, since in this case the OP stated they have lots of duplicate rows across all columns, it seems that an identity is about the only way to make a primary key, since none of the other columns would suffice.

ScottPletcher SSC Guru Points: 100949 · Answer 11

Sean Lange (9/2/2016)
I totally agree with Scott on this. An identity just so you have a primary key is not a good approach in most situations. However, since in this case the OP stated they have lots of duplicate rows across all columns, it seems that an identity is about the only way to make a primary key, since none of the other columns would suffice.

True. But, again, you don't need a primary key necessarily, just a clustered index. And a CI doesn't have to be unique on its own (SQL forces it to be unique "under the covers", of course, but that's irrelevant to the index create itself). Often a datetime is vastly more useful for lookup than an identity, and the fact that multiple key values can occur doesn't matter.
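
A sketch of that alternative, assuming the table has an existing datetime column, here given the hypothetical name AddedDate on a hypothetical dbo.BigTable. Because the index is not declared UNIQUE, duplicate AddedDate values are accepted; SQL Server adds a hidden uniquifier to rows that share a key value.

```sql
-- Hypothetical names: dbo.BigTable and AddedDate are placeholders.
-- No UNIQUE keyword, so rows with the same AddedDate are allowed.
-- Creating the clustered index converts the heap into a clustered table.
CREATE CLUSTERED INDEX CIX_BigTable_AddedDate
    ON dbo.BigTable (AddedDate);
GO
```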

Sean Lange SSC Guru Points: 286573 · Answer 12

ScottPletcher (9/2/2016)
True. But, again, you don't need a primary key necessarily, just a clustered index. And a CI doesn't have to be unique on its own (SQL forces it to be unique "under the covers", of course, but that's irrelevant to the index create itself). Often a datetime is vastly more useful for lookup than an identity, and the fact that multiple key values can occur doesn't matter.

I of course understand that part. My point is that the OP is talking about some new process they have which does updates to this table, and they wanted unique values. At this point we are speculating on what may or may not be the process the OP actually needs.

Jeff Moden SSC Guru Points: 1003873 · Answer 13

durga.palepu (9/1/2016)
Yes, it's always a best practice to have a primary key on a table to avoid a heap. It can be either a simple identity column or any business key.
Also, as our experts suggested, it's always advisable to have a unique index on the filtering column.
Thanks,
Durga Prasad.

Ah... you said "always". Are you a betting man? 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)