Need to find out what combination of columns makes each record in table unique

Question

Need to find out what combination of columns makes each record in table unique

VoldemarG

Hall of Fame

Points: 3883
More actions
November 26, 2019 at 7:55 pm

#3701208

I am working with a table that I have not seen before, and do not have a clear idea on what it is doing or what the data in it really means. There are 5+ million rows in this table. and 550 columns. However I need to create a few rather complex queries against that table.
There is no PK, no Clustered or Unique index on the table. There are a bunch of non-unique indexes only.
I need to come up with a T-SQL (or may be there is one somewhere already?.. I searched and did not find ..)
that will show me what combination of columns marks a record in this table unique. I know for the fact there is no one single column that has unique value per row but there are definitely several columns that are unique per row together. I just don't know which ones. Has anybody done anything like that before?
Likes to play Chess

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247203 More actions · Answer 1

There could (and probably will be) be multiple combinations of columns which are unique, based on existing data. That does not tell you which of the combinations, if any, you should choose for a clustered index. You need to find a way of getting to grips with the data enough to understand roughly what is going on, and that will help you choose the right combination.

Steve Collins SSCrazy Points: 2243 More actions · Answer 2

If the number of candidate columns is small you could exhaustively test the possible combinations using a cursor and dynamic sql.

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Jeffrey Williams SSC Guru Points: 90351 More actions · Answer 3

VoldemarG wrote:

I am working with a table that I have not seen before, and do not have a clear idea on what it is doing or what the data in it really means. There are 5+ million rows in this table. and 550 columns. However I need to create a few rather complex queries against that table.

I would think you have to have at least some idea of what data is stored in that table - else you cannot even begin to write any queries, let alone *rather complex queries* against that table.

With that in mind - I would use SQL Search by Redgate to search that database (and any other related databases) for reference to that table. This table sounds like it is built from a combination of other tables and I suspect that there is some code that is used to build that table. If that isn't the case - then I would hope there is other code that references that table, and examining that code should help in identifying how the table has been previously utilized.

If I still couldn't find out - then I would start asking associates and/or the application owner - see if they have an idea on what the data represents and what could possibly be utilized to uniquely identify the rows. Once you have an idea on what actually sources this data, then you can start looking at individual columns to determine which ones might possibly be candidate keys.

Finally - look at the non-clustered indexes...there might just be one (or more) that were not created as unique, but are actually meant to be the PK or unique.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

jonas.gunnarsson 52434 Ten Centuries Points: 1240 More actions · Answer 4

May be Identifying Relationship Between Columns In SQL can be of use?

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 5

Have you asked anyone???

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Brahmanand Shukla SSC Eights! Points: 990 More actions · Answer 6

VoldemarG wrote:

I am working with a table that I have not seen before, and do not have a clear idea on what it is doing or what the data in it really means. There are 5+ million rows in this table. and 550 columns. However I need to create a few rather complex queries against that table.
There is no PK, no Clustered or Unique index on the table. There are a bunch of non-unique indexes only.
I need to come up with a T-SQL (or may be there is one somewhere already?.. I searched and did not find ..)
that will show me what combination of columns marks a record in this table unique. I know for the fact there is no one single column that has unique value per row but there are definitely several columns that are unique per row together. I just don't know which ones. Has anybody done anything like that before?

Try exploring statistics.

SQL Server Carpenter - Consultant & Trainer
Author of Learn T-SQL from Scratch

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 7

Best way I can think of is to do a little data exploration so you better understand the table (500+ columns... <sigh>), then, on a TEST system, try creating a unique constraint (don't do a clustered index yet, just nonclustered for testing, avoids data movement, makes dropping the index easy). Use the combination of columns that makes the most sense based on the structure. That's how I'd do it.

Of course, then, I'd get to work normalizing the data structure and/or fixing the star schema.

500 columns...

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jeff Moden SSC Guru Points: 1004686 More actions · Answer 8

It's difficult for me to believe that such a table exists and no one knows anything about it. Find who does and ask them.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Grant Fritchey SSC Guru Points: 399363 More actions · Answer 9

Jeff Moden wrote:

It's difficult for me to believe that such a table exists and no one knows anything about it. Find who does and ask them.

Very good point, as usual.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 10

With 550 columns it's not possible to test every combination of 2 or more columns to see if they are unique. It's a combinatorial problem with a search space of 3.6855 × 10^(165). Or to be more precise:

3685510180489786476798393145496356338786055879312930105836138965083617346086082863365358130056307390177215209990980317284932211552660930305235775636164742230126362073

I think Jeff has the right suggestion of finding someone who might know.

Or maybe you could add an identity column and make it the primary key?

Phil Parkin SSC Guru Points: 247203 More actions · Answer 11

Jonathan AC Roberts wrote:

With 550 columns it's not possible to test every combination of 2 or more columns to see if they are unique. It's a combinatorial problem with a search space of 3.6855 × 10^(165). Or to be more precise:
3685510180489786476798393145496356338786055879312930105836138965083617346086082863365358130056307390177215209990980317284932211552660930305235775636164742230126362073
I think Jeff has the right suggestion of finding someone who might know.

That's a great calculator you've got there ... must be the size of a TV 🙂

Steve Collins SSCrazy Points: 2243 More actions · Answer 12

Jonathan AC Roberts wrote:

With 550 columns it's not possible to test every combination of 2 or more columns to see if they are unique. It's a combinatorial problem with a search space of 3.6855 × 10^(165). Or to be more precise:
3685510180489786476798393145496356338786055879312930105836138965083617346086082863365358130056307390177215209990980317284932211552660930305235775636164742230126362073

For sure having some insight, any insight, into the meaning of the columns is essential. Anyway, if the OP could boil it down to 10 candidate columns it's only 1,023 combinations (including single columns, excluding those it's 1,013). Maybe that's doable. If it's 12 columns that's 4,095 combinations which could take a while.

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

VoldemarG Hall of Fame Points: 3883 More actions · Answer 13

Thanks everyone for your input.

I really appreciate it.

Will take all of it into account. Happy Thanksgiving!

Likes to play Chess

VoldemarG Hall of Fame Points: 3883 More actions · Answer 14

VoldemarG

Hall of Fame

Points: 3883

November 27, 2019 at 3:46 pm

#3701593

Thanks! Very helpful suggestion.

Likes to play Chess