Need to find out what combination of columns makes each record in table unique

  • VoldemarG

    Hall of Fame

    Points: 3686

    I am working with a table that I have not seen before, and do not have a clear idea on what it is doing or what the data in it really means. There are 5+ million rows in this table.  and 550 columns. However I need to create a few rather complex queries against that table.

    There is no PK, no Clustered or Unique index on the table.  There are a bunch of non-unique indexes only.

    I need to come up with a T-SQL (or may be there is one somewhere already?.. I searched and did not find ..)

    that will show me what combination of columns marks a record in this table unique.  I know for the fact there is no one single column that has unique value per row but there are definitely several columns that are unique per row together. I just don't know which ones. Has anybody done anything like that before?

    Voldemar likes to play CHESS (and IS good at it!)

  • Phil Parkin

    SSC Guru

    Points: 244801

    There could (and probably will be) be multiple combinations of columns which are unique, based on existing data. That does not tell you which of the combinations, if any, you should choose for a clustered index. You need to find a way of getting to grips with the data enough to understand roughly what is going on, and that will help you choose the right combination.

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.
    See https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help/ for details of how to post T-SQL code-related questions.

  • Steve Collins

    Ten Centuries

    Points: 1123

    If the number of candidate columns is small you could exhaustively test the possible combinations using a cursor and dynamic sql.

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Jeffrey Williams

    SSC Guru

    Points: 88712

    VoldemarG wrote:

    I am working with a table that I have not seen before, and do not have a clear idea on what it is doing or what the data in it really means. There are 5+ million rows in this table.  and 550 columns. However I need to create a few rather complex queries against that table.

    I would think you have to have at least some idea of what data is stored in that table - else you cannot even begin to write any queries, let alone *rather complex queries* against that table.

    With that in mind - I would use SQL Search by Redgate to search that database (and any other related databases) for reference to that table.  This table sounds like it is built from a combination of other tables and I suspect that there is some code that is used to build that table.  If that isn't the case - then I would hope there is other code that references that table, and examining that code should help in identifying how the table has been previously utilized.

    If I still couldn't find out - then I would start asking associates and/or the application owner - see if they have an idea on what the data represents and what could possibly be utilized to uniquely identify the rows.  Once you have an idea on what actually sources this data, then you can start looking at individual columns to determine which ones might possibly be candidate keys.

    Finally - look at the non-clustered indexes...there might just be one (or more) that were not created as unique, but are actually meant to be the PK or unique.

    Jeffrey Williams
    Problems are opportunities brilliantly disguised as insurmountable obstacles.

    How to post questions to get better answers faster
    Managing Transaction Logs

  • jonas.gunnarsson 52434

    Ten Centuries

    Points: 1227

    May be Identifying Relationship Between Columns In SQL can be of use?

  • Jeff Moden

    SSC Guru

    Points: 997345

    Have you asked anyone???

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Brahmanand Shukla

    SSC Eights!

    Points: 938

    VoldemarG wrote:

    I am working with a table that I have not seen before, and do not have a clear idea on what it is doing or what the data in it really means. There are 5+ million rows in this table.  and 550 columns. However I need to create a few rather complex queries against that table.

    There is no PK, no Clustered or Unique index on the table.  There are a bunch of non-unique indexes only.

    I need to come up with a T-SQL (or may be there is one somewhere already?.. I searched and did not find ..)

    that will show me what combination of columns marks a record in this table unique.  I know for the fact there is no one single column that has unique value per row but there are definitely several columns that are unique per row together. I just don't know which ones. Has anybody done anything like that before?

    Try exploring statistics.

  • Grant Fritchey

    SSC Guru

    Points: 396769

    Best way I can think of is to do a little data exploration so you better understand the table (500+ columns... <sigh>), then, on a TEST system, try creating a unique constraint (don't do a clustered index yet, just nonclustered for testing, avoids data movement, makes dropping the index easy). Use the combination of columns that makes the most sense based on the structure. That's how I'd do it.

    Of course, then, I'd get to work normalizing the data structure and/or fixing the star schema.

    500 columns...

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Jeff Moden

    SSC Guru

    Points: 997345

    It's difficult for me to believe that such a table exists and no one knows anything about it.  Find who does and ask them.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "Dear Lord... I'm a DBA so please give me patience because, if you give me strength, I'm going to need bail money too!"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Grant Fritchey

    SSC Guru

    Points: 396769

    Jeff Moden wrote:

    It's difficult for me to believe that such a table exists and no one knows anything about it.  Find who does and ask them.

    Very good point, as usual.

    ----------------------------------------------------
    The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood...
    Theodore Roosevelt

    The Scary DBA
    Author of: SQL Server 2017 Query Performance Tuning, 5th Edition and SQL Server Execution Plans, 3rd Edition
    Product Evangelist for Red Gate Software

  • Jonathan AC Roberts

    SSCoach

    Points: 17342

    With 550 columns it's not possible to test every combination of 2 or more columns to see if they are unique. It's a combinatorial problem with a search space of 3.6855 × 10^(165). Or to be more precise:

    3685510180489786476798393145496356338786055879312930105836138965083617346086082863365358130056307390177215209990980317284932211552660930305235775636164742230126362073

    I think Jeff has the right suggestion of finding someone who might know.

    Or maybe you could add an identity column and make it the primary key?

  • Phil Parkin

    SSC Guru

    Points: 244801

    Jonathan AC Roberts wrote:

    With 550 columns it's not possible to test every combination of 2 or more columns to see if they are unique. It's a combinatorial problem with a search space of 3.6855 × 10^(165). Or to be more precise:

    3685510180489786476798393145496356338786055879312930105836138965083617346086082863365358130056307390177215209990980317284932211552660930305235775636164742230126362073

    I think Jeff has the right suggestion of finding someone who might know.

    That's a great calculator you've got there ... must be the size of a TV 🙂

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.
    See https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help/ for details of how to post T-SQL code-related questions.

  • Steve Collins

    Ten Centuries

    Points: 1123

    Jonathan AC Roberts wrote:

    With 550 columns it's not possible to test every combination of 2 or more columns to see if they are unique. It's a combinatorial problem with a search space of 3.6855 × 10^(165). Or to be more precise:

    3685510180489786476798393145496356338786055879312930105836138965083617346086082863365358130056307390177215209990980317284932211552660930305235775636164742230126362073

    For sure having some insight, any insight, into the meaning of the columns is essential.  Anyway, if the OP could boil it down to 10 candidate columns it's only 1,023 combinations (including single columns, excluding those it's 1,013).  Maybe that's doable.  If it's 12 columns that's 4,095 combinations which could take a while.

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • VoldemarG

    Hall of Fame

    Points: 3686

    Thanks everyone for your input.

    I really appreciate it.

    Will take all of it into account.  Happy Thanksgiving!

    Voldemar likes to play CHESS (and IS good at it!)

  • VoldemarG

    Hall of Fame

    Points: 3686

    Thanks! Very helpful suggestion.

    Voldemar likes to play CHESS (and IS good at it!)

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply