How to find candidate keys

  • In order to get a list of columns that could be primary keys, is it reasonable to do select * from a table, count its rows, and compare that result to select column_name from that table to see if it returns the same number of rows?

    Are select distinct column_name from table or select count(*), column_name from table group by column_name better tests to qualify the column as a candidate or alternate?

    Assuming the aggregate evaporates nulls, does a single null disqualify the column from candidate or alternate?

    how or what to query or steal to find these?

    thanks very much

    drew

  • My first concern is why you cannot determine a primary key based on your knowledge of the table and its use. It should have some sort of meaning to the business.

    Jared
    CE - Microsoft

  • Really the best way is to analyze what the data represents. Just because there are no duplicates today does not mean there won't be tomorrow. And yes if there is a null that pretty much says it can't be the key. A primary key cannot be null. The notion that the row does not contain a value for a given property indicates it is not valid as row identifier.

    There is no rule about how to identify candidate keys.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • May I ask an academic question of the OP (that I am pretty sure I know the answer to)?

    Please define a candidate key and a primary key.

  • a candidate key is an attribute or attributes that do not have more than one row in a relation. a primary key is the chosen candidate key that dovetails most closely with the language of the business, that identifies one thing from another, that is, the attribute or attibutes defining uniqueness, because if you cannot tell the instances of an entity from one another you have defeated the purpose of a database.

    thanks very much

    drew

  • Agreed,

    i may be too mechanistic in the approach.

    thanks very much.

    drew

  • drew.georgopulos (3/28/2012)


    a candidate key is an attribute or attributes that do not have more than one row in a relation. a primary key is the chosen candidate key that dovetails most closely with the language of the business, that identifies one thing from another, that is, the attribute or attibutes defining uniqueness, because if you cannot tell the instances of an entity from one another you have defeated the purpose of a database.

    thanks very much

    drew

    Awesome. You'd be surprised how many people don't know this. I have been asked this question in a couple of interviews and was surprised myself to find out not many got it right.

    Now, from that you should be able to determine what you are looking for by analyzing the data and the application. It may be possible that what may look like a candidate key based on the data may only be so due to the data currently in the system.

  • Ok, let's say you are trying to identify this. You do a SELECT DISTINCT ColumnA, ColumnB and compare it with the number of rows in the table. They match, so at the surface you may say that this is a candidate key. What happens tomorrow when a row is inserted that is a duplicate in ColumnA and Column B that was not there the day before?

    You could say that you would not have made that your primary key because you knew from business rules that it "could" be duplicated. Then I would ask why you even tried to identify the key if you already knew that? It seems you are adding a step that could be skipped, since you cannot possibly choose a primary key without an understanding of the data to begin with, otherwise the application could break. The worst thing that could happen is that it does not break for months, and by t hen you have forgotten what you did that may have caused this issue.

    Jared
    CE - Microsoft

  • thanks very much...i got here because an associate happened to notice a duplicate row in a pharmacy feed. i asked them how could it be that a duplicate could get loaded if the rule is that One Member gets One RX for One NDC on One day from One Provider at One Status?

    they used the state file name to check for uniqueness, not the business rule, with the result that we had two loads that were duplcate except the pharmacy benefit manager happened to send us the same file on two different days....so that satisfied the ETL engineer's idea of uniquness, but it took a nurse doing medical record audit to find it.

    So i thought that was ugly.

    I wanted to get in front of it by prospecting for other clams out there that were also posing as unique, but that were harboring doubles or triples of unit counts or amounts paid with no one the wiser.

    Quixotic no?

  • I would do a query with:

    SELECT

    COUNT(*) AS [Total],

    COUNT(DISTINCT column_1) AS [Column_1],

    COUNT(DISTINCT column_2) AS [Column_2],

    ...

    FROM ...

    For any/all columns that could reasonably be the key. That way you can "research" all columns in one pass.

    Sometimes you just have to take your best shot temporarily if you don't know the data and refine the key later as you learn more about the table.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (3/28/2012)


    I would do a query with:

    SELECT

    COUNT(*) AS [Total],

    COUNT(DISTINCT column_1) AS [Column_1],

    COUNT(DISTINCT column_2) AS [Column_2],

    ...

    FROM ...

    For any/all columns that could reasonably be the key. That way you can "research" all columns in one pass.

    Sometimes you just have to take your best shot temporarily if you don't know the data and refine the key later as you learn more about the table.

    That would be sort of marginally acceptable for a single column key. It doesn't do anything for composite keys though which more often than not you would need a composite key. If a single column would work it is probably an identity and likely the key already. 😉

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I use identity as a clus key only as a last resort. Usually there's a better business key value that yields much better overall performance.

    On a table I knew nothing about, otoh, of course I would be more inclined to use identity, esp. it the table was currently just a heap.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Agreed. 😀 A key that models after the business is certainly preferred. All I was getting at is that a business modeled key tends to be a composite key a lot of the times so looking at single column counts doesn't really provide any kind of insight there.

    It sounds like the OP knows the business and the data pretty well and is probably looking for some validation and support before suggesting that they change the keys as they exist currently because they are not very good keys.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • indeed, the question at hand is for the compund case; Member, RX, NDC, Day, Provider and Status offer lots of opportunity to fall off the wagon <g>

  • Sounds to me from your description that everything except Day is the same as the business definition? I would say you have a solid argument about dropping Day from the key.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply