Table Without PK

  • What is the impact when a table doesn't have PK in db

  • It depends.

    What is the purpose of the table?

    The absence of a PK can have an impact on referential integrity and choosing what column(s) are used for relationships with other tables.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • A relation must have a primary key. If your table does not have a primary key, then it's the physical implementation of something that doesn't exist in the relational model.

    Maybe an Excel spreadsheet, a CSV file, but defintely not a relation.

    No primary key can be good for temporary tables or staging tables.

    -- Gianluca Sartori

  • If nothing constrains the data in any way, you can have duplicate rows.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Another difference is when you create nonclustered indexes on the heap, each row can't inherit the bytes of the primary key. Instead, it inherits the bytes of the RID for each row, making them larger than they could otherwise be.

    Like Gianluca said, heaps can be okay for staging tables, but I wouldn't use them for production tables.

  • Ed Wagner (5/7/2015)


    Another difference is when you create nonclustered indexes on the heap, each row can't inherit the bytes of the primary key. Instead, it inherits the bytes of the RID for each row, making them larger than they could otherwise be.

    Like Gianluca said, heaps can be okay for staging tables, but I wouldn't use them for production tables.

    Careful now... by definition, NCI's don't ever "inherit the bytes" of the primary key. Rather they "inherit the bytes" of the CI. The confusion comes about because most people assign the PK columns as the clustered index by default.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (5/7/2015)


    Ed Wagner (5/7/2015)


    Another difference is when you create nonclustered indexes on the heap, each row can't inherit the bytes of the primary key. Instead, it inherits the bytes of the RID for each row, making them larger than they could otherwise be.

    Like Gianluca said, heaps can be okay for staging tables, but I wouldn't use them for production tables.

    Careful now... by definition, NCI's don't ever "inherit the bytes" of the primary key. Rather they "inherit the bytes" of the CI. The confusion comes about because most people assign the PK columns as the clustered index by default.

    You're absolutely right, Jeff, thanks for the correction. I find myself saying the same thing in conversation and correcting it, but I didn't do that here. Thanks again.

  • Designing your tables with PKs is a best practice.

    You should not have PKs only for staging tables.

    This article is very good - http://www.mssqltips.com/sqlservertip/1254/clustered-tables-vs-heap-tables/

    Igor Micev,My blog: www.igormicev.com

  • Primary key != clustered index.

    A table can be a heap and have a primary key

    A table may have a clustered index but no primary key.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (5/8/2015)


    Primary key != clustered index.

    A table can be a heap and have a primary key

    A table may have a clustered index but no primary key.

    Absolutely agree. My post is for the case when the PK is also a clustered key.

    Thanks.

    Igor Micev,My blog: www.igormicev.com

  • spaghettidba (5/7/2015)


    A relation must have a primary key. If your table does not have a primary key, then it's the physical implementation of something that doesn't exist in the relational model.

    Maybe an Excel spreadsheet, a CSV file, but defintely not a relation.

    No primary key can be good for temporary tables or staging tables.

    This was one of the walls I ran into when I came into the relational world and database world.

    What if you are allowed to have duplicates with transactional records that have customer ID's which forms the relationship? What purpose is the PK giving you outside of deduplicating the records that are absolutely within the business requirements to keep?

    Does not having a unique key hurt the table then?

    In this case, the customer ID and Timestamp are the most important fields used to relate the data across all tables that also have duplicates. Clustered happens on time rather than the bond that joins us together. Updates on the table are approached by dropping chunks of time and replacing with time because you cannot confirm how many duplicates are allowed and not allowed. Therefore, refresh is much like a republish of data versus an update.

  • xsevensinzx (5/8/2015)


    What if you are allowed to have duplicates with transactional records that have customer ID's which forms the relationship? What purpose is the PK giving you outside of deduplicating the records that are absolutely within the business requirements to keep?

    If you are allowed multiple transactions per customer, then customer is not the primary key. What uniquely identifies a transaction? Once you answer that question (and from you last paragraph it looks like maybe Customer ID and the Timestamp), then that becomes the primary key (or a unique constraint if you're using artificial primary keys)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • xsevensinzx (5/8/2015)


    spaghettidba (5/7/2015)


    A relation must have a primary key. If your table does not have a primary key, then it's the physical implementation of something that doesn't exist in the relational model.

    Maybe an Excel spreadsheet, a CSV file, but defintely not a relation.

    No primary key can be good for temporary tables or staging tables.

    This was one of the walls I ran into when I came into the relational world and database world.

    What if you are allowed to have duplicates with transactional records that have customer ID's which forms the relationship? What purpose is the PK giving you outside of deduplicating the records that are absolutely within the business requirements to keep?

    Does not having a unique key hurt the table then?

    In this case, the customer ID and Timestamp are the most important fields used to relate the data across all tables that also have duplicates. Clustered happens on time rather than the bond that joins us together. Updates on the table are approached by dropping chunks of time and replacing with time because you cannot confirm how many duplicates are allowed and not allowed. Therefore, refresh is much like a republish of data versus an update.

    Nothing wrong with a compound primary key. Nothing wrong with a primary key that's not an identity column either. People get overly focused on that. The concept of a primary key is having a way to uniquely identify any given row. It's part of the normalization process. Normalization doesn't require that an identity be present. If the proper way to identify the row is by datetime and then customerid, excellent. Use that. (from the sounds of it, that's likely to be right order for the key in such a situation since you're moving data in & out based on time & date).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • GilaMonster (5/8/2015)


    xsevensinzx (5/8/2015)


    What if you are allowed to have duplicates with transactional records that have customer ID's which forms the relationship? What purpose is the PK giving you outside of deduplicating the records that are absolutely within the business requirements to keep?

    If you are allowed multiple transactions per customer, then customer is not the primary key. What uniquely identifies a transaction? Once you answer that question (and from you last paragraph it looks like maybe Customer ID and the Timestamp), then that becomes the primary key (or a unique constraint if you're using artificial primary keys)

    Yep, in the answer I came too was count(), which is the approach I took to identifying that primary key in this instance.

    But, you still have to answer the question of why the transaction has to be uniquely identified in a world where not being unique is allowed? What use is that primary key that is not used in the relational model anywhere? It's uniquely identifying a record to the table where we do not care it's uniquely identified. 😉

  • xsevensinzx (5/8/2015)


    GilaMonster (5/8/2015)


    xsevensinzx (5/8/2015)


    What if you are allowed to have duplicates with transactional records that have customer ID's which forms the relationship? What purpose is the PK giving you outside of deduplicating the records that are absolutely within the business requirements to keep?

    If you are allowed multiple transactions per customer, then customer is not the primary key. What uniquely identifies a transaction? Once you answer that question (and from you last paragraph it looks like maybe Customer ID and the Timestamp), then that becomes the primary key (or a unique constraint if you're using artificial primary keys)

    Yep, in the answer I came too was count(), which is the approach I took to identifying that primary key in this instance.

    But, you still have to answer the question of why the transaction has to be uniquely identified in a world where not being unique is allowed? What use is that primary key that is not used in the relational model anywhere? It's uniquely identifying a record to the table where we do not care it's uniquely identified. 😉

    It's called surrogate key. If your table only has a surrogate key and no natural key, the likelyhood of a design/normalization mistake skyrockets.

    -- Gianluca Sartori

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply