Worst Practices - Not Using Primary Keys and Clustered Indexes

  • Hi all,

    So often we confuse logical requirements with those of physical design.

    I firmly believe that in logical design, every table must have a primary key. But that does not mean we are required to declare one in the physical structure.

    While the logical design must have a PK, there are any number of ways to assert it physically. Certainly the unique index resulting from a primary key declaration is the one we most commonly use, but the same thing can be accomplished with a trigger, DML discipline, and perhaps other ways.

    Case in point, consider a table limited by design to one row. Need that table have a unique index on the PK? Or consider a temp table that is a subset of another table with a PK--need we have a unique index on that? Or consider in-memory arrays: how many times do we see physical PK's implemented on tables of that sort? Seldom, I say. Instead, we rely on the devices of good programming practice to make sure the rows are uniquely addressable.

    Now think about the opposite situation: a database that has an intrisic rowid that is capable of uniquely identifying every row quite aside from any other columns. That's roughly the equivilent to the knee-jerk approach of adding an identity column to every table. (Actually, SQL Server does not assert uniqueness on identity columns, but I digress). The point is, we still need a logical key that conveys business meaning and business uniqueness on such a table. And we need that even the physical design already provides a unique key of sorts.

    So, is not having a primary key a worst practice? In logical design, absolutely. In physical design, more often than not, a primary key is a good idea--but the worst practice would be to always create one regardless of the impacts or without an understanding of why it is needed.

    I don't regard a clustered index on every table to be a best practice, much less absent mindedly making it the primary key. I've seen some compelling arguments for clustered indexes on identity columns to minimize fragmentation, reduce page splits, and otherwise benefit performance. I contend those are strictly physical considerations and have marginal bearing on what the logical primary key should be and whether that column should comprise it.

  • INMHO the confusion comes from the clustered index term in SQL Server.

    In other DBMS's for example DB2 there is no concept of a primary clustered index because the primary key definition is synonymous with the primary clustering of the data pages (at least after each Re-Org).

    As to why you would choose to define a primary key to be anything but the natural data vector with highest select frequency and highest row selectivity is beyond me.

    However if there is no Majority search condition which is far above in frequency of execution and selectivity of data than the remainder of selects; for example Where many equally orthogonal search vectors are equally weighted, then there may be no benefit to add a identity as clustering index to the data. The data rows are likely to be an different pages (for any set of search or ordering vectors) and any clustering would add a small overhead which would show no efficiency improvement due to indexing on select. Many non-clustered (the term is misleading because the index itself is still internally clustered) indexes on each of these vectors could help if the data was not specifically volatile.

    The only argument for an identity as primary clustering index is where the rows insert rarely and frequently update to a situation causing the update row to force onto a different page. as only clustering index would need to change any not any of the non-clustered that reference it and not the data row.

    I don't have a feel for whether this is a good match for the OLTP access pattern.

    You almost need the uniqueness of a QUID across instances combined with the sequentiality of the identity combined with the ordering of the most selected natural key in order to optimise indexing for retrieval.

  • Hi,

    I am somehow amazed by the discussion. For me, as a database designer, using PK is a must for each and every table, no questions asked.

    You can't design a fully controlled database without having PK for all tables and without using foreign keys that define your database integrity. Foreign keys were not mentioned in the article but I am sure that Andy will speak about foreign keys in one of his next articles. As we all know, to use a FK you must have PK predefined.

    I use Visio to show and print my database structure. With no primary keys and no foreign keys one cannot express the real database design.

    David

  • Regarding clustered indices -

    I think 99.99999999999% of the time you're better off having one...

    It can help when you defragment that table.

    I think one example where clustered indices might not be worthwhile would be audit tables, that will most likely never be read but are a business requirement, but that's an area thats open for debate.:)

    Mark

  • Great article Yoda: "you'll never wrong go by creating a primary key on a table, temp or not"

    Seriously though, I find the resistance to your thoughts that, with a VERY few exceptions, every table should have a clustered index, even if it's only on an identity column. It shows a clear lack of knowledge in our community for how data is stored and retrieved in SQL Server.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Andy,

    I know you wrote this article light years ago, and my question is a little off your topic, but I'd like to ask anyway. I understand the concept of a primary key. But when I manage relationships to a table using EM, I also have the option of creating a UNIQUE CONSTRAINT, or a UNIQUE INDEX on a column. I usually select index, because that seems more intuitive to what I'm trying to accomplish (more direct access to the data in the column), but I've never been able to figure out why I would choose a unique constraint over a unique index. Can you explain why I might want to?

    Thanks,

    Mattie

  • Check out SqlTac (www.SqlTac.com), it detects and fixes many of the most common database design issues.

  • From my experience your database design is dependant on your arogance. One database i was involved in designing uses some PK's however primarily it is controlled by a few logical pk's. this works for what we are doing with it. (it is also internal to my group and only accessed by a few people.)

    As for the other main db i support which ranges from 200gb to 2tb depenant on location. This database has no pk's, no clustered indexes and stores hundreds of thousands of new entries each day. It performs quite well returning results in less than 5ms in most cases. There are plenty of nonclustered indexes depending on the tables and hte way each db is utilized by the customer. Because of the far reaching diversity of this app i think it is beneficial to not utilize pk's in a physical sense.

    It truly depends on your purpose and how tight the controls on your app as well. If you are designing a db to be unleashed to developers and there are not good controls in place then yes... protect the db as hard as you can. This will help garbage in garbage out from being an issue.

    so basically my rambling states my opinion... Unless you can 100% say there is never an exception then you cannot use the word always. There are ALWAYS times that the rules do not completely apply.

  • Mattie,

    You create a unique index when you set a PK. Setting the PK is the equivalent, but it does help you pass along information to others and even yourself about what is the column that defines uniqueness for that entity.

    SQL Server is more efficient with clustered indexes, and it's worth setting one. It doesn't hurt you and while you might get along without it, how are you sure it wouldn't run better with one?

    PKs are logical entities anyway. They build a unique index (clustered or non-clustered), which protects the data, but having it there allows that information about uniqueness to transfer to new people. It's a good idea. Required? No, but I think it's a bad practice to avoid it. There's no reason not to do it.

  • Steve,

    Thanks for responding, I probably should have opened a new topic. I'm not arguing about the need for a primary key, and I realize that it has to be unique. (I've read enough posts titled 'how do I remove duplicates?' to convince me of that:)). So the only time I don't have one is in temp tables, and that's usually because I can't figure out what makes an entry unique.

    I was wondering about the operational difference between a unique constraint, and a unique index, and why I might choose to have one rather than the other.

    Mattie

  • Does anybody know what the difference in performance is when doing DELETE or INSERT on a table when having / not having a primary key and clustered index? Especially as far as large tables are concerned.

    Thanks,

    JT

  • Something missed as a valid use of tables without a primary key / clustered index (at least IMHO) are highly volatile staging tables. In our permanent relational stores we use surrogate Primary Keys (Not auto increments) but these are always created as a non clustered index, our natural keys are always indexed as Clustered unique. However, in the transactional portion of the application we shred an XML document into a permanent staging table using the SPID and a TranID GUID of the transaction as its primary identifier. The usefulness of this data is quite short lived (Existing usually for less than a second while the transaction is being processed). On this table the additional overhead of index maintenance alone doubled the time the transaction took in the production environment. It was faster to vet the data with validation code modeling the business rules as opposed to building said rules into the schema as is our normal practice. Granted, the developer does assume the responsibility in this case of maintaining this validation code, but that is what we are paid for.

    I guess my point is that there are no categorical best / worst practices, everything has an exception and being able to spot those exceptions and make the right decision is where a developer is truly worth their salt, anyone can follow a set of rules blindly, but that will always lead to problems in large enough systems...

    Regads,

    Bill

    P.S. I didn't read through all 17 pages of replies so my points may have already been raised. If so I would be interested to pointers to those posts to see other peoples takes on it. Thanks...

  • I've just come across a situation where an update/insert on a linked server's table failed, as it didn't have a PK - so there's another reason to make sure you always have one...

  • Does anybody know what the difference in performance is when doing DELETE or INSERT on a table when having / not having a primary key and clustered index? Especially as far as large tables are concerned.

    Thanks,

    JT

  • jthorpe (5/9/2008)


    Does anybody know what the difference in performance is when doing DELETE or INSERT on a table when having / not having a primary key and clustered index? Especially as far as large tables are concerned.

    Thanks,

    JT

    In a heap (table with no clustered index) - space isn't reused. So when you delete rows, and then insert, the delete frees up space inline, but the inserts happen to the end of the table. This will lead to increasing fragmented (and larger) tables. What makes it even worse is that short of copying the data into a new table, then dropping the original table, there's no way to defragment a heap.

    In the presence of a clustered index - the data is stored inline of the clustered key (i.e. the clustered key determines the physical order for storing data). while this might cause what is called "page splits" if you need to insert data in a spot with insufficient free space to accomodate it, rebuilding the clustered index would allow the data to get reorganized and would free the space back up for reuse without having to drop and recreate the entire table.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 151 through 165 (of 184 total)

You must be logged in to reply to this topic. Login to reply