Worst Practices - Not Using Primary Keys and Clustered Indexes

  • I agree with this whole heartedly, the problem I get from developers is more over indexing a column to death, then the performance can really degrade!

     

    Excellent read though

  • Great article.

    I'm a consultant for services company, and we've recently been involved in a performance project where the application developer/provider removed all clustered indexes because the perception was they complicated locking. What they didn't consider was the indexes hadn't been rebuilt since the installation of the database.

    And as a result when I put the clustered indexes back on, and they saw report times go from 4 hours to 5 minutes in the extreme case, were shocked at the impact not having a clustered index on a table can have.

    Wish they had seen this article before "playing" with the system.

  • My understanding is that SQL stores the actual key of the clustered index as the lookup for the record in all non clustered indexes when a clustered index is present (not the page number).  This allows activity on the clustered index such as rebuilding without having to touch the non clustered indexes.  If you change the key value in a clustered index, all non clustered indexes must be updated.

    If this is not the case, please correct me.

  • Its true

    Having a index does help.

    I have a personal experience, recently when I took up a project was one of the job was taking 4-5hrs to complete, while it began with 1hr and slowly started degrading.

    So after go through the query and working the execution plan

    I found out that prog. uses temp table in which data is created and then sorted , but they never had any index on the temp table.

    So once I picked on the right columns, I created the index after the temp is filled with data, and also created clustered index on 1 another table. Viola the performance improved from 5hrs to within an hour so having index does help.

  • I might argue that, if you're having update problems, you chose the wrong clustered index. Not that you shouldn't have had one at all.

    --
    Adam Machanic
    whoisactive

  • The most misused index is the clustered index. When creating a primary key on a table, SQL Server automatically (and erroneously) clusters it, but because no two records will contain an identical primary key, clustering a primary key column is pointless. you may want to look at sql 2005 bol, they have there general description of the structure. clustered index forces physical order of the data, if you have primary key where it's an adwantage go ahead, but not everywhere.

    If the table has a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2005 makes any duplicate keys unique by adding an internally generated value called a uniqueifier. This four-byte value is not visible to users. It is only added when required to make the clustered key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.

  • primary keys on staging tables with hight data load activity and short processing wind are questionable too. clustered index takes time to build/rebuild

  • Andy,

     

    Good article. A comment that I strongly disagree with is:

     

    'but a damn good starting point is to just add a identity column called rowid and make it the primary key and the clustered key. '

     

    I disagree with this comment for two reasons.  The first, and I see this often; you SHOULD NOT create an ID column in your table for the sake of creating an ID column. An ID column is often justified for things like lookup tables, no problem there.  Secondly a Clustered Index almost always does not belong on the ID column, rather a natural key within the table. It is only then that you will see a benefit to the Clustered Index.

    <hr noshade size=1 width=250 color=#BBC8E5> Regards,Jeffery Williams http://www.linkedin.com/in/jwilliamsoh

  • I also agree with you that each table should have a primary key and a clustered index. 

    The clustered index is the easy part.  

    The term “primary key” is the problem.  Since no distinction is made between a logical primary key and a physical primary key.  Not distinguishing between the two is itself a worse practice which is more widespread than the ones you discussed in the article.  The advantages of a natural key verses a surrogate key cannot be discussed when all we have is something called a primary key.  It is not clear if they are one and the same or if one of the other indexes indicates the natural key or if this key is not defined in the database and the data integrity associated with it is being handled in the application. 

    This misunderstanding leads to the worst practice of just adding an identity column called rowid and making it the primary key and not normalizing the table at all.  

    In line with this misunderstanding is the statement “Every time you add an index, it denormalizes your data.’  This statement is not valid since the logical key is still the same no matter how many indexes you add.

  • There should be caveats for these types of assertions.  In a data warehouse you would experience significant performance penalties always using declarative referential integrity.  In DW world it is the job of the ETL process to ensure the data being loaded is clean.

  • I strongly agree with Andy to always create primary keys. As identity seed columns, this helps prevent page splits. Too many splits can and will impede performance.

    Another reason is without a primary key the query optimizer will always do a table scan as it looks at the table as a heap.

    Thanks for this great article Andy!

  • I wish there were more responses from data warehouse applications. When I first started in DW environemt it was a mantra (supported/promoted by our architect and MS consultant) no primary or foreign keys and be caustious using clustered indexes. Goal was to getting data in, scrubbing and isolating bad data.

    You certainly don't hear MS promoting "no PK/FK" position and in fact most tools work best (data source views in AS and RS report models) when primary keys and foreign keys are defined. In last year I have come around and now re-thinking our old practices.

    I have no problem using identity data types for PK and then having indexes for natural key - also feel comfortable preserving these identity data type primary key values and keeping identity in synch with natural key over time (that is why we have SET IDENTIY INSERT and DBCC CHECKIDENT).

    Some BP's I promote for datamarts:

    If full table refresh, truncate, drop all constraints, load table, add constraints and always use fillfactor =100.

    If large data volume and incremental loads, use partitioned tables, if possible design indexes to be created on partition and use fillfactor=100 (this way you take advantage of partitions and drop constraints before inserts).

    Avoid UPDATE.

    Don't forget about INCLUDE clause on indexes.

    All tables have an identity data columns.

    Persist calcualted columns.

  • Personally I don’t like the use of surrogate keys as an advice to junior database developers.

    Once you tell them something like that many of them tend to use this without first thinking about searching for a natural key.

    For the rest I greatly appreciated the article.

    Greets,

    Peter

  • Really enjoyed the article. I thought it gave some good ideas, particularly for application developers, who often understand the basics of database design, but do not usually have enough experience of larger systems.

    I think you should do an article on Identity columns. Always an interesting debate, but I'd be interested to read your thoughts.

  • Hi all,

    So often we confuse logical requirements with those of physical design.

    I firmly believe that in logical design, every table must have a primary key. But that does not mean we are required to declare one in the physical structure.

    While the logical design must have a PK, there are any number of ways to assert it physically. Certainly the unique index resulting from a primary key declaration is the one we most commonly use, but the same thing can be accomplished with a trigger, DML discipline, and perhaps other ways.

    Case in point, consider a table limited by design to one row. Need that table have a unique index on the PK? Or consider a temp table that is a subset of another table with a PK--need we have a unique index on that? Or consider in-memory arrays: how many times do we see physical PK's implemented on tables of that sort? Seldom, I say. Instead, we rely on the devices of good programming practice to make sure the rows are uniquely addressable.

    Now think about the opposite situation: a database that has an intrisic rowid that is capable of uniquely identifying every row quite aside from any other columns. That's roughly the equivilent to the knee-jerk approach of adding an identity column to every table. (Actually, SQL Server does not assert uniqueness on identity columns, but I digress). The point is, we still need a logical key that conveys business meaning and business uniqueness on such a table. And we need that even the physical design already provides a unique key of sorts.

    So, do we need a primary key on every table? Yes, the logical design must have a primary key. No, the physical design needn't necessarily have one (in the sense which was meant here: a PK constraint from which a unique index is imbued).

    I don't regard a clustered index on every table to be a best practice, much less absent mindedly making it the primary key. I've seen some compelling arguments for clustered indexes on identity columns to minimize fragmentation, reduce page splits, and otherwise benefit performance. I contend those are strictly physical considerations and have marginal bearing on what the logical primary key should be and whether that column should comprise it.

Viewing 15 posts - 136 through 150 (of 184 total)

You must be logged in to reply to this topic. Login to reply