Worst Practices - Not Using Primary Keys and Clustered Indexes

  • quote:


    Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/awarren/worstpracticesnotusingprimarykeysandclusteredindex.asp


    It's hard to believe anyone can disagree with your sentiments. Even if you take the POV of those arguments against Keys, what exactly do you gain? In my experience as a developer and DBA, I have yet to see anything that's truly black and white, everything is a trade off. And I can't possibly see how the scale could tip to the side of not using keys or clustered indexes.

    What benefit is derived? If the table is soo small you don't need a key, then the storage space used by adding it will be trivial as you mentioned. But think of a lookup table based on State for instance. Doesn't the risk of having multiple entries for a state or zip code far outweigh the trivial storage differences? PARTICULARLY in lookup tables I would argue you need a key. Unless having multiple instances of the same value would ever be ok.

    And try deleting multiple fields in EM on a non-keyed table. I've had developers on the test DB create tables in EM, leave off the key, try to delete a bunch of rows, only to have EM tell them there's insufficient information to do so. Then I get an email that something's wrong with 'my' database. And each time, I have to add a column, make it the key column, delete the erroneous information, make one or more of the columns a key, and delete the temporary one. This has happened probably 15 times in my career, and the cost in my time of fixing this up or instructing a developer to do so far outweighs the cost of an Index.

    Then there's ADO.NET, have fun using a CommandBuilder on a Non-Keyed Table. Have fun configuring your DataAdapter on a non-Keyed table. Sure, ADO.NET isn't pervasive in the market yet, but once again, the 10 minutes to hours that you'll waste fixing stuff far outweighs the cost of doing it by the book the first time.

    The TEMP table argument I buy to some degree, but isn't it just easier to have a department rule, ALL TABLES ARE KEYED ALL THE TIME, and cut out any ambiguity?

    Excellent article!

  • The article is ok. But before that i want to know the Full Explanation on Cluster Index and Ordinary Indexes with examples. if u have any article please send it. My mail id is pratapr@i-vantage.com. also if possible pls send advantages and disadvantages of indexes and also about all Keys like primary, composite.

  • Excellent! I completely agree. I forwarded your article to several people I work with. I’ve recently been emphasizing and trying to make the exact same point.

  • One reason, indexes on heap tables are forced to use the physical storage location of the data. Any page splits in the data itself will cause a cascade to any and all indexes on that table. Whereas indexes on tables with a cluster, reference the cluster key value and therefore data page splits don't affect all indexes, only the cluster, and possibly indexes affected by the data change.

    Check out Ken Henderson's book on SQL Server internals for more details.

    quote:


    Ok, so you have to have a clustered index..... but why!!!


    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • quote:


    The article is ok. But before that i want to know the Full Explanation on Cluster Index and Ordinary Indexes with examples. if u have any article please send it. My mail id is pratapr@i-vantage.com. also if possible pls send advantages and disadvantages of indexes and also about all Keys like primary, composite.


    you should get yourself a copy of Inside SQL Server 2000 by Kalen Delaney.

    Covers it all.

    Frank

    http://www.insidesql.de

    http://www.familienzirkus.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • Why have a primary key on every table? Because Dr. Codd said so. I just have my developers read some Codd or Date and they don't bug me again. We require a primary key on every table (unless, perhaps, it's a one-use temporary table).

    John Scarborough
    MCDBA, MCSA

  • I did not read all the responses - so many! However, I don't think concurrency was mentioned. Placing a clustered index on an identity column places new inserts on the same page, in effect serializing them. Also, a clustered index can most dramatically improve performance only if the query affects several records. I tend to pattern the clustered index on the order by of the most common select query.

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • s

  • Also, if there is any chance of a trigger being required on a table, I want an identity column. I do not know of any better way to match the before and after records in the inserted and deleted tables. I've even tested a cursor (just to see if it would work - a cursor is a means of last resort). The order of the records in the deleted and inserted tables can change.

    RandyHelpdesk: Perhaps Im not the only one that does not know what you are doing. 😉

  • Hi Andy,

    Your article gives substance and credibility to declarative statements around regularly published standards for database architecture. In many, if not most, cases I would have to agree “Always ensure each record in a table is unique 1NF and every SQL Server table should have a clustered index.”

    Here are some exceptions especially born out in a data warehousing environment:

    •Consider a large dimensional model where the fact table has 12 billion records. Adding a surrogate identity would require a biginint which is 8 bytes per record or 96 GB for the base table and another estimated 5GB for the index intermediate pages. All this for a field that will never be used, except to comply with a “standard.”

    •During bulk loads (T-SQL Bulk Insert; BCP –h”tablock”; DTS (table lock)) you can not obtain a BU lock if there is an index on the table.

    •It is perfectly reasonable to have duplicate records in a dimensional model’s fact table. For instance: on a given day the same customer bought the same item at the same register and from the same cashier. Our time granularity is by the day.

    One thing we all learn pretty fast in this business is that when applying standards it is generally a best practice to remember it depends.

    Regards,

    Joe

  • Thank you all for the comments. You're right, it does depend, there is always a case where breaking the rule is the right thing to do. My hope is that we'll teach the rules and teach that breaking them is often (but not always) a sign that you're approaching the problem from the wrong perspective.

    Andy

    http://www.sqlservercentral.com/columnists/awarren/

  • quote:


    Why have a primary key on every table? Because Dr. Codd said so. I just have my developers read some Codd or Date and they don't bug me again. We require a primary key on every table (unless, perhaps, it's a one-use temporary table).


    I think you should give your developers the definition of a primary key to read:

    quote:


    The primary key in any table is used to uniquely identify each record (row.) in that table.

    Therefore, the primary key must exist (“cannot be null”) and must of course be “unique” – “No duplicates allowed.”

    If either of these rules were broken, it would be impossible to identify any record and to positively identify the entity on any row.


    Frank

    http://www.insidesql.de

    http://www.familienzirkus.de

    --
    Frank Kalis
    Microsoft SQL Server MVP
    Webmaster: http://www.insidesql.org/blogs
    My blog: http://www.insidesql.org/blogs/frankkalis/[/url]

  • From the number of relies this article has generated I guess it's a success whether you agree with it or not.

    Sorry I haven't read the whole thread (or the article).

    My opinion

    >> The primary key in any table is used to uniquely identify each record (row.) in that table.

    i.e. the primay key is not updateable - updating a primary key is a logical delete and insert and should be implemented as such.

    Therefore any lookup tables with presentation layer content should have an artificial key so that the content can be updateable. Doesn't have to be an identity but that's just one method of generating a (possible) unique field. If it's a selection field then the description should also have a unique index.

    Also the PK is a logical concept - it doesn't have to be defined as a primary key - could be a unique index but I agree every table should have one.

    It could be implemented in other ways - triggers, SP layer and I suspect for some constraints this might not be a bad idea e.g. large table that is updated very infrequently. Never come across a situation to use this but have had situations where (from an implementational point of view) the only possible PK was an identity.

    Beyond that if you have a candidate key then it needs a unique consraint and if it is record defining why not make it the PK.

    But if you want to add another artificial key then that's up to you - it will lose information and I think cause coding to be more difficult and sometimes less efficient but you can build a system like that so go ahead if it's your preference.

    Once the constraints are added then you can look at non-unique indexes for performance.

    A common mistake is 'this query would be faster with an index so add one' - ignoring the fact that this query is run once a month and no one cares how long it takes and the index slows down critical updates.

    (One system I was on they decided to write the system without indexes then the dba group would look at all the SPs and add indexes to support them without knowing any functionality - guess what happenned).

    Clustered indexes - I consider that an implementational thing. I suspect it's possible to cause more problems with a poor clustered index than missing one when it would be useful so maybe a general rule should be to avoid them.

    Cursors never.

    DTS - only when needed and never to control.

    Edited by - nigelrivett on 11/11/2003 05:20:24 AM


    Cursors never.
    DTS - only when needed and never to control.

  • LookUp Tables are interesting - using your example (StateCode, StateName). The Statecode should have a primary key and the StateName should have not just an index but a unique index. StateCodes cannot repeat nor can the StateName repeat.

  • quote:


    i.e. the primay key is not updateable - updating a primary key is a logical delete and insert and should be implemented as such.


    A buddy of mine once went through a disastrous divorce, exacerbated by the fact that he had been assigned the worst judge in the state (different state, not Virginia), notorious for (among other things) putting the screws to men, ex-military men in particular. Long story, short version: Even though his kids were grown up and had been out of the house for years, he wound up owing about $2200/mo in alimony for, like, twenty years. This was back in 1989, when $2200 was serious money -- at the time, it was more money than I took home in one month.

    Anyhow, the reason I bring this up is, as the judge was relaying his verdict to my dumbfounded colleague, his lawyer too was stunned. "He can't do that!", the lawyer exclaimed, in a furious whisper. My friend replied, under his breath as well, "Don't tell me. Tell him!"

    Which calls up the subject of updating primary keys. In one of these threads, I once mentioned the fact that, inside a trigger, you can't always get a good read on the before- and after-picture of an updated row. If someone updates the primary key, then all bets are off when you JOIN the 'inserted' and 'deleted' pseudotables. The response then, from someone, was, "Don't worry, that's not supposed to happen."

    My response is, don't tell me, tell Microsoft. They allow the updating of primary key fields in SQL. Since they allow it, people do it. No, they shouldn't allow it, and no one should do it -- but as Black's Law Dictionary states, "should" is defined as "ought to, but not necessarily will."

    quote:


    DTS - only when needed and never to control.


    I thought I was the only person in these here parts who thinks DTS sucks.

Viewing 15 posts - 91 through 105 (of 184 total)

You must be logged in to reply to this topic. Login to reply