Worst Practices - Not Using Primary Keys and Clustered Indexes

  • The biggest problem I have with articles like this is that they are incomplete. You start an idea and never follow through to the end of the idea. This typical of articles like this which to me are nothing more than a gripe article or a way for someone to get their name out there who has never written an article before.

    So you say that you have a primary key on every table and really never go into an explanation why you should choose to have the cluster index on column(s) other than the primary key.

    For starters I would write in the article the reason to have a clustered index and what they do. Second I would tell the reader how to identify which columns in their table should be used for clusted indexes. Third what type of primary keys should be used (UUID vs. int). Fourth, I think you should go in depth as to what the benefits of using a primary key holds and not just one sentence explainations.

    As someone else pointed out in their comments that these types of articles aren't really for DBAs as much as they are for the people that DBAs support. If that is the case then this article really leaves a lot of information out.

  • Andy, I enjoy all of your articles and agree with you on the subject of primary keys and clustered indices. I am working with a client who has very large data volumes with a limited life span, ~ 1 week. Data volumes are in the range 100 million rows/day * 6 tables over a 6 hour timeframe. The data is bcp'd into the tables on several servers. The preparation window in very short, from close of business to opening of business next day. During the off hours, the end of day reports and file downloads are produced. Next the tables get purged of old data. Two problems right away are index maintenance - rebuilding clustered indices on these tables can't be performed within the off hours window. Gene

  • heh, check this... I work for a company where the DBA refuses to use Primary Keys.  They're apparently the tool of the devil because they make the database difficult to maintain (???).  Rather, we use unique clustered indexes in place of primary keys (and even those are somewhat sparse).  Granted, it has the same effect, but it obfuscates what the key is.  Rather than just being able to see what the key is within various reports or GUIs, you have to get the table definition.

    Foreign keys are also rarely used in our databases.  Wanna find the relationship between two tables?  You have to infer it from column names and such.

    It never ceases to amaze me when someone can take such a core concepts of database design and decide that they are somehow a bad thing that just won't work for us.  Regrettably, I find that thinking to be common in our industry.

  • Great article Andy! I came across an interesting reason to use a clustered index on every table that you didn't mention. We had a table that accumulated a LOT of data where the primary key was generated by the various clients and didn't fall in order, so adding a clustered index on it would have been disasterous for performance. In fact, none of the columns would avoid inserting rows into the middle of the table, so we just didn't add a clustered index. We had some performance problems with INSERTs and did a LOT of work on tuning it. It turned out that adding an identity column, using it as the primary key and having a clustered index on it reduced a LOT of contention and eliminated our performance issues for INSERTs. Even though there was overhead in adding a column, having SQL generate a new value for it, populate it and maintain the index, it was FASTER and reduce contentnion. We changed our original primary key column to a candidate index and we were done. Now, if I can't figure out an appropriate column or columns for a clustered index, I add an identity column and use that - always.

  • Do not agree on cluster index; do agree on identity primary key, as a starting point.

    Also it's much easier (and this argument was not included in you article) to have not-business-related primary keys, preventing business rules changing and maintenance.

  • In my experience it all comes back to design.  If at design time you get everything right, keys and indexes should be very obvious.  This really is not asteriod physics. Now the trouble in my experience is when entities are added to the schema down the road, then all hades breaks out.  Thanks for the article, will be good fooder at work.

  • "Another is that it's a temp table, so the rules don't really apply. More than the first argument, I think there is some validity to this position. Not a lot, but some! I think the most compelling reason I have in favor of this argument is that in many places where you use a temp table now you could (and probably should) use a variable of type table (available beginning in SQL 2000) which only allows you to define the structure, not indexes. Just remember this - you'll never wrong go by creating a primary key on a table, temp or not."

    One might add that the reasons to define a primary key also apply to table variables. HTH

  • Apparently the author hasn't working with large and complicated tables.  In many situations, a combination of the data model, the data distribution, and the query optimizer's plan lead to poor performance when a clustered index is used.  As the distribution and volume of data changes, the results may vary.  Rather than blindly throwing a clustered index on every table, some thought and performance monitoring is required on an on-going basis.  Sometimes there will be different queries on a given table with competing interests: a clustered index may help one query and hurt another.  Blind caveats don't work here, it takes some analysis.

  • I agree that a PK on EVERY table is an essential part of database design. As far as the clustered index is concerned, I follow the "let the ends justify the means" philosophy. If the addition of the clustered index yields performance improvements, then best practice would dictate using one. Otherwise, as Jesse has eluded to, in certain scenarios you'll be hurting performance as well as taking up vital resources.


    Cheers,

    Alex

    Rogue DBA

  • Here is where I am at with PK, FK and AK (Alternate Keys / Indexes), every table needs a PK, period. I don't care how small the table is or how small you think it will stay, it needs to have a PK. Things will change (columns could be added later) the table will grow beyond what you planed for, and you are going to have to go back and add one anyway (personaly I do not want to waste the time going back to fix something that should have been done right to begin with!), plus on a small table the overhead for a PK is minimal and it will speed up performance. I build Web Apps and download time is of the essence and any way I can make my Drop Down Lists load faster is worth it! I have a gripe with graphic designers that design images for the web with a resoultion set to 300dpi (at this point most monitors can only display upto 72dpi), but that is for another fourm.

    Again, proper DB design! I can not stress this point any more! Sit down with the people you are working for and solve, or plan for, as many problems as you are aware of or have encountered in the past, your DB and applications will be much better off.

    As far as using IDENTITY, as was posted earlier it is a tool and when used in the properly it can be very benifical (planning). I use a mix of IDENTITY and Unique ID depending on the table and how many modifications are going to be done to the table.

    It supprises me when a user finds out that you can use a string as your PK rather than using an IDENTITY (or AutoNumber for you MS Access people). All a PK is is a unique reference to a row, no one ever said it has to be a number.

    One last thing, I have "heard" people people say they don't want to use indexes because of overhead on the DB but they have a PK. FYI: PK's and FK's are indexes, as are AK's.

    Imagination is more important than knowledge.

    – Albert Einstein

  • Jeffery Williams talked about naming conventions for the primary key.  First and foremost, your company should set up a standard for naming conventions.  If that standard said the primary key was always named ID, I'd follow that standard.  (I'd mutter about the IQ level of the standards writer, but I'd follow it.)  Personally, I favor a descriptive name for the primary key with "ID" as part of the name.  If ID is used as part of the name then never make ID part of a field name that isn't a primary key or a foreign key.  (My company doesn't follow that personal rule.)

    In some of the more rigorous design systems, every field name is unique.  Each field has a specific definition, use, and description.  When you get to the physical design, the physical field name doen't have to match, but it sure is easier to read if the name stays consistent throughout the process.  A primary key field in this type of system is always specific to one table.  If you see that variable in another table, it is by definition a foreign key reference.  If you see that variable in a process flow, the data always originated from that table no matter what process feeds that information.  If you define a foreign key variable name, the description of that foreign key specifically associates it with the primary key of the table it is tied to.

    I ran into a problem with my tables.  I needed to write a script that would work no matter how many times it was executed.  The thing is, when this system was designed, foreign keys were unnamed and I needed to copy some tables, drop the table, create a new schema for the table, write the old information back into the table and recreate the foreign keys.  Worked great the first time, blew up big time the second time.  I needed to drop the table and that means I had to drop the foreign keys.  When they were re-created, they had a new name.  The second time through, the names were different, the hard-coded drop name didn't exist, the foreign keys remained and the drop of the table failed.

    I needed to find out what the names were and drop those names, since I needed to re-create these foreign keys anyway, I was going to apply a naming convention to these foreign keys.  All of them would start with 'FK_'.  Since my limit was 128 characters, the next thing to be inserted is the foreign key table name up to 64 characters long.  Next I checked the first foreign key variable name against the primary key name.  If they didn't match, I would append '_' and up to 60 characters of the foreign key variable name.  If there was room, I would append '_' and the ammount of the primary table name that would fit.

    Since all the names used aren't nearly that long, I get a nicely documented FK name.  If it looks like FK_name1_name2, I know the foreign key exactly matches the primary key and this is a normal relationship. If it looks like FK_name1_name2_name3, I know which unrelated name in the foreign table belongs with the primary table and possibly a hint at what the business relationship is as well.  You couldn't possibly do that with the primary key always being ID.

  • Andy - regardless of the validity of primary keys and clustered indexes, the "it's so easy" argument doesn't hold water. That isn't a reason for a best practice, no matter what the industry.

    There may be very good reasons for primary keys and clustered indexes every table, but "Easy" isn't one of them.

    Disappointed in CA ...

  • Another reason not mentioned in the article why every table the developers want to add to the databases in the systems I administer ........ merge replication. We are merge replicating our production servers and replication requires primary keys.

     

    I agree with Kenneth that the naming convention for primary keys is better as a descriptive name + ID. Then any foreign keys relating to it should have the same name rather than relating fields with differnet names. It just seems more logical and easier to follow.


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • "It just seems more logical and easier to follow."  You do know that some people oppose this kind of idea, just for this reason.  If it's not logical and it's not easy to follow, then my job is safer.

    Yep, and that's why you aren't here anymore, and I'm stuck reading your junk.

    I was involved in looking at merge replication as well.  One of the unkeyed tables I looked at had 1 row with one field.  The one proc that changed it had:

    UPDATE tbl SET fld=fld+1

    Isn't programming wonderful!

  • I certainly agree (as does everyone on this thread) about always having primary keys.

    There are many reasons why one would not always use clustered indexes. One not mentioned so far is that they can be *very* harmful to performance on update of a large table when combined with a number of other indexes.

    The reason is that when the clustered index needs rebalancing the other indexes also need to be completely rebuilt to point to the shifted pages. On a multi-million row table this can cause the server to be unresponsive for minutes.

Viewing 15 posts - 121 through 135 (of 184 total)

You must be logged in to reply to this topic. Login to reply