Worst Practices - Not Using Primary Keys and Clustered Indexes

  • Andy

    Hi Andy, well Im not sure about this article and its validity in some areas.

    My only real concern in the title of the article, I believe that is not "bad practice" NOT to use clustered indexes, in 6.5 days this may be the case re hot spots etc, but v7 onwards its not an issues. Worst still, I firmly believe its GOOD practice not to use clustered indexes, especially on identity column keys (which seem to be popular with sqlserver programmers rather than natural keys). As you can only create one clustered index over a table one should carefully plan the creation of the key to maximise performance. I beleive their is little performance gain on placing a clustered index over an identity column when a natural or other combo key will yield better performance and query optimiser hit rates. This is especially the case when the default pkey index is type clustered, which is bad practice on Microsoft's part!

    What is bad practice, as you mentioned, is no p-keys.

    Perhaps worth mentioning that f-keys should also be indexed. This is a common performance gain in oracle and holds true in most dbms's. Funny enough, many people forgot them and end up indexing later on some strange concatenation of columns that the optimizer cant used in joins.

    Cheers

    Chris


    Chris Kempster
    www.chriskempster.com
    Author of "SQL Server Backup, Recovery & Troubleshooting"
    Author of "SQL Server 2k for the Oracle DBA"

  • you mention about overhead, i am working with SQL in the web development enviroment.

    i use very small tables smallest 2 rows upto say 20 rows to allow dynamic updating of drop down/lists within the web site [this puts the power of the intranet content in the approprate users hands]. (i hope all unstander what i am doing). The use of indexing (i belive) is not so valid within this situation, due to the overhead one the system.

    If i am incorrect in my assumption please can someone correct me (so i can improve my work)

  • I would absolutely agree that every table should have a primary key.

    I would strongly disagree with your recommendation that a "damn good starting point is to just add a identity column called rowid and make it the primary key and the clustered key". I would consider this a bad practice. Rather, we should examine what attributes we are storing to find the "real primary key" and alway use that. I would only suggest identity columns as a last resort, in cases where there are no unique attributes that are a candidate key - the typical example being a Customer table, where we would add an artificial CustomerID. I wouldn't go as far as saying never use identity columns, but using them as a matter of course is a bad idea. I doubt you will get any agreement on identity columns though - it seems to be one of those issues that always causes arguments.

    Cheers,

    Andy Mackie

  • I found this article very useful, but I think that it goes a little too far. I have seen a few comments which I agree with regarding the default use of clustered indexes which of course can be a source of performance problems when used inappropriately, but I thoroughly agree with the gist of the article.

    I would have thought (other than tables with a single lookup row, typically used for global system values) any table that cannot be keyed in some form is more a sign of bad design. How can you single out a particular row of a table (say for deletion?) if you cannot uniquely identify it?

  • Thanks to all for their comments so far! Let me address a couple of points:

    Chris - I agree that placing the clustered index on the identity col itself won't render great gains. Remember, I didn't say that was a best practice, just a better than nothing practice. Having a clustered index anywhere in the table affects how other indexes are built/used, I believe better to have one than not. Sure it would be better used on a "perfect" key - but if they could do that, they wouldn't be in the WP category to start with! I'll have to think in your fkey argument - hard for me to disagree, but should it be a worst practice?

    Dood - I addressed this to some extent in the article. Indexes arent always about speed, but data integrity (as constraints). Putting the index in place is also good planning for future growth. Aside from a primary key though, additional indexes should be used as needed only. If you KNOW it'll always be a 50 row state lookup table (or 51, or 52), then I'd agree no additional indexes required.

    Andy - Hey, I hear you man - if you've got a good candidate key, use it! My goal is to get people thinking about the worst thing they can do. If all your developers build tables that always have a primarykey and always have a clustered index they've at least taken a step in the right direction. My hope would be that no table gets deployed without a DBA reviewing first - that's a great time to do some mentoring if they've created a fake pkey when a real one exists. One day we'll have to tackle the identity col issue just for fun!

    Skelband - I agree a bad clustered index will cause you problems. Identity will never be one of those though. It's an "ok" plan at best, just better than nothing. For the second part, I agree - though in practice there is nothing that says you or your app can't be responsible for maintaining uniqueness rather than the db.

    To everyone: The tough part about these articles is that identitying a worst practice gets you thinking about good, better, best practices. My thought was to present at least "ok" alternatives that a user could implement without a lot of work or knowledge. Is that going to do more harm than good? Leave that for the ensuing discussion? If you gave this to a new/junior programmer to read one week for professional development, would you/they be better off for having read it? Would they learn one bad habit to replace another?

    Andy

  • Hello all,

    Andy, I liked your article. I think it is important to have PKs on tables in order to maintain data integrity. I also agree with the individual who said that if a "natural key" existed, then you should use it instead of an identity field. Personally, I don't like identity fields. They get me into trouble more times than not.

    I also remember a point about indexes that I heard at a conference. We as DBAs and developers try to achieve, or at least should, normalization in our databases. Every time you add an index, it denormalizes your data.

    I think a worst practice that should be addressed is over use of indexes. I have a developer that insists on putting an index on every column of every table in her database. On a 3 GB database, that's a lot of extra space that could be used elsewhere!

    My 2 cents,

    Jason

  • Hey Jason,

    I've heard that argument about "denormalizing" your data. It's certainly true, but I think misleading - the biggest point being that you don't have to worry about keeping the data in sync, SQL does it all for you very fast and efficiently.

    I know what you mean about too many indexes, seems its always too many or too few, never just right! Do you think we expect too much? What if we told developers - index the primary key, index all foreign keys, index any column that MUST be unique - and leave the rest to us!

    Andy

  • quote:


    I know what you mean about too many indexes, seems its always too many or too few, never just right! Do you think we expect too much? What if we told developers - index the primary key, index all foreign keys, index any column that MUST be unique - and leave the rest to us!

    Andy


    Andy,

    I think at a minimum you should index the primary keys and the foreign keys. Also, I would create an index on a unique field if it were going to be used in other relationships. Start with this and monitor the performance of the database. If you see that performance is slipping on a column that is not part of any index, then you may consider adding an index for that column. This is where SQL Profiler and the Index Tuning Wizard become a DBA's friend!

    Jason

  • Sure, but these articles don't target DBA's as much as the people the DBA's support. I don't expect (or want) developers spending time with the tuning wizard or profiler. As I said earlier, any discussion of worst practices leads you into a discussion of "well, what should I do" - and its just not always that simple! But it does serve as a good way to start a discussion so that YOU can help them grow.

    I'm not complaining about your comments - I appreciate them!

    Andy

  • I agree with you.

  • I know, I'm late. I've been meaning to read this all week, but I kept putting it off.

    OK, here it goes.

    I think Andy has done a great job pointing out some things. I totally agree that something is better than nothing and if you have no clue, then make an identity field and add a clustered index. No matter what. I thought I had a reference that shows where a clustered index is better than a heap in terms of performance, but I can't find it. If I do I'll post it.

    Not having a primary key is a sin. Not having an intelligent PK is not as bad, but doesn't excuse the developer. Andy's point to this article (I think ), is that this is a Worst Practice, not a recommended practice. He is showing the lowest level of something that should be done.

    Steve Jones

    steve@dkranch.net

  • You said it better than I tried to!

    Andy

  • I have to react to your opinion. I can go along with the part about every table having a UNIQUE index, however, I consider it bad practice to use IDENTITY columns to define these on. If each table is in 3rd normal form or Boyd-Codd Normal form (BCNF) the unique value can be constructed from one or more columns in the data itself. I NEVER use identity columns myself. As far as clustered indexes go, I have made it a rule never to use these either. Yhe reason for this is not a theological one, as in the previous example, but a practical one.

    Consider an order-processing schema. Would you make the order#, the orderdate or the customer# the clustered index? In the first two cases you will have created your own contention problems, since you wil have forced INSERT statement on the table to take place at the end of this index. Now I know MSSQLServer 7.0+ is supposed to have INSERT-ROW-Level-locking capabilities, but not every other RDBMS may have tese qualities, and after all, we're not building platform-specific solutions, are we? Leaves us with the option of using some widely-distributed value (at INSERT-time) to put the index on, like customer#. I'd go along with doing so, however in my practice over the last 10 years or so, I have found out that the gains (in performance cost) do not outweigh the efforts.

    Mvg, MvG.


    Mvg, MvG.

  • Use of identity columns (or any other contrived key) certainly has it's foes - and I'll agree that each side has it's proponents. What I tried to do..and evidently failed...was to target not users like yourself who have some experience in this arena, but to point out to users who DON'T know, understand, etc...that failing to use a primarykey is horrible. Should I explain the differing views on contrived keys? While maybe not doing so left a gap in the article, my thought was (and is) that for developers/DBA's making the worst mistakes don't try to change it all in a day, try to get them moving along the path to doing things better ways. If you're not seeing worst practices, you're working in a better environment than I do!

    Clustered keys now, I'll have to disagree. Platform independence is the holy grail I think, and seldom reached. Clustered keys are a MAJOR part of SQL, to not use them....I just don't see that. In your example, I'd look for a better column to use first - often a date column, but in any case one where I'd want to return the results sorted. Failing to find that, I'd probably use the primary key as the clustered key. The only time I've had issues with contention on clustered indexes was when they were frequently being updated. I'd encourage you to revisit this issue, you may find performance gains.

    Thanks for your feedback, I hope you'll continue to participate and offer your opinions - as you can see from this discussion, you're not the only one who disagrees with me!

    Andy

Viewing 15 posts - 1 through 15 (of 184 total)

You must be logged in to reply to this topic. Login to reply