Worst Practices - Not Using Primary Keys and Clustered Indexes

  • OK, so you have to have a clustered index... but why?

  • In oversimplified terms, clustered indexes are more effective for range searches, and they increase the effectiveness of your other indexes to some degree. Look for an article in the next few weeks.

    Andy

  • I am not a DBA myself, but a developer who has spent a lot of time studying database design.

    Not all companies have a DBA; my experience is that most SMEs just allow developers to develop databases, and if it works, that is fine.

    From a developer's point of view I would totally agree with the points made about natural keys, as I am constantly tearing my hair out at the databases I come across that have an identity as the key when a natural key exists. The worst practice I have found is placing an identity column on a many-to-many resolution table and making it the PK rather than having a compound key.
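    A minimal sketch of the compound-key approach described above (table and column names are hypothetical):

    ```sql
    -- Resolution table for a many-to-many relationship between Student and Course.
    -- The compound primary key on (StudentID, CourseID) enforces uniqueness of the
    -- pairing itself; no surrogate identity column is needed.
    CREATE TABLE StudentCourse (
        StudentID int NOT NULL REFERENCES Student (StudentID),
        CourseID  int NOT NULL REFERENCES Course (CourseID),
        CONSTRAINT PK_StudentCourse PRIMARY KEY CLUSTERED (StudentID, CourseID)
    );
    ```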

    I have also found that the use of identity keys can cause no end of problems when you wish to perform a simple task such as importing data from another database of the same design. I have found myself running queries that add 200 to every key in the app, having already removed all constraints and identity settings, and then importing the data with IDENTITY_INSERT on.
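    The re-keying chore described above looks roughly like this (database, table, and column names are hypothetical; the offset of 200 is the one from the post):

    ```sql
    -- The identity property and constraints have already been removed from the
    -- source table, so the key column can be updated directly. Shift the keys so
    -- they do not collide with rows already in the target database.
    UPDATE SourceDB.dbo.Customer SET CustomerID = CustomerID + 200;

    -- Load the data, preserving the (shifted) key values.
    SET IDENTITY_INSERT TargetDB.dbo.Customer ON;
    INSERT INTO TargetDB.dbo.Customer (CustomerID, CustomerName)
    SELECT CustomerID, CustomerName FROM SourceDB.dbo.Customer;
    SET IDENTITY_INSERT TargetDB.dbo.Customer OFF;
    ```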

    I am very pleased to see articles that are aimed at developers as well as DBAs, but would take issue with encouraging the use of identity fields.


    Mr Peter Livesey

  • Identity cols are often the subject of relentless debate. To me it's like any other tool: use it properly and it's great, abuse it and it hurts. I use idents a lot, though in recent months I've begun converting to uniqueidentifiers in a lot of cases and have been pleased with the results. The biggest difference is that I can generate a unique ID on the client; it's amazing how much simpler life becomes.
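    A sketch of the uniqueidentifier approach mentioned above (table and column names are hypothetical). The value can come from a server-side default or be supplied by the client, which is what makes disconnected inserts simpler:

    ```sql
    CREATE TABLE Orders (
        OrderID uniqueidentifier NOT NULL
            CONSTRAINT PK_Orders PRIMARY KEY
            CONSTRAINT DF_Orders_OrderID DEFAULT NEWID(),
        OrderDate datetime NOT NULL
    );

    -- The client can also generate the GUID itself and pass it in explicitly,
    -- so the new key is known before the round trip to the server.
    INSERT INTO Orders (OrderID, OrderDate) VALUES (NEWID(), GETDATE());
    ```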

    Andy

  • I like your article and believe that every table should have a primary key and that it won't hurt every table if it has a clustered index. Almost all our tables have primary keys (although some of that was forced upon us since you can't set up transactional replication on a table without a primary key.) Many of our tables have clustered indexes and we keep adding them to tables that don't have them. They help a lot to speed up slow running queries.

    We use identity columns a lot as primary keys and they seem to work fine. One of the people I learned from said that joins are faster using int columns than with varchar, and since we do a lot of joins on tables with a one-to-many relationship this seems to be fine.

    Robert Marda

    Robert W. Marda
    Billing and OSS Specialist - SQL Programmer
    MCL Systems

  • Identities are not good or bad in and of themselves. They do solve problems, and in my experience they can ease loads, because I often get data that does not have a natural key. If I had to generate one, I would be processing row by row. This also helps with copying rows in the same table: the identity handles the uniqueness for the PK.

    Not to say that identities should be everywhere. I have started using the GUID and like it as well. I wouldn't say they are appropriate everywhere, but they are useful.

    Steve Jones

    steve@dkranch.net

  • <quote>

    What if we told developers - index the primary key, index all foreign keys, index any column that MUST be unique - and leave the rest to us!

    </quote>

    I would say that "index any column that MUST be unique" is a good example of WP.

    I would say that "set a UNIQUE constraint on any column that MUST be unique" is a good example of BP.

    /m

  • Maybe that's a more correct way of putting it, but does it really make a difference? Sometimes it IS helpful to explain things the right way, sometimes the result is all that matters. Not sure myself on this one yet. Good suggestion though!

    Andy

  • I am still not sure I will agree with you that every table should have a primary key or index (clustered or not). Consider small tables that take up no more than 8K or so, which are used for constant values and do not change often, if at all. The reason for this is the way indexes look at such tables, or how they are stored.

    First, nonclustered indexes, like tables, live on 8K pages (they could potentially live on the same 8K page if room allows), but they add to the space needs, and traversing them will not beat a table scan in most cases on such small tables.

    Next, a primary key, no matter how you look at it, will have an index (if nothing more than to ensure uniqueness), and if that index is not clustered it bites into the same space problem, and we are back to why not to have a nonclustered index. So if you have a primary key on such a small table, it should be clustered.

    Then we have the clustered index, which will be the most beneficial thing you can add in almost all cases. However, when you create a clustered index, even though your data is now ordered, you still give up some space for the B-tree pages. Since these help locate data pages that are not stored in order, they pay off on tables that span more than one 8K page. But when a query runs against a single 8K page (or even two or three), you actually lose some performance.

    Here is why. When the query sees a clustered index and decides to use it, it first goes to the B-tree and gets the pointers; with an 8K or smaller table there is only one pointer. Then it goes to the beginning of the table and checks the start of the range to see if the value is greater; if so, it checks the end of the range to determine whether the value could exist at all. If the range is fine it will, because the table is so small, read from the start of the range to the end seeking the value. Notice that if all rows are on the same 8K page, you will have read the whole table by the time you reach your data. Without the index, you can keep your data in order and get there faster: the query gets the FirstIAM pointer to the first page and just starts reading until it hits the value it is looking for, skipping the B-tree check performed when using a clustered index and thus saving steps. In this situation the clustered index will only benefit you if the requested value does not occur within the range of the page; otherwise the table scan accesses it more quickly and efficiently.

    Consider also that if this table is a lookup table for another table, and you join them often for reporting and other reasons to show the value rather than the identifying value (generally an int column, though not necessarily an identity column), then you want to access your data as fast as possible to keep the system ready for the next person.

    So from my studies and experimentation, it is generally best not to have a primary key (due to the required index), or any index at all, on a table that is never going to grow above 8K (thus fitting on one page) and will be mostly static for its life. If you are careful with the way the data is entered in the first place, you can make the table scan outperform the index seek on a table that size (in other words, enter the data in the proper sort order; if it is not, you can add a clustered index and then remove it to force the order when you make changes). In any other situation I agree with your overall statement.
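    One way to read the suggestion above, sketched with hypothetical names: create the clustered index only long enough to force the physical row order on the page, then drop it.

    ```sql
    -- Small, static lookup table intended to stay under one 8K page.
    CREATE TABLE StatusCode (
        StatusID   int         NOT NULL,
        StatusName varchar(30) NOT NULL
    );

    -- Force the rows into StatusID order, then remove the index so reads
    -- fall back to a plain scan of the single page (SQL Server 2000 syntax).
    CREATE CLUSTERED INDEX CIX_StatusCode ON StatusCode (StatusID);
    DROP INDEX StatusCode.CIX_StatusCode;
    ```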

    "Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

  • I'd say that's a fair point. But how much does it really cost? One thing I teach is to build as much data integrity into the table as you can. Just a little thing like not having a pkey or unique index can really cause you problems when someone enters a duplicate value!

    Andy

  • I agree; that is why I specified static tables (usually configuration or set parameters that no one should have access to edit directly, other than an admin). But you could, without adding as much overhead as the index and key, put a trigger in place to prevent duplicates. And I do believe you are better off teaching them to do it as you suggest to begin with, but you should never limit it to the belief that only one way is right.
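    A minimal sketch of the trigger alternative mentioned above, using the same hypothetical lookup table as earlier posts:

    ```sql
    -- Reject any insert or update that would leave a duplicate StatusID,
    -- as an alternative to a primary key or unique index on a tiny static table.
    CREATE TRIGGER trg_StatusCode_NoDupes ON StatusCode
    FOR INSERT, UPDATE
    AS
    IF EXISTS (SELECT StatusID FROM StatusCode
               GROUP BY StatusID HAVING COUNT(*) > 1)
    BEGIN
        RAISERROR ('Duplicate StatusID rejected.', 16, 1)
        ROLLBACK TRANSACTION
    END
    ```

    Note that this fires after the statement, so it costs a scan of the table on every write, which is the trade the post is weighing against the index overhead.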

    "Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

  • Agree that absolutes are bad. Some are worse than others though.

    Andy

  • I have a client who is on SQL 6.5. I am going to upgrade them to 2000. I have been glancing over their current design and noticed that none of their tables have primary keys. I have got a little db design experience (not tons of it), and of course my automatic response is to be opposed to this. However, on further examination I realized that all of these tables have a composite clustered unique index on them, plus a few regular indexes as well. So I had to ask myself this question: if a table has a composite clustered unique index on it, why would you need a PK? I should note here that they access all of their data through views. There are even views that access and join other views. (FYI, I didn't design or write any of this.) But back to the question: would there be any benefit to a PK over a composite clustered unique index in this scenario? Was there any benefit to the index in 6.5? I am going to hit up the bosses to completely redesign this whole thing, and that is going to be one of my arguments if possible. I need more than "it's good practice" or "that's the correct way to do it". All responses are welcomed and encouraged.

    P.S. Is there any neato thing that I'm unaware of for the views accessing and joining other views thing?

    Thanks in advance.

    Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/awarren/worstpracticesnotusingprimarykeysandclusteredindex.asp


  • hirepwrd, as for having a PK instead of a unique clustered index alone: the only extra benefit you get is that a PK (or UNIQUE constraint) can be the target of a FOREIGN KEY constraint from another table. Pretty much the indexes as they are are just fine, unless you try to build relationships using columns from that table.
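    The point above, sketched with hypothetical table names: the referenced column has to carry a declared PRIMARY KEY or UNIQUE constraint, which a bare unique index is not.

    ```sql
    -- Fails if Orders (OrderID) has only an ad hoc unique index; works once
    -- OrderID is declared as a PRIMARY KEY or UNIQUE constraint.
    ALTER TABLE OrderDetail
        ADD CONSTRAINT FK_OrderDetail_Orders
        FOREIGN KEY (OrderID) REFERENCES Orders (OrderID);
    ```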

    "Don't roll your eyes at me. I will tape them in place." (Teacher on Boston Public)

  • It's been interesting reading the replies. One issue I haven't seen addressed anywhere is the fact that without a clustered index, SQL Server stores the data as a heap structure. Sure, you can rebuild indexes to get rid of fragmentation, but how do you do that to a base table? The two ways I know of are (1) dump the data out to a file, drop/truncate the table, and reload the data, or (2) have a clustered index and rebuild it. I'd much rather do 2 for convenience.
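    Option 2 above, sketched with hypothetical names (SQL Server 2000 era syntax): rebuilding the clustered index rewrites the base table itself in order, removing the fragmentation.

    ```sql
    -- Rebuild the clustered index (here the PK index) with a 90% fill factor,
    -- which reorders the base table's data pages.
    DBCC DBREINDEX ('dbo.Orders', 'PK_Orders', 90);
    ```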

