Worst Practices - Not Using Primary Keys and Clustered Indexes

  • You can create constraints on table variables, and I was under the impresion that constraints would create an index.

    declare @TableVariable table(Col1 int,

    Col2 varchar(128),

    primary key

    (

    Col1

    ),

    unique

    (

    Col2

    )

    )

    So maybe you can create some index' on table variables.

    Henry

  • I agree with what Warren is trying to put across. He/She is not talking about absolute best practices, he/she is just trying to give a starting point to the way in which pk's and clustered indexes could and may be should be used. I do think that pks should be used in all tables (ever designed a data base in MS Visio??, I suggest you try it), I don't think that using identity's is a great idea but not a bad one either, the argumnents about them and the above could never stop. Basically if your designing, developing or maintaining systems that rely on MS SQL Server (7, 2000)backend then refer back to brbolte's direct comments. Why are people still talking about, and dare I say it, SQL 6.5?? Last time I was developing on SQL 6.5 was about 4-5 years ago. Move on, for god's sake, that was 3 versions ago, and almost 4!!!. Don't be a cowboy or cowgirl and go off on your own crusades for superior database performance issues etc. Remember that there are standards to database design, and I strongly think that DATA INTEGERITY out weighs performance (65%:35%) as one of the most important factors in any database-driven solution, no matter what the size of the db. Clients don't care if can process at lightning speeds when there data is coming out corrupted, because the importance of data integrity was overlooked.

    Another point.....

    Gees, no one seems to have replied to brbolte's challenge:

    "Any body have examples where a non-clustered table is the best solution, particularly on SQL2K? I'd love to see 'em..."

    I congratulate this programmer/DBA (what ever he/she is).

    People, if your going to put up the case, then put up the evidence to prove it.

    Once again a great conversation starter by Warren, congatulations Warren.

  • Firstly, I've come across SQL6.5 a couple of times recently because it was a huge undocumented, mission critical application written in C++ where the source code had been lost. A depressingly common experience with older applications.

    Management were too afraid to risk switching the thing over and it was so convoluted that no-one truly understood what was going on. The cost of reverse-engineering and developing a replacement were prohibitive. The attitude was "if it ain't broke don't fix it".

    Secondly, one of the posts on this site referred to a dot COM that flip flopped between two servers with the active server having no indices on major tables to allow maximum possible inserts. I believe this enabled the company to get vast performance out of a couple of workstation class servers and thus they were that rare beast, the profitable dot COM.

    ========================

    He was not wholly unware of the potential lack of insignificance.

  • I am a fan of using Identity columns as  primary key and then if appropriate using the "natural" key as unique clustered index.  Since the promary key is what should be used as a foriegn key I don't like the idea of having that piece of data editable.  Also witha a composite key there are more fields in a join thus harming performance.

  • I was a journalist, I have some suggestion about your article style:

    1) Make it as simple (short) as possible, I get losted in a big article with actual little information

    2) Give WORKING CODE/Examples. Nothing is better than run a working code and see the result, both interesting and visually, this is very important for a jounior developer - WP target less/none experienced people

    3) Writes Note Style - point out what is the most important thing

     

    Thanks

     

    Charles

  • Hello Guys,

     

                I found the discussion very interesting. I think that the way you create your PK, indexes, etc. depends directly of the kind of application that you are developing, your hardware resources, types of clients that are accessing the application (web, client server, etc.), if you need to replicate your DB, etc. May be would be interesting open a new discussion about the best practices for each scenario.

     

    Regards,

     

    Julio

     

  • Great lively discussion,

    I aggree with Andy. Let us NOT forget why RDBMS were invented. I mean who wants to mess around with the file system, create their own file indexes, create rebuild index procedures, and enforce data/referential integrity themselves like in the OLD days. Bottom line, they were invented to make it easier, or abstract/hide, the daunting tasks mentioned above.

    The tools are there for a reason, USE THEM EFFECTIVELY.


    {Francisco}

  • Hi Andy

     

    Just a note that a variable of type table can have a primary key to enforce uniqueness. There is no way that I would trust a table variable that cannot enforce uniqueness. 

    Question.  Does this not create some type of index ?

    declare @tab table (col1 varchar(10) not null, col2 varchar(10) not null

      primary key(structure_code))

     

    Tania

  • Identity columns used as surrogate keys can be useful for avoiding a lot of maintenance (data change) activity.

     

  • Hi, I wish I'd read you're article before I designed the system I'm now having to support.  Sure - some of the tables were never meant to have many records, but try telling that to the users.  In initial requirements capture "There will only ever be a couple of sub-agents per agency" - now one has over 1000 - The performance hit is in reporting where complex joins are used.  The solution was to copy the sub-agency table to a temporary table - that did have an index - and use the temp table - that particular report is now 5x faster - and we don't even join on the indexed row. Another example is on the insurer table - the client only deals with 5 or 6 insurers, but has found it easier (lord knows why, but the customer is always right) to create multiple entries for the same insurer with slightly different names so now he has virtually a 1:1 ratio of agencies/insurers - plus a load of "test" insurers - with test policies - because we won't let him delete an insurer with policies against it, they just clog up his system.  My point here is Andy's idea is a good base line in design because you - as the designer - can never know how the client will just do something completely bizarre - and the first thing you know is months down the line when performance is degraded. 

  • OK, so this may have already been mentioned before, but use of clustered indexes needs to be considered a little more thoroughly than the article suggests. 

    For example, take any system where throughput is key, lets say 1000 transactions a second stuff.  If the processing involves INSERT statements then 8.5 times out of 10 performance levels are missed or degraded because of unsuitable clustered indexes.  It is VERY important to remember the physical order of the rows and the relative positions new rows will take.  You cannot hope to guarentee performance if there is a chance that a table must be re-ordered even once in that 1000 transactions. 

    This is just one more thing to have on a standard checklist determining clustered index suitability.

  • Well I disagree your point of view.

    You didn't define the scope of the applications, the dimensions of the tables and the no. of transactions per second.

    I'm in a position where we're using SQL for telephony platforms, not for groceries, and believe me, the overhead introduced by the indexes/keys/any kind of contraints slows down the insertion so much that you cannot follow the platform (the calls/SMS/MMS events).

    So please be more carefull when generalizing your afirmations. Don't give me advices on how to do it, we've tested everything that can be tested on mostly all the hardware you can imagine (i.e. HP SuperDome, RAM disks and so on).

    So my conclusion is:

    When your applications:

    1. Search atomic data not use whole bunches of data from your tables

    2. Make a lot of searches, insert at slow rates (human interaction or low numbers of rows/second)

    3. Make udates, deletes (more atomic than globals)

    Than indexing is a must.

    If you intent to:

    1. Basically append (insert) at very high rates

    2. Almost always use the whole volume of data, not atomic pieces

    3. Delete/Update the whole tables or very large volumes of rows in your table

    4. Make less searches/selects (daily reports, weekly or monthly)

    DO NOT index or make the indexes just before youre consulting the data and destroy them immediately after!

  • I think there are cases where you would not want indexes and I'm stuck on a couple choices.  So I'm throwing out a question for other perspectives. 

    If you have audit tables that are basically copies of the inserted or deleted tables with user/session information appended, what would you do for indexes?  For the most part the tables are seldom referenced.  When they are, it's hard to say what someone will be looking for - the person, the original key, the date, the application, etc.  Tables have millions of rows.  I didn't design, but in the original design, the tables had no indexes, indexes have only been added when the audit table was referenced in a particular manor enough that it was helpful for response.  These indexes are not clustered indexes when added.  So I've basically got audit tables that are heaps with sometimes optional usually nonunique (but pretty selective) indexes.  NO RI is ever used in the audit tables. 

    So what would you do for index strategy on that? 

    The original key is not unique anymore, date is the only thing currently in the record that is close to be unique, this doesn't hold true either. Would you add a uniqueidentifier column or an identity column?  What's quicker for SQL - NEWID or identity?  Cluster by old key and uniqueidentifier?  Leave a heap?  Response to inserts are important.  Ability to manage space is important (heap becomes difficult to manage because rebuilding indexes doesn't clean up).

     

  • Good morning all. I would like to ask a question that touches on this article (one comment really) and perhaps is a little off topic to the overall discussion.

    Without specifying who is who I would like some honest feedback from the group on this one:

    Between myself and one of my developers we have a fairly strong disagreement with regards to naming conventions; concerning ID columns specifically.

    One party feels that an ID column should always be named  'ID' while the other party prefers to standardize on a naming convention that specifies the nature of the ID column. The perfect example here would be a lookup table.  Lets say there is a table called (L_Status) - L = Lookup

    One party would call the ID Column ID while the other would name the column StatusID, another table might have ServiceClassID; naturally the table would be named (L_ServiceType).

    The reasoning behind this is presented as it is easier to follow and program Stored Procedures, Triggers, and the like; particularly when joins are involved as the columns are much easier to identify. That said the oposing party states that the SP can be just as easy to follow simply by looking at the logic or appending the table name to the statement (ie: L_Status.ID)

    Considering the above another argument is made that while the (L_Status.ID) method will lend some clarity it can still be a bit ambiguous when aliasing table names, not ALWAYS used in cases where there are no joins or joins with tables that do not have a 'ID' column and therefore maintaining an absolute naming convention will be touchy at best.

     

    I would like to hear the thoughts and practices of others here on this topic. I am sorry if this is in the wrong area. While I have been reading this forum for a while now this is my first posting and the 'rowid' comment reminded me to address this ever burning issue.

     

    <hr noshade size=1 width=250 color=#BBC8E5> Regards,Jeffery Williams http://www.linkedin.com/in/jwilliamsoh

  • Andy,

    Although I agree with you overal position, I absolutely disagree with the line "but a damn good starting point is to just add a identity column called rowid and make it the primary key and the clustered key".

    Where does this add to the training or knowledge of a [junior] developer/DBA. In my experience there is almost (99%) always a PK (therefore clus indx) and where there isn't then it is most certain to be a case where a table should be a heap. Instead, a developer/DBA in this position should review the design with a more senior/experienced developer/DBA and understand the issue at hand, not apply a "soggy plaster".

    As part of my standard DBA practices I monitor for tables without PK/cust-indx and seek to find out why. As you said, often it is an oversight, but I have had cases where it was justified (normally audit tables - i.e. the original PK can be duplicated in the audit table).

    Kind Regards

    Dan

Viewing 15 posts - 106 through 120 (of 184 total)

You must be logged in to reply to this topic. Login to reply