Worst Practices - Not Using Primary Keys and Clustered Indexes

Question

Post reply

Worst Practices - Not Using Primary Keys and Clustered Indexes

Andy Warren

SSC Guru

Points: 119902
More actions
October 20, 2001 at 12:00 am

#104594

Comments posted to this topic are about the content posted at http://www.sqlservercentral.com/columnists/awarren/worstpracticesnotusingprimarykeysandclusteredindex.asp>http://www.sqlservercentral.com/columnists/awarren/worstpracticesnotusingprimarykeysandclusteredindex.asp
Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

Viewing 15 posts - 1 through 15 (of 184 total)

You must be logged in to reply to this topic. Login to reply

ckempste SSCoach Points: 17983 More actions · Answer 1

Andy

Hi Andy, well Im not sure about this article and its validity in some areas.

My only real concern in the title of the article, I believe that is not "bad practice" NOT to use clustered indexes, in 6.5 days this may be the case re hot spots etc, but v7 onwards its not an issues. Worst still, I firmly believe its GOOD practice not to use clustered indexes, especially on identity column keys (which seem to be popular with sqlserver programmers rather than natural keys). As you can only create one clustered index over a table one should carefully plan the creation of the key to maximise performance. I beleive their is little performance gain on placing a clustered index over an identity column when a natural or other combo key will yield better performance and query optimiser hit rates. This is especially the case when the default pkey index is type clustered, which is bad practice on Microsoft's part!

What is bad practice, as you mentioned, is no p-keys.

Perhaps worth mentioning that f-keys should also be indexed. This is a common performance gain in oracle and holds true in most dbms's. Funny enough, many people forgot them and end up indexing later on some strange concatenation of columns that the optimizer cant used in joins.

Cheers

Chris

Chris Kempster
www.chriskempster.com
Author of "SQL Server Backup, Recovery & Troubleshooting"
Author of "SQL Server 2k for the Oracle DBA"

dood SSC Enthusiast Points: 116 More actions · Answer 2

you mention about overhead, i am working with SQL in the web development enviroment.

i use very small tables smallest 2 rows upto say 20 rows to allow dynamic updating of drop down/lists within the web site [this puts the power of the intranet content in the approprate users hands]. (i hope all unstander what i am doing). The use of indexing (i belive) is not so valid within this situation, due to the overhead one the system.

If i am incorrect in my assumption please can someone correct me (so i can improve my work)

amackie SSC Journeyman Points: 82 More actions · Answer 3

I would absolutely agree that every table should have a primary key.

I would strongly disagree with your recommendation that a "damn good starting point is to just add a identity column called rowid and make it the primary key and the clustered key". I would consider this a bad practice. Rather, we should examine what attributes we are storing to find the "real primary key" and alway use that. I would only suggest identity columns as a last resort, in cases where there are no unique attributes that are a candidate key - the typical example being a Customer table, where we would add an artificial CustomerID. I wouldn't go as far as saying never use identity columns, but using them as a matter of course is a bad idea. I doubt you will get any agreement on identity columns though - it seems to be one of those issues that always causes arguments.

Cheers,

Andy Mackie

skelband SSC Rookie Points: 33 More actions · Answer 4

I found this article very useful, but I think that it goes a little too far. I have seen a few comments which I agree with regarding the default use of clustered indexes which of course can be a source of performance problems when used inappropriately, but I thoroughly agree with the gist of the article.

I would have thought (other than tables with a single lookup row, typically used for global system values) any table that cannot be keyed in some form is more a sign of bad design. How can you single out a particular row of a table (say for deletion?) if you cannot uniquely identify it?

Andy Warren SSC Guru Points: 119902 More actions · Answer 5

Thanks to all for their comments so far! Let me address a couple of points:

Chris - I agree that placing the clustered index on the identity col itself won't render great gains. Remember, I didn't say that was a best practice, just a better than nothing practice. Having a clustered index anywhere in the table affects how other indexes are built/used, I believe better to have one than not. Sure it would be better used on a "perfect" key - but if they could do that, they wouldn't be in the WP category to start with! I'll have to think in your fkey argument - hard for me to disagree, but should it be a worst practice?

Dood - I addressed this to some extent in the article. Indexes arent always about speed, but data integrity (as constraints). Putting the index in place is also good planning for future growth. Aside from a primary key though, additional indexes should be used as needed only. If you KNOW it'll always be a 50 row state lookup table (or 51, or 52), then I'd agree no additional indexes required.

Andy - Hey, I hear you man - if you've got a good candidate key, use it! My goal is to get people thinking about the worst thing they can do. If all your developers build tables that always have a primarykey and always have a clustered index they've at least taken a step in the right direction. My hope would be that no table gets deployed without a DBA reviewing first - that's a great time to do some mentoring if they've created a fake pkey when a real one exists. One day we'll have to tackle the identity col issue just for fun!

Skelband - I agree a bad clustered index will cause you problems. Identity will never be one of those though. It's an "ok" plan at best, just better than nothing. For the second part, I agree - though in practice there is nothing that says you or your app can't be responsible for maintaining uniqueness rather than the db.

To everyone: The tough part about these articles is that identitying a worst practice gets you thinking about good, better, best practices. My thought was to present at least "ok" alternatives that a user could implement without a lot of work or knowledge. Is that going to do more harm than good? Leave that for the ensuing discussion? If you gave this to a new/junior programmer to read one week for professional development, would you/they be better off for having read it? Would they learn one bad habit to replace another?

Andy

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

Jason Delaune SSCrazy Points: 2649 More actions · Answer 6

Hello all,

Andy, I liked your article. I think it is important to have PKs on tables in order to maintain data integrity. I also agree with the individual who said that if a "natural key" existed, then you should use it instead of an identity field. Personally, I don't like identity fields. They get me into trouble more times than not.

I also remember a point about indexes that I heard at a conference. We as DBAs and developers try to achieve, or at least should, normalization in our databases. Every time you add an index, it denormalizes your data.

I think a worst practice that should be addressed is over use of indexes. I have a developer that insists on putting an index on every column of every table in her database. On a 3 GB database, that's a lot of extra space that could be used elsewhere!

My 2 cents,

Jason

Andy Warren SSC Guru Points: 119902 More actions · Answer 7

Hey Jason,

I've heard that argument about "denormalizing" your data. It's certainly true, but I think misleading - the biggest point being that you don't have to worry about keeping the data in sync, SQL does it all for you very fast and efficiently.

I know what you mean about too many indexes, seems its always too many or too few, never just right! Do you think we expect too much? What if we told developers - index the primary key, index all foreign keys, index any column that MUST be unique - and leave the rest to us!

Andy

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

Jason Delaune SSCrazy Points: 2649 More actions · Answer 8

quote:
I know what you mean about too many indexes, seems its always too many or too few, never just right! Do you think we expect too much? What if we told developers - index the primary key, index all foreign keys, index any column that MUST be unique - and leave the rest to us!
Andy

Andy,

I think at a minimum you should index the primary keys and the foreign keys. Also, I would create an index on a unique field if it were going to be used in other relationships. Start with this and monitor the performance of the database. If you see that performance is slipping on a column that is not part of any index, then you may consider adding an index for that column. This is where SQL Profiler and the Index Tuning Wizard become a DBA's friend!

Jason

Andy Warren SSC Guru Points: 119902 More actions · Answer 9

Sure, but these articles don't target DBA's as much as the people the DBA's support. I don't expect (or want) developers spending time with the tuning wizard or profiler. As I said earlier, any discussion of worst practices leads you into a discussion of "well, what should I do" - and its just not always that simple! But it does serve as a good way to start a discussion so that YOU can help them grow.

I'm not complaining about your comments - I appreciate them!

Andy

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

LBWH4LSHH SSC Journeyman Points: 98 More actions · Answer 10

LBWH4LSHH

SSC Journeyman

Points: 98

October 23, 2001 at 7:54 pm

#419227

I agree with you.

Steve Jones - SSC Editor SSC Guru Points: 734498 More actions · Answer 11

I know, I'm late. I've been meaning to read this all week, but I kept putting it off.

OK, here it goes.

I think Andy has done a great job pointing out some things. I totally agree that something is better than nothing and if you have no clue, then make an identity field and add a clustered index. No matter what. I thought I had a reference that shows where a clustered index is better than a heap in terms of performance, but I can't find it. If I do I'll post it.

Not having a primary key is a sin. Not having an intelligent PK is not as bad, but doesn't excuse the developer. Andy's point to this article (I think ), is that this is a Worst Practice, not a recommended practice. He is showing the lowest level of something that should be done.

Steve Jones

steve@dkranch.net

Andy Warren SSC Guru Points: 119902 More actions · Answer 12

You said it better than I tried to!

Andy

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter

martinvg SSC Veteran Points: 239 More actions · Answer 13

I have to react to your opinion. I can go along with the part about every table having a UNIQUE index, however, I consider it bad practice to use IDENTITY columns to define these on. If each table is in 3rd normal form or Boyd-Codd Normal form (BCNF) the unique value can be constructed from one or more columns in the data itself. I NEVER use identity columns myself. As far as clustered indexes go, I have made it a rule never to use these either. Yhe reason for this is not a theological one, as in the previous example, but a practical one.

Consider an order-processing schema. Would you make the order#, the orderdate or the customer# the clustered index? In the first two cases you will have created your own contention problems, since you wil have forced INSERT statement on the table to take place at the end of this index. Now I know MSSQLServer 7.0+ is supposed to have INSERT-ROW-Level-locking capabilities, but not every other RDBMS may have tese qualities, and after all, we're not building platform-specific solutions, are we? Leaves us with the option of using some widely-distributed value (at INSERT-time) to put the index on, like customer#. I'd go along with doing so, however in my practice over the last 10 years or so, I have found out that the gains (in performance cost) do not outweigh the efforts.

Mvg, MvG.

Andy Warren SSC Guru Points: 119902 More actions · Answer 14

Use of identity columns (or any other contrived key) certainly has it's foes - and I'll agree that each side has it's proponents. What I tried to do..and evidently failed...was to target not users like yourself who have some experience in this arena, but to point out to users who DON'T know, understand, etc...that failing to use a primarykey is horrible. Should I explain the differing views on contrived keys? While maybe not doing so left a gap in the article, my thought was (and is) that for developers/DBA's making the worst mistakes don't try to change it all in a day, try to get them moving along the path to doing things better ways. If you're not seeing worst practices, you're working in a better environment than I do!

Clustered keys now, I'll have to disagree. Platform independence is the holy grail I think, and seldom reached. Clustered keys are a MAJOR part of SQL, to not use them....I just don't see that. In your example, I'd look for a better column to use first - often a date column, but in any case one where I'd want to return the results sorted. Failing to find that, I'd probably use the primary key as the clustered key. The only time I've had issues with contention on clustered indexes was when they were frequently being updated. I'd encourage you to revisit this issue, you may find performance gains.

Thanks for your feedback, I hope you'll continue to participate and offer your opinions - as you can see from this discussion, you're not the only one who disagrees with me!

Andy

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter