Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««34567»»»

Worst Practices - Not Using Primary Keys and Clustered Indexes Expand / Collapse
Author
Message
Posted Sunday, March 31, 2002 9:29 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, April 18, 2013 12:11 PM
Points: 6, Visits: 14
If you put a clustered index on the PK and the table is constantly being updated, wouldn’t this increase the possibility of record locks?




Post #22697
Posted Monday, April 1, 2002 4:49 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: Moderators
Last Login: Yesterday @ 2:30 PM
Points: 6,784, Visits: 1,899
Generally no. One thing you have to look at is where the inserts get done. If you're using a sequential primary key then you may end up with some contention at the end of the table. Another is if you are actually updating the primary key frequently. Really I should have left out the word primary, since these are always considerations!


Andy


Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Post #22698
Posted Thursday, August 1, 2002 2:54 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, October 22, 2003 12:00 AM
Points: 2, Visits: 1
Hi all,

I definitely agree that a PK is needed for EVERY table except for maybe very small temp tables (since I work in v7 I can't create tables as variables). Every time I forgot (sometimes out of laziness) I had to go back and put one in.

What I don't understand is why everyone has something against the identity column. I've used this quite a bit in my normalised databases and it works quite fast. But I use these columns for joining on other columns and not as much for filtering out certain records. I think it may be faster to join on the identity column than on a unique 10 char column that has a clustered key (I should try this some time). It is always possible to associate a code with the value of this 10 char column, but how would that be any different than the identity int? Also, if you have to put this 10 char value in a column of another table (as a f-key) then it will take up more space than the int (I just got out of bed and don't feel like doing the math here to check if the char(10) is bigger).

I already put indexes on all the f-keys and put an identity in EVERY table, also an index on the columns that are used for querying on and ordering on. The identity is not always my clustered index, but many times is. And as far as I know this has never gone wrong.

The article makes quite some sense to me… and I'm willing to try anything for performance increase without sacrificing any integrity.

My suggestion is: just test and test and test. Do all type of tests (select/insert/update/delete) and check the execution plan.

I think the article is a good start for beginners, then it is up to them to test.

Now I'm going for… coke ;)

Regards,

Michiel





Post #22699
Posted Thursday, August 1, 2002 3:34 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, November 22, 2005 2:09 AM
Points: 3, Visits: 1
Hi anyone,

I only want to add that a clustered PK has an overhead:
true: the leave-level of that index will be stored in the tablerow,
but the top of this balanced tree will need extra space in the DB

greetings,

r.warnat



Post #22700
Posted Thursday, August 1, 2002 6:29 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, July 18, 2013 8:45 AM
Points: 3, Visits: 5
If you're doing inserts where the value being indexed is not increasing then a clustered index is generally a BAD idea. You'll get lots of page splits. See Rob Vieira's book, "Professional SQL Server 2000 Programming" pp.283-4 and 294-7. I would like to have seen this discussed in the article. Otherwise it has good points.




Post #22701
Posted Thursday, August 1, 2002 6:34 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 7:02 AM
Points: 817, Visits: 2,048
Good article. I too am amazed by how many people are willing to forgo primary keys on small tables. Data integrity always preceeds performance (esp such a slight gain). Also, without this PK, you can't create the appropriate foreign keys, which is also a data integrity issue. My experience is if you don't make the relation, you end up with orphans.

One technical point: if you create a table using TSQL in Query Analyzer, the PRIMARY KEY constraint is always clustered unless a clustered index already exists or you specify NONCLUSTERED.

As far as no primary keys on temp tables, this position also has little validity. We key our temp tables, and have generated PK violations that would have gone unknown to us and produced erroneous results without them.

Tables having a clustered index, however, is less certain. I recently had the experience of seeing a significant jump in performance in some temp tables by adding NONCLUSTERED to the create command. The tables were still keyed, but the absence of a CLUSTERED index helped tremendously.

I don't recommend as a "damn good start" adding and clustering an identity column. There would almost always be a better choice for a clustered column, and these bad habits might be hard to break.

Still, I will refer to this article anytime someone tells me I might not want to key my small tables.




Post #22702
Posted Thursday, August 1, 2002 6:34 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, August 2, 2013 2:43 PM
Points: 152, Visits: 33
-- PRIMARY KEY - don't CREATE TABLE without it. Doesn't matter how temp it is. Even in-memory table variables.

-- IDENTITY - a wonderful alternative to 200+ byte PKs that aren't required to ensure data integrity. You may have a natural PK, but do you really want to use it when it's five varchar(100) fields that change? This is even more important if other tables need to reference records in the table. Always consider "lookup" tables and other "extreme relational design" techniques and super-normalized schemas -- SQL Server is tuned much better for large Star-schema queries such that performance gains from denormalization aren't as easy to acheive.

-- and CLUSTERED INDEXES: If you are using SQL Server 2K (or even 7), you have no excuse to not apply a clustered index to each and every table. Period. Understand that SQL Server handles tables with no clustered index quite differently in many respects to clustered tables. If you have more than 100 pages of data and you have significant modifications to existing rows, you will soon die under the I/O overhead. Only the rarest of circumstance could find the sole benefit of non-clustered tables (SQL Server's ability to reuse space created by deleted rows) more compelling than the many benefits of clustering a PK.

As a general rule, OLTP-type usage will demand the PK be clustered, and OLAP-type usage many benefit from another set of columns for range searches. But don't forget that you can never go wrong with a clustered PK, but you can go very wrong with some other index clustered. Also, remember that a covering index is just as effective in improving query performance as the best clustered index. (I have one table with an index that has every table field, but with a different order.)

Any body have examples where a non-clustered table is the best solution, particularly on SQL2K? I'd love to see 'em...




Post #22703
Posted Thursday, August 1, 2002 7:57 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, August 4, 2009 1:23 PM
Points: 1, Visits: 6
Andy,

I find using clustered indexes painful experience in the following scenarios:
Replicated tables
Large Tables w/ 15 million rows or more
With replication you can't use a clustered index on a foreign key (logical choice lots of cases) because in replication an update gets transalated to insert&delete on the backside.
Clustered indexes on large tables are painful anytime you have to do maintenance on them (adding or dropping columns).




Post #22704
Posted Thursday, August 1, 2002 8:54 AM
SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: Moderators
Last Login: Yesterday @ 2:30 PM
Points: 6,784, Visits: 1,899
Interesting points. On the replication one I'll try to find time to experiment, might be possible to override the behavior in the replication sequence.

Thanks to all for their comments. The combination of the article and your responses make for pretty solid package!

Andy
http://www.sqlservercentral.com/columnists/awarren/




Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Post #22705
Posted Thursday, August 1, 2002 9:35 AM
Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Tuesday, February 22, 2005 8:31 AM
Points: 55, Visits: 1
More about clustered keys.

Andy is right about a lot of things in this article. Ironically, a few of his mistakes are not the result of being overzealous, but of giving the opposing argument too much slack!

For example, Andy says that some database designers will argue that a table doesn't need a primary key because "it's a temp table, so the rules don't really apply." He writes, "I think there is some validity to this position. [...] the most compelling reason [...] is that in many places where you use a temp table now you could (and probably should) use a variable of type table (available beginning in SQL 2000) which only allows you to define the structure, not indexes."

But you *can* specify a primary key (with a clustered index, if you like) on a table variable:

DECLARE @temp TABLE (
a INTEGER,
b NVARCHAR(2000),
PRIMARY KEY CLUSTERED(a)
)

Another argument Andy should dismiss, but doesn't, is that a particular type of table doesn't need a clustered index. He writes: "A good example of this might be a lookup table that contains state name and state abbreviation. You're going to be doing single record lookups, never a range query which is where a clustered index excels. You're also not going to have a lot of indexes, probably one on the name and one on the abbreviation where the index will provide the data, no need to do a seek through the table to find other bits of information associated with that row. For this example, is it a fair point? I think so."

Well, I don't think it matters because SQL Server will probably just keep the small table in memory. But let's look more closely at this.

Suppose you don't use a clustered index, and you then query against your "states" table for a particular state code. SQL Server has two options: (1) use the index, in which case it has to load two pages of data from the disk - a data page and an index page; or (2) don't use the index, load one page of data, but then scan the whole table to find the right row. It's a trade-off.

When you use a clustered index, the index information is built into the table data itself. So there's no trade-off. SQL Server only has to load one page from disk, and that page contains both the table data and the index data. A table with a clustered key could therefore actually perform faster.

[Plug: I verified some of this using Lumigent's Log Explorer tool. I created one table with a clustered index and one with a non-clustered index. When I insert rows into the clustered table, SQL Server records a single write in the transaction log. When I insert a row into the other table, SQL Server records two writes on different pages: one for the table itself, and one for the non-clustered index. Log Explorer let me view these records in the log. Note - I work for Lumigent, which advertises on this site.]

As a side note, the Books Online specifically recommend clustered indices for "[c]olumns that contain a limited number of unique values, such as a state column that contains only 50 unique state codes." But I can't tell if they are talking about a small "states" table or a large table that happens to have a "state" column.



Post #22706
« Prev Topic | Next Topic »

Add to briefcase «««34567»»»

Permissions Expand / Collapse