Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

t-sql key on a table Expand / Collapse
Author
Message
Posted Monday, January 6, 2014 10:17 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:14 AM
Points: 36,735, Visits: 31,186
Christian Graus (1/6/2014)
Thanks - good to know.


Just so you don't think that I'm making this up, here's the entry from Books Online under the title of "Creating and Modifying PRIMARY KEY Constraints ". I know I hate it when someone tells me something about SQL Server and can't actually back it up.


The Database Engine automatically creates a unique index to enforce the uniqueness requirement of the PRIMARY KEY constraint. If a clustered index does not already exist on the table or a nonclustered index is not explicitly specified, a unique, clustered index is created to enforce the PRIMARY KEY constraint.



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1528339
Posted Monday, January 6, 2014 10:23 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:14 AM
Points: 36,735, Visits: 31,186
Ed Wagner (1/3/2014)
For this table, it is fine if you add a column as IDENTITY (1, 1) and define it as a primary key. It'll be a 4-byte key.

It appears that the combination of the two columns is unique, so the other possibility is to define a natural key using the combination of the two columns. This would be called a composite key because it contains more than one column. There is great debate between using artificial keys and natural keys and I'm not trying to ignite a debate. Ami Levin has a chapter on the debate in "SQL Server Deep Dives Volume 2" and it covers both sides to the argument.

What matters in the decision here is the overall design of the table, the uniqueness of the columns and if the values are ever-increasing. If you add more columns to this table and add nonclustered indexes to them, all nonclustered indexes inherit the values of the clustered index, which increases the size, so you'd want the size of the clustered index to be as narrow as possible. If you add an artificial primary key and the two columns are unique, you should also consider adding a unique constraint to the two columns.

I know I've thrown out a lot of theory here, but as with so very many things, "it depends" strikes again.


All good points. The only thing that you've left out of the "It Depends" scenario is how the table will be used. See my "It Depends" entry a couple of posts up.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1528340
Posted Monday, January 6, 2014 10:25 PM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 4:08 PM
Points: 1,963, Visits: 2,898
Actually, it is a terrible idea to add an identity to this table.

Depending on your specific data, future inserts could be sequential anyway. Even if not, rows are inserted once and typically read 10s, or even 100s, 1000s or more times, particularly for lookup tables, which this certainly seems to be. Besides, tables can of course be rebuilt when truly needed, and rebuilt online for this table (assuming Enterprise or other qualifying edition).

Overall, the biggest performance gain you'll get is to stop using dopey identity clusters by default and instead consider carefully how the table is accessed. I've removed over 6000 indexes and improved SELECT performance dramatically, particularly for joins, the majority of it from just replacing identity clusters with properly-keyed clustered indexes.



SQL DBA,SQL Server MVP('07, '08, '09)
"In America, every man is innocent until proven broke!" Brant Parker
Post #1528341
Posted Monday, January 6, 2014 10:53 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:14 AM
Points: 36,735, Visits: 31,186
ScottPletcher (1/6/2014)
Actually, it is a terrible idea to add an identity to this table.

Depending on your specific data, future inserts could be sequential anyway. Even if not, rows are inserted once and typically read 10s, or even 100s, 1000s or more times, particularly for lookup tables, which this certainly seems to be. Besides, tables can of course be rebuilt when truly needed, and rebuilt online for this table (assuming Enterprise or other qualifying edition).

Overall, the biggest performance gain you'll get is to stop using dopey identity clusters by default and instead consider carefully how the table is accessed. I've removed over 6000 indexes and improved SELECT performance dramatically, particularly for joins, the majority of it from just replacing identity clusters with properly-keyed clustered indexes.



First of all, stop using words like "dopey". There are some very good uses of IDENTITY columns that I use and your comments infer that I'm dopey. Unless you really like pork chops, please keep such comments to yourself. They don't become you as a professional, they add nothing to the conversation, and they can easily be taken as attempts to belittle or as ad hominem attacks.

It certainly is possible that "future inserts could be sequential". If that's true, then that fits one of the commonly accepted best practices of making a Clustered Index "ever increasing" and you would, indeed, not need an identity column for this table.

If, however, they are not entered sequentially, then it's not so much the performance of SELECTs that I'd be concerned with. Rather it would be the performance of the INSERTS. If there are a lot of INSERTs to this table by a lot of concurrent users, there will be massive page splits that can and will cause timeouts at the GUI.

If the table is mostly static and is of manageable size, then I agree, it almost doesn't matter if an occasional page split occurs. That brings up another point. You have to also plan on maintenance requirements and space. If this is a large table, then massive page splits can easily cause the table to double in size due to the space wasted by the page splits and, contrary to popular belief, disk space isn't cheap. Yeah... you could REORGANIZE the Clustered Index on a regular basis, but even though you can do that in an online fashion, it's expensive CPU and IO wise. The other thing is that REORGANIZE is FULLY LOGGED no matter what recovery mode you’re in and REORGANIZE does NOT rebuild the B-TREE.

If you decide to REBUILD a Clustered Index, you might be able to get away with the WITH DROP EXISTING optimization to keep your MDF file from growing. If you can't, then remember that any index over 128 extents (just 8MB), will have its page usage preserved until the new copy of the index has been created. If that's the Clustered Index of a 1TB table, then you better have a free TB+ hanging around to REBUILD it.

The only good part about REBUILDing indexes is that they will be minimally logged in the BULK-LOGGED and SIMPLE recovery modes. Just keep in mind that if a minimally logged operation occurs while in the BULK-LOGGED mode, any log backups made during that timeframe cannot be partially used for true Point-in-Time recovery. You either have to use the whole log backup or you have to stop at a point before it.

You ALSO have to consider the point (prior to 2014) that if a table had ANY blob columns in it, then you can NOT rebuild the Clustered index online. If the table has a blob in it, that automatically means that the Clustered Index cannot be rebuilt online.

So, lots to consider. You can't just say, "Besides, tables can of course be rebuilt when truly needed, and rebuilt online for this table" because you might not actually be able to either because of blobs or because you don't actually have the disk space.

The overall problem with this thread, so far, is that no one but the OP actually knows HOW the table will be used and he hasn't said yet.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1528344
Posted Tuesday, January 7, 2014 8:08 AM
SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 4:08 PM
Points: 1,963, Visits: 2,898
Jeff Moden (1/6/2014)
ScottPletcher (1/6/2014)
Actually, it is a terrible idea to add an identity to this table.

Depending on your specific data, future inserts could be sequential anyway. Even if not, rows are inserted once and typically read 10s, or even 100s, 1000s or more times, particularly for lookup tables, which this certainly seems to be. Besides, tables can of course be rebuilt when truly needed, and rebuilt online for this table (assuming Enterprise or other qualifying edition).

Overall, the biggest performance gain you'll get is to stop using dopey identity clusters by default and instead consider carefully how the table is accessed. I've removed over 6000 indexes and improved SELECT performance dramatically, particularly for joins, the majority of it from just replacing identity clusters with properly-keyed clustered indexes.



First of all, stop using words like "dopey". There are some very good uses of IDENTITY columns that I use and your comments infer that I'm dopey. Unless you really like pork chops, please keep such comments to yourself. They don't become you as a professional, they add nothing to the conversation, and they can easily be taken as attempts to belittle or as ad hominem attacks.

It certainly is possible that "future inserts could be sequential". If that's true, then that fits one of the commonly accepted best practices of making a Clustered Index "ever increasing" and you would, indeed, not need an identity column for this table.

If, however, they are not entered sequentially, then it's not so much the performance of SELECTs that I'd be concerned with. Rather it would be the performance of the INSERTS. If there are a lot of INSERTs to this table by a lot of concurrent users, there will be massive page splits that can and will cause timeouts at the GUI.

If the table is mostly static and is of manageable size, then I agree, it almost doesn't matter if an occasional page split occurs. That brings up another point. You have to also plan on maintenance requirements and space. If this is a large table, then massive page splits can easily cause the table to double in size due to the space wasted by the page splits and, contrary to popular belief, disk space isn't cheap. Yeah... you could REORGANIZE the Clustered Index on a regular basis, but even though you can do that in an online fashion, it's expensive CPU and IO wise. The other thing is that REORGANIZE is FULLY LOGGED no matter what recovery mode you’re in and REORGANIZE does NOT rebuild the B-TREE.

If you decide to REBUILD a Clustered Index, you might be able to get away with the WITH DROP EXISTING optimization to keep your MDF file from growing. If you can't, then remember that any index over 128 extents (just 8MB), will have its page usage preserved until the new copy of the index has been created. If that's the Clustered Index of a 1TB table, then you better have a free TB+ hanging around to REBUILD it.

The only good part about REBUILDing indexes is that they will be minimally logged in the BULK-LOGGED and SIMPLE recovery modes. Just keep in mind that if a minimally logged operation occurs while in the BULK-LOGGED mode, any log backups made during that timeframe cannot be partially used for true Point-in-Time recovery. You either have to use the whole log backup or you have to stop at a point before it.

You ALSO have to consider the point (prior to 2014) that if a table had ANY blob columns in it, then you can NOT rebuild the Clustered index online. If the table has a blob in it, that automatically means that the Clustered Index cannot be rebuilt online.

So, lots to consider. You can't just say, "Besides, tables can of course be rebuilt when truly needed, and rebuilt online for this table" because you might not actually be able to either because of blobs or because you don't actually have the disk space.

The overall problem with this thread, so far, is that no one but the OP actually knows HOW the table will be used and he hasn't said yet.



Even if some splits do occur, the table won't "easily double in size", since that would require every page to split.

Automatically adding an identity as the clustering key by default simply is dopey. The clustered index is the most important index on any table, and so it should always be considered carefully.


SQL DBA,SQL Server MVP('07, '08, '09)
"In America, every man is innocent until proven broke!" Brant Parker
Post #1528503
Posted Tuesday, January 7, 2014 2:20 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:14 AM
Points: 36,735, Visits: 31,186
ScottPletcher (1/7/2014)
Even if some splits do occur, the table won't "easily double in size", since that would require every page to split.

Automatically adding an identity as the clustering key by default simply is dopey. The clustered index is the most important index on any table, and so it should always be considered carefully.


You've apparently not experienced the type of massive splitting that I've seen happen. And yes, the clustered index is the most important index on any table and "it should always be considered carefully". That also means not summarily dismissing things like IDENTITY columns. That would truly be a "dopey" thing to do.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1528672
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse