primary key discussion

  • ScottPletcher (8/29/2014)


    patrickmcginnis59 10839 (8/29/2014)


    CELKO (8/22/2014)


    What do you use for the PK on lookup tables?

    I use the encoding that is being used. The IDENTITY property (not a column!) is the count of insertion attempts that was made to that disk on that one machine.

    Columns with the IDENTITY property can be per table, and you can have the same value for columns with the identity property in separate tables, so at best it would be the count of insertion attempts that were made to that TABLE.

    If you want to be really technical, it's not even that. Some identity values can be discarded without a corresponding INSERT attempt, so it's not a true "count" of those. It's just an arbitrary identifier ... which is ok, because that's all that's needed!

    Now, it's far too often used as a clustering key when it should not be, but it can still be useful as a pk or other unique identifier of a table row.

    Just a quick question, Scott: in your opinion, which is better, a) using an identity column for the clustered index, guaranteeing an ever-increasing order of values, or b) a unique numerical value whose insert order cannot be guaranteed?

    😎

  • Eirikur Eiriksson (8/29/2014)


    Just a quick question, Scott: in your opinion, which is better, a) using an identity column for the clustered index, guaranteeing an ever-increasing order of values, or b) a unique numerical value whose insert order cannot be guaranteed?

    😎

    It depends. How is the table queried? Is it a (very) large table? Is it joined to other (very) large table(s)? How are those other tables keyed? In some cases, it's worth adding a little freespace and/or having some small degree of fragmentation to make merge joins viable for (very) large tables (fragmentation is often vastly overestimated anyway: yes, there are multiple points of insert, but each of those is sequential as well).

    If I key by ident, do I then have to create a gazillion covering indexes? That often causes far more overall I/O -- and wasted disk -- than just making the clustered index the main entry point for (almost) all queries.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (8/29/2014)


    It depends. How is the table queried? Is it a (very) large table? Is it joined to other (very) large table(s)? How are those other tables keyed? In some cases, it's worth adding a little freespace and/or having some small degree of fragmentation to make merge joins viable for (very) large tables (fragmentation is often vastly overestimated anyway: yes, there are multiple points of insert, but each of those is sequential as well).

    If I key by ident, do I then have to create a gazillion covering indexes? That often causes far more overall I/O -- and wasted disk -- than just making the clustered index the main entry point for (almost) all queries.

    It also depends on the environment your database is running in. If you are using database mirroring across a slow WAN pipe, you may want to use an identity column as the clustered index to reduce fragmentation on highly active tables.

  • If queries that scan a range of primary key values are used very often, you may want to cluster on the primary key; where the primary key is multi-column, that includes the case where queries that pick a single value of some of the columns are common (and which such queries are common may tell you what order the columns should have within the primary key). If queries that come from outside the database normally use the primary key, you should not use a surrogate as the primary key, but that doesn't necessarily mean you don't want to cluster on the surrogate instead of on the primary key, or indeed cluster on something else altogether. It is sometimes reasonable to write joins using the surrogate but still specify the natural key as the primary key and cluster on something that is neither the primary key nor the surrogate. You may find that, where a table is not a gateway to the database, the best choice for primary key is something which is a surrogate derived for use in joins in a parent table which uses the natural primary key; you may find this even when the parent table clusters on the natural key if, for example, the child table has far fewer rows than the parent.

    It all depends on what the query workload and the row counts look like.

    I think Grant's advice in his comment above is spot on.

    Tom
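    The "natural key as primary key, surrogate for joins" arrangement described above might look roughly like the sketch below. It uses Python's built-in sqlite3 module only because it runs anywhere; it is not SQL Server, it says nothing about clustering choices, and the table and column names (product, order_line, sku, product_id) and both functions are invented for the illustration.

```python
import sqlite3


def build_catalog():
    """Create a small in-memory schema (hypothetical names throughout)."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    con.executescript(
        """
        -- Parent: the natural key is the declared primary key (what outside
        -- callers use); the surrogate is a separate unique column.
        CREATE TABLE product (
            sku        TEXT    PRIMARY KEY,
            product_id INTEGER NOT NULL UNIQUE,
            name       TEXT    NOT NULL
        );
        -- Child: references the narrow surrogate, not the natural key.
        CREATE TABLE order_line (
            order_id   INTEGER NOT NULL,
            line_no    INTEGER NOT NULL,
            product_id INTEGER NOT NULL REFERENCES product (product_id),
            quantity   INTEGER NOT NULL,
            PRIMARY KEY (order_id, line_no)
        );
        INSERT INTO product VALUES ('AB-100', 1, 'Widget'), ('CD-200', 2, 'Gadget');
        INSERT INTO order_line VALUES (10, 1, 1, 3), (10, 2, 2, 1), (11, 1, 1, 5);
        """
    )
    return con


def quantity_by_sku(con):
    """Total quantity per natural key, with the join written on the surrogate."""
    rows = con.execute(
        """
        SELECT p.sku, SUM(ol.quantity)
        FROM order_line AS ol
        JOIN product AS p ON p.product_id = ol.product_id
        GROUP BY p.sku
        ORDER BY p.sku
        """
    )
    return rows.fetchall()
```

    Calling quantity_by_sku(build_catalog()) reports totals keyed by sku even though the join itself never touches the natural key. Which column to cluster on is a separate decision that this sketch does not model.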

  • The page split / fragmentation concern is often overblown. Remember, one INSERT, but you may read the row 1000, 10K, 100K+(?) or more times. Particularly given the availability of partitions, reorganizations, online rebuilds, etc., some fragmentation can be dealt with far better than using the wrong clus key just to reduce frag on the single INSERT of each row while drastically harming the performance of the vast majority of future SELECTs.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (8/29/2014)


    The page split / fragmentation concern is often overblown. Remember, one INSERT, but you may read the row 1000, 10K, 100K+(?) or more times. Particularly given the availability of partitions, reorganizations, online rebuilds, etc., some fragmentation can be dealt with far better than using the wrong clus key just to reduce frag on the single INSERT of each row while drastically harming the performance of the vast majority of future SELECTs.

    Until you find yourself in that position. I haven't, but I have talked with someone who was in that exact environment, and defragging a highly fragmented clustered index would basically shut down the mirroring to the DR site, which is over a slow WAN connection.

    You have to work with what you have and make the systems work under less than optimum conditions.

  • Lynn Pettis (8/29/2014)


    Until you find yourself in that position. I haven't, but I have talked with someone who was in that exact environment, and defragging a highly fragmented clustered index would basically shut down the mirroring to the DR site, which is over a slow WAN connection.

    You have to work with what you have and make the systems work under less than optimum conditions.

    How did it get "highly" fragmented? For a very large table, that would take a very large number of "bad" (mid-page) page splits, which would not occur with a reasonable clustering key, even if it wasn't universally ascending.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Thank you Scott, Tom and Lynn, this was the discussion I wanted to get going in this context. The subject is far too dependent on the environment and the nature of the dominant activities for one to generalise.

    😎

  • Eirikur Eiriksson (8/29/2014)


    Thank you Scott, Tom and Lynn, this was the discussion I wanted to get going in this context. The subject is far too dependent on the environment and the nature of the dominant activities for one to generalise.

    😎

    Exactly correct! Don't let some nursery-rhyme-like saying -- "narrow, unique, ever-ascending" -- bulldoze you into a cluster key selection: do some real analysis, and decide on the best key for your specific situation. The clus key is the single most important factor for performance (in most cases), so don't assume one simple general rule fits your exact situation.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (8/29/2014)


    How did it get "highly" fragmented? For a very large table, that would take a very large number of "bad" (mid-page) page splits, which would not occur with a reasonable clustering key, even if it wasn't universally ascending.

    Never asked for the details. What made sense as a clustered index for data access would fragment the index rapidly, affecting system performance. Defragging the index would essentially shut down the mirroring to the DR site.

    The choice of a clustering index is about more than data access; it is also dependent on the environment. In this case a very narrow, ever-increasing clustered index was the best choice, as it kept the index from fragmenting.

  • CELKO (8/29/2014)


    How about something like Coupon types? What would you use there? I am certainly not going to use the description of the type (PercentageDiscount, FreeShipping, BOGO, etc).

    A type is an attribute, measured on an enumeration scale. It is not an identifier, so it cannot be key.

    It seems to me reasonable to interpret Sean as referring to a column whose domain is Coupon Descriptions (or Coupon Type Names if you would rather call them that), and it seems to me to require a considerable degree of unreasonableness to interpret him differently.

    I know that the SQL standard doesn't include such domains, but it is easy enough to define such a domain using a varchar type and a check constraint, or alternatively with an auxiliary table enumerating the values and having an integer (or tinyint, smallint, or bigint) surrogate key, with the domain being represented in other tables by an integer column with a foreign key reference to the auxiliary table.

    Tom
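    The two ways of modelling the coupon-type domain mentioned above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module rather than SQL Server; the table names (coupon_check, coupon_type, coupon_fk), the column names, and the helper functions are made up for the example, and SQLite needs PRAGMA foreign_keys = ON before it enforces foreign keys.

```python
import sqlite3


def build_demo():
    """Create an in-memory database showing both ways to model the domain."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    con.executescript(
        """
        -- Option 1: the domain is a character column plus a CHECK constraint.
        CREATE TABLE coupon_check (
            coupon_id   INTEGER PRIMARY KEY,
            coupon_type VARCHAR(20) NOT NULL
                CHECK (coupon_type IN ('PercentageDiscount', 'FreeShipping', 'BOGO'))
        );

        -- Option 2: an auxiliary table enumerates the values; other tables
        -- carry an integer surrogate with a foreign key reference to it.
        CREATE TABLE coupon_type (
            coupon_type_id INTEGER PRIMARY KEY,
            type_name      VARCHAR(20) NOT NULL UNIQUE
        );
        CREATE TABLE coupon_fk (
            coupon_id      INTEGER PRIMARY KEY,
            coupon_type_id INTEGER NOT NULL REFERENCES coupon_type (coupon_type_id)
        );

        INSERT INTO coupon_type (coupon_type_id, type_name) VALUES
            (1, 'PercentageDiscount'), (2, 'FreeShipping'), (3, 'BOGO');
        """
    )
    return con


def try_insert(con, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with con:  # commits on success, rolls back on error
            con.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

    With either option, a value outside the enumerated set is rejected by the database itself: try_insert returns False for a coupon_type of 'Mystery' in the first table and for a coupon_type_id of 99 in the second. The lookup-table form has the practical advantage that adding a new coupon type is an INSERT rather than a schema change.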
