Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

UNIQUEIDENTIFIER vs BIGINT Expand / Collapse
Author
Message
Posted Wednesday, October 3, 2012 4:57 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 2:49 PM
Points: 2,030, Visits: 3,030
Jeff Moden (10/3/2012)
Use the clustered index for a more appropriate column


For large tables, the "appropriate column" is usually unique, narrow, and ever-increasing. "Customer Code" columns almost never meet all 3 of that criteria.



Determing the appropriate column actually takes analysis; again, there is no one-rule-fits-all-tables for the best clustered key column.

Customer ids are usually unique and ever-increasing; narrowness is sometimes not as good because char is used even though when the values are purely numeric. Even so, often a somewhat larger clustered key is best overall for performance.


SQL DBA,SQL Server MVP('07, '08, '09)
"In America, every man is innocent until proven broke!" Brant Parker
Post #1368106
Posted Wednesday, October 3, 2012 6:05 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:44 PM
Points: 36,959, Visits: 31,472
ScottPletcher (10/3/2012)
Determing the appropriate column actually takes analysis; again, there is no one-rule-fits-all-tables for the best clustered key column.

Customer ids are usually unique and ever-increasing; narrowness is sometimes not as good because char is used even though when the values are purely numeric. Even so, often a somewhat larger clustered key is best overall for performance.


But you didn't say Customer "ID"... you said Customer "Code". Perhaps it's a difference in terminology but I've found that the two are usually drastically different.

So far as the no one-rule-fits-all-tables, I agree. That notwithstanding, the rule I stated is very good for larger tables because of the implications on non-clustered indexes. About the only real analysis needed on such tables is trying to determine if the first column of the clustered index should be some sort of date column or not. A narrow and unique secondary column, such as an IDENTITY column would be necessary or an 8 byte internal row ID would be added to the clustered index. Narrowness is always a good thing in an index simply because you can fit more rows per page on indexes and the clustered index is automatically appended to every non-clustered index that doesn't explicitly use the clustered index column(s).


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1368118
Posted Thursday, October 4, 2012 3:04 AM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 11:09 AM
Points: 8,690, Visits: 9,226
ScottPletcher (10/3/2012)
Typically customer codes are sequential. Since they're not here, that would definitely make the clustering key choice much more difficult.

Typically customer codes occur in order records more often than in customer records, and while they may be sequential in order of creation of customers they are certainly not sequential in terms of creation of orders. So the ideathat a customer code is typically sequential in relation to whatever table we are talking about here is not reasonable.


Tom
Post #1368244
Posted Thursday, October 4, 2012 3:41 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Monday, August 18, 2014 8:36 AM
Points: 2,836, Visits: 5,066
This discussion is bringing us back to the similar, quite heated one, we had not long ago.
Does a "Customer" have a "Natural Key"?
Customer Code? May be, but to make it unique, you will need to introduce the special infrastructure which will need to generate one and ensure it is unique within organisation. So, for this organisation, you can make it "Natural", but that's about it.
Now, why to bother with the cost of infrastructure to generate and maintain such unique "Customer Codes"? Just to make it look nicer? Kind of alpha-numeric?
Identity, could easily play the role of "Customer Code", to make it more appearing to customer, you can format it as you wish when it's shown to them.
For example, to please Chinese customers, make Customer Code/Id BIGINT and seed it from 77700000000000. Format it as 777-000-0000-0000 for better readability on output documents and it will make Asian people happy

Actually, I would not even use such name as "Customer Code" for a key. For me it means more kind of "Customer Type". Customer Number or Account Number at least, or simply Customer ID is sounds better to me.


_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1368271
Posted Thursday, October 4, 2012 3:42 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Thursday, June 12, 2014 9:30 AM
Points: 513, Visits: 1,129
L' Eomot Inversé (10/4/2012)
ScottPletcher (10/3/2012)
Typically customer codes are sequential. Since they're not here, that would definitely make the clustering key choice much more difficult.

Typically customer codes occur in order records more often than in customer records, and while they may be sequential in order of creation of customers they are certainly not sequential in terms of creation of orders. So the ideathat a customer code is typically sequential in relation to whatever table we are talking about here is not reasonable.

just like a products table... every product has a product code which is used in searches but the inserts are never sequential and usually they are strings.. so a surrogate key would apply very well also in this case..
natural keys are "beautiful" but very hard to determine what columns should be... can be customer code but also the ITIN since every person has one but usually you don't give your number before you make a purchase, when you initially register the customer...

One question though...
Suppose I have a Products, a Warehouses, a WarehouseProducts, Orders and OrderProducts and the OrderProducts is "linked" to WarehouseProducts, since I can make an order to my suppliers for the same product but to different warehouses (on the same order).
Products has ProductCode, Warehouses has WarehouseCode, WarehouseProducts has ProductCode and WarehouseCode and OrderProducts as ProductCode and WarehouseCode.. these are all strings... Isn't it better to add surrogate keys IDENTITY to all tables and make the "link" by it?
Pro: fast in joins, less columns on "linked" tables...
Cons: If I want to get the product description from an order I have to use the WarehouseProducts to get to the Products table.. (one more join...). - or could simply add the ProductId along with the WarehouseProductId on the OrderProducts table but that seems "redundant"...

Thanks,
Pedro




If you need to work better, try working less...
Post #1368272
Posted Thursday, October 4, 2012 8:36 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 2:49 PM
Points: 2,030, Visits: 3,030
Jeff Moden (10/3/2012)
ScottPletcher (10/3/2012)
Determing the appropriate column actually takes analysis; again, there is no one-rule-fits-all-tables for the best clustered key column.

Customer ids are usually unique and ever-increasing; narrowness is sometimes not as good because char is used even though when the values are purely numeric. Even so, often a somewhat larger clustered key is best overall for performance.


But you didn't say Customer "ID"... you said Customer "Code". Perhaps it's a difference in terminology but I've found that the two are usually drastically different.

So far as the no one-rule-fits-all-tables, I agree. That notwithstanding, the rule I stated is very good for larger tables because of the implications on non-clustered indexes. About the only real analysis needed on such tables is trying to determine if the first column of the clustered index should be some sort of date column or not. A narrow and unique secondary column, such as an IDENTITY column would be necessary or an 8 byte internal row ID would be added to the clustered index. Narrowness is always a good thing in an index simply because you can fit more rows per page on indexes and the clustered index is automatically appended to every non-clustered index that doesn't explicitly use the clustered index column(s).



I don't think such extremely simplistic rules always work well in tables, large or small.

Choosing the best indexes takes a lot of careful consideration of your data distribution and your query usage patterns, and there is no one-size-fits-all answer.

If a table is declared with a pk (which is advisable), by default a clustered index will be built on the primary key column(s). However, this is not always the best choice. The primary key is, by definition, unique, and nonclustered indexes are very efficient at finding a single qualifying row and can still enforce the primary key's uniqueness. So save your clustered index for something that will benefit more from it.


SQL DBA,SQL Server MVP('07, '08, '09)
"In America, every man is innocent until proven broke!" Brant Parker
Post #1368465
Posted Thursday, October 4, 2012 9:02 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Monday, August 18, 2014 8:36 AM
Points: 2,836, Visits: 5,066
ScottPletcher (10/4/2012)
[quote][b]

I don't think such extremely simplistic rules always work well in tables, large or small.

Choosing the best indexes takes a lot of careful consideration of your data distribution and your query usage patterns, and there is no one-size-fits-all answer.

If a table is declared with a pk (which is advisable), by default a clustered index will be built on the primary key column(s). However, this is not always the best choice. The primary key is, by definition, unique, and nonclustered indexes are very efficient at finding a single qualifying row and can still enforce the primary key's uniqueness. So save your clustered index for something that will benefit more from it.


Yes, you are right! There is no one-size-fits-all answer to that.
However...
I guess it was some long discussions between SQL Server (and back to old days Sybase) design engineers/architects about making PK clustered by default. They decided to do so, and not only for the reason "not to create heaps". PK is not always the best choice for clustered, but quite often it is. Use of Surrogate keys for PK is quite popular in modern RDBMS's (at least in SQL Server and Oracle), so may be that is another statistical manifestation example of 80-20 rule...


_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1368494
Posted Thursday, October 4, 2012 12:50 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 2:49 PM
Points: 2,030, Visits: 3,030
Eugene Elutin (10/4/2012)
ScottPletcher (10/4/2012)
[quote][b]

I don't think such extremely simplistic rules always work well in tables, large or small.

Choosing the best indexes takes a lot of careful consideration of your data distribution and your query usage patterns, and there is no one-size-fits-all answer.

If a table is declared with a pk (which is advisable), by default a clustered index will be built on the primary key column(s). However, this is not always the best choice. The primary key is, by definition, unique, and nonclustered indexes are very efficient at finding a single qualifying row and can still enforce the primary key's uniqueness. So save your clustered index for something that will benefit more from it.


Yes, you are right! There is no one-size-fits-all answer to that.
However...
I guess it was some long discussions between SQL Server (and back to old days Sybase) design engineers/architects about making PK clustered by default. They decided to do so, and not only for the reason "not to create heaps". PK is not always the best choice for clustered, but quite often it is. Use of Surrogate keys for PK is quite popular in modern RDBMS's (at least in SQL Server and Oracle), so may be that is another statistical manifestation example of 80-20 rule...



The problem is, that's nothing close to the real ratio in production. ~60+% of legacy tuning for me is just changing the dopey identity clustered index to the proper clustered index, with an easy 50+% gain in performance.

A clustering key does NOT have to be unique to aid performance. That is the worst secondary myth that gets propogated by the main myth about always making the clus key an identity. It's just a thoughtless approach.


SQL DBA,SQL Server MVP('07, '08, '09)
"In America, every man is innocent until proven broke!" Brant Parker
Post #1368646
Posted Thursday, October 4, 2012 1:45 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 11:09 AM
Points: 8,690, Visits: 9,226
Eugene Elutin (10/4/2012)
Yes, you are right! There is no one-size-fits-all answer to that.
However...
I guess it was some long discussions between SQL Server (and back to old days Sybase) design engineers/architects about making PK clustered by default. They decided to do so, and not only for the reason "not to create heaps". PK is not always the best choice for clustered, but quite often it is. Use of Surrogate keys for PK is quite popular in modern RDBMS's (at least in SQL Server and Oracle), so may be that is another statistical manifestation example of 80-20 rule...

I think that in more than 20% of cases a large enough proportion of the work required of the dbms involves searches for a range of values in some initial columns of the natural key. A surrogate key may of course still be useful for reducing the storage required in other tables for references to this table (and consequently reducing the number of in those other tables pages handled by joins with this table), but while it does no harm to make the surrogate the primary key it is not at all sensible in these cases to make it the cluster key - the natural key should have a unique constraint on it and non-null constraints on each of its columns (ie the table should be normalised at least to 1NF, to keep the code manipulating tha table simple - actually I generally can't see any point in normalisation to lower levels than EKNF, but only 1NF is relevant to this discussion) and either that index (if it's narrow enough) or an index on the relevant intial part (perhaps just one column) of the natural key should be used for clustering.


Tom
Post #1368678
Posted Thursday, October 4, 2012 1:51 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Today @ 11:09 AM
Points: 8,690, Visits: 9,226
Eugene Elutin (10/4/2012)
Yes, you are right! There is no one-size-fits-all answer to that.
However...
I guess it was some long discussions between SQL Server (and back to old days Sybase) design engineers/architects about making PK clustered by default. They decided to do so, and not only for the reason "not to create heaps". PK is not always the best choice for clustered, but quite often it is. Use of Surrogate keys for PK is quite popular in modern RDBMS's (at least in SQL Server and Oracle), so may be that is another statistical manifestation example of 80-20 rule...

I think that in more than 20% of cases a large enough proportion of the work required of the dbms involves searches for a range of values in some initial columns of the natural key. A surrogate key may of course still be useful for reducing the storage required in other tables for references to this table (and consequently reducing the number of in those other tables pages handled by joins with this table), but while it does no harm to make the surrogate the primary key it is not always sensible in these cases to make it the cluster key - the natural key should have a unique constraint on it and non-null constraints on each of its columns (ie the table should be normalised at least to 1NF, to keep the code manipulating tha table simple - actually I generally can't see any point in normalisation to lower levels than EKNF, but only 1NF is relevant to this discussion) and often either that index (if it's narrow enough) or an index on the relevant intial part (perhaps just one column) of the natural key should be used for clustering.
Every case needs careful analysis, and I don't believe there's an 80 to 20 rule in favour of clustering on a surrogate.


Tom
Post #1368680
« Prev Topic | Next Topic »

Add to briefcase ««12345»»»

Permissions Expand / Collapse