Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

UNIQUEIDENTIFIER vs BIGINT Expand / Collapse
Author
Message
Posted Tuesday, October 2, 2012 9:13 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 12:30 AM
Points: 524, Visits: 1,158
Hi,

I read somewhere that when ever there's a join between two tables, for example orderlines and products being joined by productid and warehouseid, it's best to create a column do "unify" those two columns for faster searches and shorter indexes.
For that field is best to use an UNIQUEIDENTIFIER or a BIGINT?
UNIQUEIDENTIFIERs shouldn't (can't) be used on clustered indexes, but are best for replication purposes since they are unlikely to repeat between databases.
BIGINT is numeric, smaller and faster on joins (I think)...
In my case I don't use replication, I only want to store data.
When ever a table doesn't seem to have a column for PK use an Id column for PK.

Is this statement true? If so is it better to use Guid or BIGINT?

Another question, related to keys... If a table uses Id as PK should it be clustered if BIGINT identity? Or if, say a customers table, has a CustomerCode that's used for searches should that be the column for a clustered index (even if when inserting the codes aren't sequential generating page moves in the database files)?

Thanks,
Pedro




If you need to work better, try working less...
Post #1367112
Posted Tuesday, October 2, 2012 9:17 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, December 12, 2014 10:09 AM
Points: 2,876, Visits: 5,201
[
...
Is this statement true? If so is it better to use Guid or BIGINT?


It depends, but usually BIGINT is better.


Another question, related to keys... If a table uses Id as PK should it be clustered if BIGINT identity? Or if, say a customers table, has a CustomerCode that's used for searches should that be the column for a clustered index (even if when inserting the codes aren't sequential generating page moves in the database files)?
...


It depends, but in this case, most likely your PK should be clustered and you will create non-clustered index on CustomerCode.



_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1367114
Posted Tuesday, October 2, 2012 9:35 AM


Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Today @ 12:30 AM
Points: 524, Visits: 1,158
Eugene Elutin (10/2/2012)

It depends, but in this case, most likely your PK should be clustered and you will create non-clustered index on CustomerCode.

That's what I already do, on every table have an Id column BIGINT IDENTITY and the PK on that column is clustered so the data write is sequential.
GUID have the NEWSEQUENTIALID() but I've read people having problems with that (not working properly) and performance issues.
http://www.codeproject.com/Articles/32597/Performance-Comparison-Identity-x-NewId-x-NewSeque.
So I'll stick with what I'm doing

Thanks,
Pedro




If you need to work better, try working less...
Post #1367124
Posted Tuesday, October 2, 2012 10:01 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, December 12, 2014 10:09 AM
Points: 2,876, Visits: 5,201
Wait for JC to state that "We don't use IDENTITY, Cobol-like monster from 50's"
The only place where I would agree on legitimate use of GUID's for PK, would be systems where PK must be generated by application before inserting data to database for one or another reasons.
In the rest - I would use surrogate IDENTITY PK's.
Decision on clustering is a bit separate, but mostly you would want this one clustered as well.
With new Oracle-like SEQUENCEs in SQL2012, use of GUID's is even less prominent, as you will be able to get PK values from SEQUENCE independent of inserting process.


_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1367144
Posted Tuesday, October 2, 2012 10:41 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 9:11 PM
Points: 14,196, Visits: 28,522
I don't agree with the initial construct. I've seen plenty of indexes with compound keys work perfectly well for performance without the need to add another column and another index to the table in order to get performance to work. This would be especially true for two columns. You're talking about an index with a key that is 8 bytes wide. You're going to add a bigint column to the table, which is also 8 bytes wide, which saves what exactly? Adding a GUID, which is 16 bytes wide really doesn't save any space or make for a smaller index since it's twice the size.

Now, if both those columns are not very selective, so that the statistics for them is weak or horribly skewed, yeah, maybe adding another column is a possible solution, but I think we're talking about edge cases.

Also, be careful with GUIDs. They can lead to severe index fragmentation (which is not a big deal, at all, but is a possible issue) if you're using the GUID as a clustered index.


----------------------------------------------------
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood..." Theodore Roosevelt
The Scary DBA
Author of:
SQL Server Query Performance Tuning
and
SQL Server Execution Plans

Product Evangelist for Red Gate Software
Post #1367166
Posted Tuesday, October 2, 2012 11:59 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Friday, December 5, 2014 10:31 AM
Points: 13,872, Visits: 9,599
GUID vs BigInt vs Int for a surrogate primary key, or the leading edge of any index, is all about what you want the index/key to do. There is no "one is better" answer that's universally true.

GUIDs have a number of advantages and disadvantages. Storage space, index fragmentation, possibly bandwidth and I/O issues (because of size), and being extremely human-unfriendly, are some of the disadvantages. Close-to-guaranteed-uniqueness, the ability to generate them in the application layer without a database round-trip, lack of real limits on the number you can generate, are some of the advantages.

BigInt has one advantage over Int, in that it can store 18-pentillion values instead of 4-billion for Int. Rare that you need more than 4-billion, but if you do, BigInt will work and Int won't.

Int and BigInt have advantages over GUID in terms of size, sequentiality (partially handled by NewSequentialID for GUIDs), and human-readability. They have disadvantages in terms of range, uniqueness, and resource contention in heavy-insert databases (slight but real).

You need to determine what you need the column to actually do, what your expectations are on data volume in the table, and make an informed decision based on that, not just decide "BigInt is better".

On the question of composite clustered indexes/keys, it's rare that you'd get a significant performance increase by adding a surrogate key just to bypass a natural key. It can happen, but the reason to use surrogate keys is usually that you can't come up with a valid natural key (people, for example), not that you need to speed up table queries. Surrogate keys, like ID or GUID, are convenient for developers, but they have very little, if anything, to do with performance of queries.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1367212
Posted Tuesday, October 2, 2012 3:51 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 3:44 PM
Points: 2,324, Visits: 3,500
Eugene Elutin (10/2/2012)
[
...
Is this statement true? If so is it better to use Guid or BIGINT?


It depends, but usually BIGINT is better.


Another question, related to keys... If a table uses Id as PK should it be clustered if BIGINT identity? Or if, say a customers table, has a CustomerCode that's used for searches should that be the column for a clustered index (even if when inserting the codes aren't sequential generating page moves in the database files)?
...


It depends, but in this case, most likely your PK should be clustered and you will create non-clustered index on CustomerCode.




Almost certainly the CustomerCode will make a vastly better clustering key. In fact, since you already have a unique code per customer, you almost certainly don't need an identity column on that table.

Often customer codes are stored as character, but if they are entirely numeric, you could store them as int to save space and reduce the overhead of storage and joins.

In general, a nonclustered index is perfect for identity values. Use the clustered index for a more appropriate column. Of course specific joins and other things can change this.

An identity column should NOT be the default choice for a clustering key, period. You should always put some thought into the proper clustering key, not just base it on an overly-simplistic rule.


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1367314
Posted Tuesday, October 2, 2012 6:49 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, December 12, 2014 10:09 AM
Points: 2,876, Visits: 5,201
ScottPletcher (10/2/2012)
Eugene Elutin (10/2/2012)
[
...
Is this statement true? If so is it better to use Guid or BIGINT?


It depends, but usually BIGINT is better.


Another question, related to keys... If a table uses Id as PK should it be clustered if BIGINT identity? Or if, say a customers table, has a CustomerCode that's used for searches should that be the column for a clustered index (even if when inserting the codes aren't sequential generating page moves in the database files)?
...


It depends, but in this case, most likely your PK should be clustered and you will create non-clustered index on CustomerCode.




Almost certainly the CustomerCode will make a vastly better clustering key. In fact, since you already have a unique code per customer, you almost certainly don't need an identity column on that table.

Often customer codes are stored as character, but if they are entirely numeric, you could store them as int to save space and reduce the overhead of storage and joins.

In general, a nonclustered index is perfect for identity values. Use the clustered index for a more appropriate column. Of course specific joins and other things can change this.

An identity column should NOT be the default choice for a clustering key, period. You should always put some thought into the proper clustering key, not just base it on an overly-simplistic rule.


1. I haven't seen OP stating that CustomerCode is unique and cannot be reused.
2. Sinlge searching pattern alone doesn't mean that this key should be made to clustered index, may be but not always. If CustomerCode is of alpha-numeric nature, making this clustered will lead to higher fragmentation.
3. Until there is a good reason, identity column is a good choice for a clustered index, as it is narrow, unique, static and always increasing (which helps to avoid fragmentation)

If CustomerCode is really used for searching one single customer record, the performance wise, having it as clustered or non-clustered index will not be much different.

I do usually use a bit more than this when determine candidate for clustered index:
1. Range searches
2. Use in joins
3. Use in ORDER BY and GROUP BY
4. Returning large resultsets.

Also clustered index narrowness plays some role in the effectiveness of index as well.

I guess there are two camps here: Natural Keys vs Surrogate Keys




_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1367362
Posted Wednesday, October 3, 2012 8:43 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Yesterday @ 3:44 PM
Points: 2,324, Visits: 3,500
Typically customer codes are sequential. Since they're not here, that would definitely make the clustering key choice much more difficult.

If customer code is (almost) always used for lookups/processing, you still save a huge amount of overhead by not having to go thru a nonclus index, which at minimum roughly doubles the reads to get to a specific row.

As I noted before, if the customer code is actually a customer number, it can also be stored as an int rather than as character. If it's alphanum, naturally that would also have to be taken into consideration.

I don't object to surrogate keys in general, just the idea that by default tables should be clustered by identity -- that's just laziness. There is no one-size-fits-all rules when it comes to determing the best clustering key for a given table.


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1367737
Posted Wednesday, October 3, 2012 4:43 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 11:11 PM
Points: 35,769, Visits: 32,432
Use the clustered index for a more appropriate column


For large tables, the "appropriate column" is usually unique, narrow, and ever-increasing. "Customer Code" columns almost never meet all 3 of that criteria.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1368097
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse