UNIQUEIDENTIFIER vs BIGINT

  • Hi,

    I read somewhere that when ever there's a join between two tables, for example orderlines and products being joined by productid and warehouseid, it's best to create a column do "unify" those two columns for faster searches and shorter indexes.

    For that field is best to use an UNIQUEIDENTIFIER or a BIGINT?

    UNIQUEIDENTIFIERs shouldn't (can't) be used on clustered indexes, but are best for replication purposes since they are unlikely to repeat between databases.

    BIGINT is numeric, smaller and faster on joins (I think)...

    In my case I don't use replication, I only want to store data.

    When ever a table doesn't seem to have a column for PK use an Id column for PK.

    Is this statement true? If so is it better to use Guid or BIGINT?

    Another question, related to keys... If a table uses Id as PK should it be clustered if BIGINT identity? Or if, say a customers table, has a CustomerCode that's used for searches should that be the column for a clustered index (even if when inserting the codes aren't sequential generating page moves in the database files)?

    Thanks,

    Pedro



    If you need to work better, try working less...

  • [

    ...

    Is this statement true? If so is it better to use Guid or BIGINT?

    It depends, but usually BIGINT is better.

    Another question, related to keys... If a table uses Id as PK should it be clustered if BIGINT identity? Or if, say a customers table, has a CustomerCode that's used for searches should that be the column for a clustered index (even if when inserting the codes aren't sequential generating page moves in the database files)?

    ...

    It depends, but in this case, most likely your PK should be clustered and you will create non-clustered index on CustomerCode.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (10/2/2012)


    It depends, but in this case, most likely your PK should be clustered and you will create non-clustered index on CustomerCode.

    That's what I already do, on every table have an Id column BIGINT IDENTITY and the PK on that column is clustered so the data write is sequential.

    GUID have the NEWSEQUENTIALID() but I've read people having problems with that (not working properly) and performance issues.

    http://www.codeproject.com/Articles/32597/Performance-Comparison-Identity-x-NewId-x-NewSeque.

    So I'll stick with what I'm doing 🙂

    Thanks,

    Pedro



    If you need to work better, try working less...

  • Wait for JC to state that "We don't use IDENTITY, Cobol-like monster from 50's" :hehe:

    The only place where I would agree on legitimate use of GUID's for PK, would be systems where PK must be generated by application before inserting data to database for one or another reasons.

    In the rest - I would use surrogate IDENTITY PK's.

    Decision on clustering is a bit separate, but mostly you would want this one clustered as well.

    With new Oracle-like SEQUENCEs in SQL2012, use of GUID's is even less prominent, as you will be able to get PK values from SEQUENCE independent of inserting process.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • I don't agree with the initial construct. I've seen plenty of indexes with compound keys work perfectly well for performance without the need to add another column and another index to the table in order to get performance to work. This would be especially true for two columns. You're talking about an index with a key that is 8 bytes wide. You're going to add a bigint column to the table, which is also 8 bytes wide, which saves what exactly? Adding a GUID, which is 16 bytes wide really doesn't save any space or make for a smaller index since it's twice the size.

    Now, if both those columns are not very selective, so that the statistics for them is weak or horribly skewed, yeah, maybe adding another column is a possible solution, but I think we're talking about edge cases.

    Also, be careful with GUIDs. They can lead to severe index fragmentation (which is not a big deal, at all, but is a possible issue) if you're using the GUID as a clustered index.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • GUID vs BigInt vs Int for a surrogate primary key, or the leading edge of any index, is all about what you want the index/key to do. There is no "one is better" answer that's universally true.

    GUIDs have a number of advantages and disadvantages. Storage space, index fragmentation, possibly bandwidth and I/O issues (because of size), and being extremely human-unfriendly, are some of the disadvantages. Close-to-guaranteed-uniqueness, the ability to generate them in the application layer without a database round-trip, lack of real limits on the number you can generate, are some of the advantages.

    BigInt has one advantage over Int, in that it can store 18-pentillion values instead of 4-billion for Int. Rare that you need more than 4-billion, but if you do, BigInt will work and Int won't.

    Int and BigInt have advantages over GUID in terms of size, sequentiality (partially handled by NewSequentialID for GUIDs), and human-readability. They have disadvantages in terms of range, uniqueness, and resource contention in heavy-insert databases (slight but real).

    You need to determine what you need the column to actually do, what your expectations are on data volume in the table, and make an informed decision based on that, not just decide "BigInt is better".

    On the question of composite clustered indexes/keys, it's rare that you'd get a significant performance increase by adding a surrogate key just to bypass a natural key. It can happen, but the reason to use surrogate keys is usually that you can't come up with a valid natural key (people, for example), not that you need to speed up table queries. Surrogate keys, like ID or GUID, are convenient for developers, but they have very little, if anything, to do with performance of queries.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Eugene Elutin (10/2/2012)


    [

    ...

    Is this statement true? If so is it better to use Guid or BIGINT?

    It depends, but usually BIGINT is better.

    Another question, related to keys... If a table uses Id as PK should it be clustered if BIGINT identity? Or if, say a customers table, has a CustomerCode that's used for searches should that be the column for a clustered index (even if when inserting the codes aren't sequential generating page moves in the database files)?

    ...

    It depends, but in this case, most likely your PK should be clustered and you will create non-clustered index on CustomerCode.

    Almost certainly the CustomerCode will make a vastly better clustering key. In fact, since you already have a unique code per customer, you almost certainly don't need an identity column on that table.

    Often customer codes are stored as character, but if they are entirely numeric, you could store them as int to save space and reduce the overhead of storage and joins.

    In general, a nonclustered index is perfect for identity values. Use the clustered index for a more appropriate column. Of course specific joins and other things can change this.

    An identity column should NOT be the default choice for a clustering key, period. You should always put some thought into the proper clustering key, not just base it on an overly-simplistic rule.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (10/2/2012)


    Eugene Elutin (10/2/2012)


    [

    ...

    Is this statement true? If so is it better to use Guid or BIGINT?

    It depends, but usually BIGINT is better.

    Another question, related to keys... If a table uses Id as PK should it be clustered if BIGINT identity? Or if, say a customers table, has a CustomerCode that's used for searches should that be the column for a clustered index (even if when inserting the codes aren't sequential generating page moves in the database files)?

    ...

    It depends, but in this case, most likely your PK should be clustered and you will create non-clustered index on CustomerCode.

    Almost certainly the CustomerCode will make a vastly better clustering key. In fact, since you already have a unique code per customer, you almost certainly don't need an identity column on that table.

    Often customer codes are stored as character, but if they are entirely numeric, you could store them as int to save space and reduce the overhead of storage and joins.

    In general, a nonclustered index is perfect for identity values. Use the clustered index for a more appropriate column. Of course specific joins and other things can change this.

    An identity column should NOT be the default choice for a clustering key, period. You should always put some thought into the proper clustering key, not just base it on an overly-simplistic rule.

    1. I haven't seen OP stating that CustomerCode is unique and cannot be reused.

    2. Sinlge searching pattern alone doesn't mean that this key should be made to clustered index, may be but not always. If CustomerCode is of alpha-numeric nature, making this clustered will lead to higher fragmentation.

    3. Until there is a good reason, identity column is a good choice for a clustered index, as it is narrow, unique, static and always increasing (which helps to avoid fragmentation)

    If CustomerCode is really used for searching one single customer record, the performance wise, having it as clustered or non-clustered index will not be much different.

    I do usually use a bit more than this when determine candidate for clustered index:

    1. Range searches

    2. Use in joins

    3. Use in ORDER BY and GROUP BY

    4. Returning large resultsets.

    Also clustered index narrowness plays some role in the effectiveness of index as well.

    I guess there are two camps here: Natural Keys vs Surrogate Keys :hehe:

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Typically customer codes are sequential. Since they're not here, that would definitely make the clustering key choice much more difficult.

    If customer code is (almost) always used for lookups/processing, you still save a huge amount of overhead by not having to go thru a nonclus index, which at minimum roughly doubles the reads to get to a specific row.

    As I noted before, if the customer code is actually a customer number, it can also be stored as an int rather than as character. If it's alphanum, naturally that would also have to be taken into consideration.

    I don't object to surrogate keys in general, just the idea that by default tables should be clustered by identity -- that's just laziness. There is no one-size-fits-all rules when it comes to determing the best clustering key for a given table.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Use the clustered index for a more appropriate column

    For large tables, the "appropriate column" is usually unique, narrow, and ever-increasing. "Customer Code" columns almost never meet all 3 of that criteria.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/3/2012)


    Use the clustered index for a more appropriate column

    For large tables, the "appropriate column" is usually unique, narrow, and ever-increasing. "Customer Code" columns almost never meet all 3 of that criteria.

    Determing the appropriate column actually takes analysis; again, there is no one-rule-fits-all-tables for the best clustered key column.

    Customer ids are usually unique and ever-increasing; narrowness is sometimes not as good because char is used even though when the values are purely numeric. Even so, often a somewhat larger clustered key is best overall for performance.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher (10/3/2012)


    Determing the appropriate column actually takes analysis; again, there is no one-rule-fits-all-tables for the best clustered key column.

    Customer ids are usually unique and ever-increasing; narrowness is sometimes not as good because char is used even though when the values are purely numeric. Even so, often a somewhat larger clustered key is best overall for performance.

    But you didn't say Customer "ID"... you said Customer "Code". Perhaps it's a difference in terminology but I've found that the two are usually drastically different.

    So far as the no one-rule-fits-all-tables, I agree. That notwithstanding, the rule I stated is very good for larger tables because of the implications on non-clustered indexes. About the only real analysis needed on such tables is trying to determine if the first column of the clustered index should be some sort of date column or not. A narrow and unique secondary column, such as an IDENTITY column would be necessary or an 8 byte internal row ID would be added to the clustered index. Narrowness is always a good thing in an index simply because you can fit more rows per page on indexes and the clustered index is automatically appended to every non-clustered index that doesn't explicitly use the clustered index column(s).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher (10/3/2012)


    Typically customer codes are sequential. Since they're not here, that would definitely make the clustering key choice much more difficult.

    Typically customer codes occur in order records more often than in customer records, and while they may be sequential in order of creation of customers they are certainly not sequential in terms of creation of orders. So the ideathat a customer code is typically sequential in relation to whatever table we are talking about here is not reasonable.

    Tom

  • This discussion is bringing us back to the similar, quite heated one, we had not long ago.

    Does a "Customer" have a "Natural Key"?

    Customer Code? May be, but to make it unique, you will need to introduce the special infrastructure which will need to generate one and ensure it is unique within organisation. So, for this organisation, you can make it "Natural", but that's about it.

    Now, why to bother with the cost of infrastructure to generate and maintain such unique "Customer Codes"? Just to make it look nicer? Kind of alpha-numeric?

    Identity, could easily play the role of "Customer Code", to make it more appearing to customer, you can format it as you wish when it's shown to them.

    For example, to please Chinese customers, make Customer Code/Id BIGINT and seed it from 77700000000000. Format it as 777-000-0000-0000 for better readability on output documents and it will make Asian people happy :hehe:

    Actually, I would not even use such name as "Customer Code" for a key. For me it means more kind of "Customer Type". Customer Number or Account Number at least, or simply Customer ID is sounds better to me.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • L' Eomot Inversé (10/4/2012)


    ScottPletcher (10/3/2012)


    Typically customer codes are sequential. Since they're not here, that would definitely make the clustering key choice much more difficult.

    Typically customer codes occur in order records more often than in customer records, and while they may be sequential in order of creation of customers they are certainly not sequential in terms of creation of orders. So the ideathat a customer code is typically sequential in relation to whatever table we are talking about here is not reasonable.

    just like a products table... every product has a product code which is used in searches but the inserts are never sequential and usually they are strings.. so a surrogate key would apply very well also in this case..

    natural keys are "beautiful" but very hard to determine what columns should be... can be customer code but also the ITIN since every person has one but usually you don't give your number before you make a purchase, when you initially register the customer...

    One question though...

    Suppose I have a Products, a Warehouses, a WarehouseProducts, Orders and OrderProducts and the OrderProducts is "linked" to WarehouseProducts, since I can make an order to my suppliers for the same product but to different warehouses (on the same order).

    Products has ProductCode, Warehouses has WarehouseCode, WarehouseProducts has ProductCode and WarehouseCode and OrderProducts as ProductCode and WarehouseCode.. these are all strings... Isn't it better to add surrogate keys IDENTITY to all tables and make the "link" by it?

    Pro: fast in joins, less columns on "linked" tables...

    Cons: If I want to get the product description from an order I have to use the WarehouseProducts to get to the Products table.. (one more join...). - or could simply add the ProductId along with the WarehouseProductId on the OrderProducts table but that seems "redundant"...

    Thanks,

    Pedro



    If you need to work better, try working less...

Viewing 15 posts - 1 through 15 (of 40 total)

You must be logged in to reply to this topic. Login to reply