GUID vs BIGINT

  • hi,

    difference in guid and int

    GUID Column with clustered index vs Big int column with clustered index ?--

    GUID Column with non clustered index vs Big int column with clustered index ?

    which is best in join conditions and why?

    how about fragmentation on GUID and INT Column?

    which to go for max 10 millions of record tables with database around max of 50 GB.

    Thanks!

  • Well, a lot of the information is right there in the SQL Server documentation. A GUID is 16 bytes while a bigint is 8 bytes. That alone affects the distribution of the data across the key pages in any b-tree of a clustered or non-clustered index. This means with the GUID, you'll have fewer values per page which can lead to a larger, deeper index. Because GUIDs are random (unless you can use NEWSEQUENTIALID to generate your GUIDs) you're likely to get a lot more fragmentation of the index, clustered or non-clustered. For point lookups, they're pretty much the same as bigint. For scans, again, fewer records per page and possibly a lot more fragmentation means more reads for the GUID. For a JOIN, either will work as described. A point join, meaning an index seek with no or limited scanning, they'll be the same. An index scan, and, under most circumstances, the bigint will work better. Except for the fragmentation, really, GUID isn't all that big a deal, assuming most of your queries are point lookups with minimal scans.

    But, you're only talking about 10s of millions of values and a 50gb database. Why not just go with INT? That's 4 bytes, half the size of bigint and supports up to 2.1 billion values (double that if you seed your identity column at -2.1 billion).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I have to agree with Grant completely. I have found that an integer is the best way to go. They're narrow, non-volatile and sequential and have the added bonus of being more readable than a GUID. If you need to support more rows, a big integer is next. Keep in mind that the bytes of the clustered index are inherited in every nonclustered index, so GUIDs will add 16 bytes to every single row in every single nonclustered index you create. Do the math and you'll see the impact on your storage requirements.

  • Excellent point on the storage of the cluster key in nonclustered indexes Ed. I completely missed that one.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Do the table will get fragmentation often in GUID?

    I checked the table with the fragmentation so i ran the dbcc dbreindex (we can ran the alter command also but i ran the dbcc dbreindex command) , so now my tables or not fragmented.After this i ran couple of insert statements on the table for one day and again i found the tables got fragmented so why it is so, will GUID have more fragmentation?

  • yuvipoy (5/13/2014)


    Do the table will get fragmentation often in GUID?

    I checked the table with the fragmentation so i ran the dbcc dbreindex (we can ran the alter command also but i ran the dbcc dbreindex command) , so now my tables or not fragmented.After this i ran couple of insert statements on the table for one day and again i found the tables got fragmented so why it is so, will GUID have more fragmentation?

    Grant already answered this.

    Grant Fritchey (5/13/2014)


    Because GUIDs are random (unless you can use NEWSEQUENTIALID to generate your GUIDs) you're likely to get a lot more fragmentation of the index, clustered or non-clustered.

  • If i use NEWSEQUENTIALID() then there will be less fragmentation ok good, but how about the join behavioural ?

    for the test of 1000 record on a table with INT/GUID here is the fragmentation information

    table consists of 2 columns col1 INT/GUID , col2 varchar(1000)

    Col1 is primarykey

    avg_fragmentation_in_percent, fragment_count, page_count

    using NEWSEQUENTIALID() column fragmentation 2.0958083832335348334

    using INT column fragmentation 2.4 38250

    performance wise which is good in "join" condition and in "where" condition.

  • Haven't we been over this a couple times already?

    http://www.sqlservercentral.com/Forums/Topic1553317-391-1.aspx

    http://www.sqlservercentral.com/Forums/Topic1549110-391-1.aspx

    http://www.sqlservercentral.com/Forums/Topic1548615-391-1.aspx

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • i have been here for couple of time but now more concern on fragmentation.

  • If you have concerns about fragmentation, use an integer. If you need to support more rows, use a big integer. If you aren't concerned about fragmentation or storage, go ahead and use your GUID approach. Personally, I don't use them for artificial keys.

    Take the time to read what's above. If you haven't read Gail's intro to indexing articles, I would do so.

  • yuvipoy (5/14/2014)


    i have been here for couple of time but now more concern on fragmentation.

    Unless you have a very good and specific reason for using GUIDS (like distributed databases or something like that), don't use it.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • yuvipoy (5/14/2014)


    If i use NEWSEQUENTIALID() then there will be less fragmentation ok good, but how about the join behavioural ?

    for the test of 1000 record on a table with INT/GUID here is the fragmentation information

    table consists of 2 columns col1 INT/GUID , col2 varchar(1000)

    Col1 is primarykey

    avg_fragmentation_in_percent, fragment_count, page_count

    using NEWSEQUENTIALID() column fragmentation 2.0958083832335348334

    using INT column fragmentation 2.4 38250

    performance wise which is good in "join" condition and in "where" condition.

    There is no real difference in the JOIN or the WHERE criteria between these two. It's not where they are in the query that matters. It's how the data is being accessed. For a fragmented index, regardless of the data type, you'll see almost exactly the same performance for a single row lookup as you do for a non-fragmented index. For a fragmented index, regardless of data type, you'll see worse performance for scans than for a non-fragmented index. So it's not about JOIN or WHERE, it's about the access methods, scan versus seek. If you're getting lots of small data set seeks, the fragmentation won't matter much. As soon as you throw scans in, the fragmentation becomes extremely important.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • yuvipoy (5/14/2014)


    If i use NEWSEQUENTIALID() then there will be less fragmentation ok good, but how about the join behavioural ?

    for the test of 1000 record on a table with INT/GUID here is the fragmentation information

    table consists of 2 columns col1 INT/GUID , col2 varchar(1000)

    Col1 is primarykey

    avg_fragmentation_in_percent, fragment_count, page_count

    using NEWSEQUENTIALID() column fragmentation 2.0958083832335348334

    using INT column fragmentation 2.4 38250

    performance wise which is good in "join" condition and in "where" condition.

    Also just look at page_count. The integer obviously has less pages than the GUID. So scans are faster (less IO) and storage is smaller.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Don't test with a 1,000-row table. Test with a 1,000,000-row table. Many things can run well with 1,000 rows. Testing on a 1,000,000 rows tends to expose weakness. If you're going to routinely have more that 1M rows, use a larger test table for performance benchmarking. 1M is a good place to start. Set your statistics io on to see the difference in reads when you query your test table. 4 bytes is smaller than 16 bytes.

    Grant - Good explanation. I'm looking forward to your presentation on query plans this Saturday.

  • Here comes another question

    I created a table with GUID column with NEWSEQUENTIALID() as default.

    create table table_name (i UNIQUEIDENTIFIER deafult NEWSEQUENTIALID() , j int,k int)

    While inserting sometime the value are derived from user itself like

    Insert into table_name

    values ('FFF77AB0-670B-4EAA-9233-1CA7D83A141E',1123,456) /*generated from newid()*/

    Insert into table_name

    values ('D518EA90-8BC4-4791-AD56-B112BC407975',113,1456) /*generated from newid()*/

    .

    ......

    sometimes it will not what will happen in that case.

    Insert into table_name (j,k) values (1123,456)

    Insert into table_name (j,k) values (113,1456)

    what will happen here?

    will there be fragmentation?

    I knew that we should not use newid() ,

    but in case if we use newid() to generate id on NEWSEQUENTIALID() default column what will happen ? will the table get fragmented?

Viewing 15 posts - 1 through 15 (of 35 total)

You must be logged in to reply to this topic. Login to reply