GUID or Int is the best for primary key columns?

  • David.Poole (7/15/2009)


    Gaby A. (7/15/2009)


    What's wrong with a bigint identity(1, 1).

    Nothing provided you are running on a single server. The instant you go multi-site or employ a sharding strategy all sorts of challenges come to the fore.

    True, and that's where newsequentialid() is a much better option if you need that option.

    Here's a kluge someone showed me once:

    create table #temp_seqguid

    (

    myguid uniqueidentifier default newsequentialid() -- necessary, as only available as a default setting

    )

    insert into #temp_seqguid default values

    declare @GUID uniqueidentifier

    select top 1 @GUID = myguid from #temp_seqguid

    drop table #temp_seqguid

    [EDIT] And here's a modified version based on David.Poole's suggestion below:

    declare @seqGUID table ( myguid uniqueidentifier default newsequentialid() )

    insert into @seqGUID default values

    declare @GUID uniqueidentifier

    select top 1 @GUID = myguid from @seqGUID

    Gaby________________________________________________________________"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein

  • Nice but I would use a table variable for this rather than a temporary table.

  • David.Poole (7/15/2009)


    Nice but I would use a table variable for this rather than a temporary table.

    True, for only few values, the streamlined code of table variables would be better. Code modified in previous post.

    Gaby________________________________________________________________"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein

  • Dave Ballantyne (7/15/2009)


    Not that i doubt you but, are you 100% sure that that can be the only explanation ? There were no other bugs that could of caused it to happen?

    According to wikipedia

    While each generated GUID is not guaranteed to be unique, the total number of unique keys (21^28 or 3.4×10^38) is so large that the probability of the same number being generated twice is infinitesimally small. For example, consider the observable universe, which contains about 5×10^22 stars; every star could then have 6.8×10^15 unique GUIDs.

    The odds on 3 out of 2 Billion being duplicated are (infinitesimally small cubed) ???

    The mathematical odds are quite small, but computers don't actually generate truly random numbers, which means the odds are much higher as soon as you have a lot of "random" numbers being generated in very short periods of time.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • David.Poole (7/15/2009)


    Gaby A. (7/15/2009)


    What's wrong with a bigint identity(1, 1).

    Nothing provided you are running on a single server. The instant you go multi-site or employ a sharding strategy all sorts of challenges come to the fore.

    Easy enough to solve if you have a two-column PK. One is the identity value, the other is the source computer ID.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Lots of good advice in this thread!

    Another way that anything relying on a MAC address can produce duplicate results is that you are relying on all MAC addresses being globally unique.

    For virtual servers with virtual networks this is not true. Also, manufacturers of some cheap network cards do not always give them unique MAC addresses.

    The first problem you can solve with configuring your virtual network. The second problem is unlikely to affect servers as they seldom use bottom-end cards, but if you have multiple desktops or laptops feeding data to your server you may hit this issue.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • EdVassie (7/16/2009)


    Lots of good advice in this thread!

    Another way that anything relying on a MAC address can produce duplicate results is that you are relying on all MAC addresses being globally unique.

    For virtual servers with virtual networks this is not true. Also, manufacturers of some cheap network cards do not always give them unique MAC addresses.

    The first problem you can solve with configuring your virtual network. The second problem is unlikely to affect servers as they seldom use bottom-end cards, but if you have multiple desktops or laptops feeding data to your server you may hit this issue.

    If you really need to make sure, when using newid() or newsequentialid(), have a binary variable added to the GUID made up of system unique variables. For example:

    declare @GUID uniqueidentifier, @salt varbinary(100)

    set @salt = cast(checksum(@@servername) as varbinary(100)) -- or additional system info

    set @GUID = cast(newid() as varbinary(100)) + @salt

    Yet another kluge, but this should have minimal to no impact on performance.

    Gaby________________________________________________________________"In theory, theory and practice are the same. In practice, they are not." - Albert Einstein

  • SQL Server GUIDs are Version "4" (random) and not Version "1" (MAC address).

    An advantage for using GUIDs in a .Net application is that the value can be generated anywhere within the layers. E.g., the .Net application can generate them and pass them to the database layer (as a parameter to the stored procedure). Or, if necessary, the database layer can generate them as needed. Lots of flexibility.

    Using an IDENTITY means that the database has to return the value to the caller each time. And if multiple rows are inserted, it gets cumbersome to return the list of (IDENTITY) values back to the application and for it to update the data that it has in memory.

    We also use XML as a method for passing "chunks" of data to/from the stored procedures vs. lots of scalar parameters. The .Net layer generates the unique (GUID) identifiers that are contained within the XML. Example: A Sales Order with line items for the merchandise being purchased. This would be a complete XML document that would be passed to the stored procedure which in turn would "shred" it into the proper relational tables.

    Impossible to do with IDENTITY.

    We use them extensively in the databases powering our SaaS offerings. In fact, all unique identifiers are GUIDs.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • GUID is also known as a UUID. See: http://en.wikipedia.org/wiki/Uuid which also describes the Versions and how to identify them.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • Where possible (which is the vast majority of the cases even if you need uniqueness across multiple machines - just use a compound key) I recommend to my clients that they use some numeric and not a GUID for the PK if they want some surrogate PK. Datatype size is one very key feature, as is possible fragmentation if you (want to) cluster on it - and if you do cluster the key gets carried on every non-clustered index which can be another performance drain.

    Oh, and I never take the default seeding. Always explicitly start the sequence at the low limit of whatever numeric you choose. Doing otherwise can lose you half of your potential key values. I have seen numerous cases where key types had to be changed because they ran out of numbers - with half the values going unused.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • JohnG (7/16/2009)


    Impossible to do with IDENTITY.

    Not with the OUTPUT operator.


    N 56°04'39.16"
    E 12°55'05.25"

  • JohnG (7/16/2009)


    SQL Server GUIDs are Version "4" (random) and not Version "1" (MAC address).

    An advantage for using GUIDs in a .Net application is that the value can be generated anywhere within the layers. E.g., the .Net application can generate them and pass them to the database layer (as a parameter to the stored procedure). Or, if necessary, the database layer can generate them as needed. Lots of flexibility.

    Using an IDENTITY means that the database has to return the value to the caller each time. And if multiple rows are inserted, it gets cumbersome to return the list of (IDENTITY) values back to the application and for it to update the data that it has in memory.

    We also use XML as a method for passing "chunks" of data to/from the stored procedures vs. lots of scalar parameters. The .Net layer generates the unique (GUID) identifiers that are contained within the XML. Example: A Sales Order with line items for the merchandise being purchased. This would be a complete XML document that would be passed to the stored procedure which in turn would "shred" it into the proper relational tables.

    Impossible to do with IDENTITY.

    We use them extensively in the databases powering our SaaS offerings. In fact, all unique identifiers are GUIDs.

    I personally prefer a compound key for this situation - something like this

    HeaderTable

    PKID int identity

    various fields

    LineItemTable

    PKID

    Sequence

    various fields

    Sequence goes from 1-n for each sales header. Numerous benefits to this construct, including knowing the order of the line items which could be important for things like reproducing identical printouts.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • First of all do you really want your data to be globally unique? For example if you are trying to implement a merge replication that will transfer and receive data from/to different db servers, then probably makes sense to have GUIDs in your tables. If not then consider using INTEGERS as they are way efficient than GUIDs. Having GUIDs could also lead to heavy index fragmentation since they are random in nature. With this being said try using NEWSEQUENTIALID() function in SQL 2005 which is much efficient compared to GUID(), the only downside is you can't generate this using SELECT statement as you would do in case of NEWID(). You would need to use this function as a DEFAULT constraint in the table definition.

    Hope this helps.

    Thanks,

    Amol

    Amol Naik

  • AmolNaik, did you read the thread completely? I think everything you said had already been covered, most of it more than once. 🙂

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • The problem with a compound key where one of the fields is a ServerId is when you are running a disaster recovery site.

    You have to remember NOT to synchronise the tables that tell you what the ServerId means.

    It is not insurmountable but it is an easy mistake to make.

Viewing 15 posts - 16 through 30 (of 51 total)

You must be logged in to reply to this topic. Login to reply