A Look at GUIDs

  • I like the article, but concering the newsequentialid ...

    can someone find the flaw in my testquery which proves they'r not sequential ???

    Declare

    @test-2 table (

    MyNewSeqId

    uniqueidentifier DEFAULT newsequentialid(),

    MyIdentity

    int not null identity(1,1),

    MyNewId

    uniqueidentifier DEFAULT newid(),

    Inserted

    datetime DEFAULT getdate()

    );

    set

    nocount on;

    declare

    @x integer;

    set

    @x = 0;

    while

    @x < 10000

    begin

    INSERT into @test-2 DEFAULT VALUES;

    set @x = @x + 1;

    end

    ;

    set

    nocount off;

    with

    cteMyGuidTest (MySeqNo, MyNewSeqId, MyIdentity, MyNewId, Inserted, MyNewSeqId_String3)

    as

    (

    SELECT

    row_number() OVER(ORDER BY substring(convert(char(36) , MyNewSeqId),1,3), MyNewSeqId ASC)

    ,

    MyNewSeqId, MyIdentity, MyNewId, Inserted

    ,

    substring(convert(char(36) , MyNewSeqId),1,3) as x

    FROM

    @test-2

    )

    select

    T1.MySeqNo, convert(varbinary(100),T1.MyNewSeqId) as MyNewSeqId1_Binary,T1.MyNewSeqId, T1.MyIdentity --, T1.MyNewId -- , T1.MyNewSeqId_String3

    ,

    T2.MySeqNo, convert(varbinary(100),T2.MyNewSeqId) as MyNewSeqId2_Binary, T2.MyNewSeqId, T2.MyIdentity --, T2.MyNewId --, T2.MyNewSeqId_String3

    from

    cteMyGuidTest T1

    inner

    join cteMyGuidTest T2

    on

    T1.MyNewSeqId_String3 = T2.MyNewSeqId_String3

    and

    T1.MySeqNo = T2.MySeqNo - 1

    order

    by convert(varbinary(100),T1.MyNewSeqId) , T1.MyNewSeqId, T2.MyNewSeqId;

    with

    cteMyGuidTest (MySeqNo, MyNewSeqId, MyIdentity, MyNewId, Inserted, MyNewSeqId_String3)

    as

    (

    SELECT

    row_number() OVER(ORDER BY substring(convert(char(36) , MyNewSeqId),1,3), MyNewSeqId ASC)

    ,

    MyNewSeqId, MyIdentity, MyNewId, Inserted

    ,

    substring(convert(char(36) , MyNewSeqId),1,3) as x

    FROM

    @test-2

    )

    select

    T1.MySeqNo, convert(varbinary(100),T1.MyNewSeqId) as MyNewSeqId1_Binary,T1.MyNewSeqId, T1.MyIdentity --, T1.MyNewId -- , T1.MyNewSeqId_String3

    ,

    T2.MySeqNo, convert(varbinary(100),T2.MyNewSeqId) as MyNewSeqId2_Binary, T2.MyNewSeqId, T2.MyIdentity --, T2.MyNewId --, T2.MyNewSeqId_String3

    from

    cteMyGuidTest T1

    inner

    join cteMyGuidTest T2

    on

    T1.MyNewSeqId_String3 = T2.MyNewSeqId_String3

    and

    T1.MySeqNo = T2.MySeqNo - 1

    order

    by T1.MyNewSeqId, T2.MyNewSeqId;

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • First, thank you to everyone who read and also posted a comment! I think I could probably frame it a bit better now, but I have an older article that starts to address the "why" at http://www.sqlservercentral.com/columnists/awarren/usinguniqueindentifierinsteadofidentity.asp.

    As to performance, it may seem sacrilegious, but not every decision you make has to be designed to get the absolute best performance. You have to look at the whole picture and the cost. Ten years ago, most people would have considered you out of your mind if you said you were going to use 3000 minutes on a cell phone each month, or turn off your land line for a cell phone. Since then the economics have changed. The same is true for the PC industry, CPU speed is fantasic, storage costs are down, storage access times are down, memory if not cheap at least affordable, and the ability to use more than 4-8g of memory is finally here. It's possible to bog down even the best hardware of course, but I'll argue that in most circumstances modern hardware masks any performance hit from using guids to the point that it rarely matters.

     

  • To clarify, sequential guids CAN be guessed easily compared to trying to do the same for the standard GUID. Thus the security risk if you expose them to users.

  • Why expose them to users - ever. I wouldnt inflict them on my worst enemy !!!! 🙂

  • Heck, I would!  Making my worst enemy type Guids all day would be awesome!

    Seriously, there is no reason for any user to ever even see a Guid, much less need to type one into an input box.


    Student of SQL and Golf, Master of Neither

  • I think that GUIDs are more valuable than most people give them credit for.  There is a debate about the performance impact but if used wisely to set up a good live archive system then performance in the primary production DB will improve.  I also like to use GUIDs on frequently updated records and rows.  Instead of doing record locking I update the GUID after every update...  my business layer has the ability to display the differences to the losing update.   

  • We use GUID's for an object relational mapping (ORM) layer in our framework. This allows clients to use the ORM for their own use (that we may or may not have knowledge of) to add tables / objects etc. We can confidently add to the ORM with out fear of a collision.

    Also GUID's are used (not as a primary key) to provide a unique reference enable collaboration with other software eg CRM, Finance systems. So far it has served us well.

    JS

  • We've used GUIDs a lot in one implementation. This was a time-tracking system and one of the things we implemented was in the Activities table, the PKey was a concatenation of the User's GUID Key + the Date + a SeqNo (ex. {00171A97-3D6A-436E-8536-A0788420686C}:01032006-0001). Mostly, we used GUIDs at the Parent Entity levels in the DB (Users, Companies, etc) and concatenations at the Child levels.

    I've never seen a performance hit from doing this even though the PKey is really a VarChar(70). I understood the risks of page splits, but I think in this case there MIGHT be less page splis because the Rows for each user would be lumped together on the same page(s) (as much as possible). This probably accounts for the reason that there really isn't a performance hit when selecting rows.

  • In this scenario, the GUID is not unique in the table so I would expect exactly the same result if you were using an integer or anything else as the user id - except for any saving from having a shorter index key.

    I assume that you mainly are concerned with searching by user - if by date then changing the order of the key components might show up in insert performance.

  • You might not be seeing a performance hit if your looking at events user by user - but it may get slower the longer you run this model (depending on activity rates?). Also, if using this generated key structure, store your dates as YYYYMMDD, then they'll be indexed by User and in date order (other wise you won't get all the events from one year together?!?)

    Surely you would be better storing this data in a binary column instead of a varchar column. It would take up 16 (GUID) + 4 (date) + 2 (seqno) = 22 bytes instead of the 50ish bytes taken up by the string. This allows for your required generated key (not questioning your choices of your primary key structure here - thats another topic!), just the data size. This would reduce your PK column size by half, improving index performance and size.

    If it's a hidden key, then it doesn't matter that it's binary column (ie not human readable), and it will be more efficient (slightly) to "decode" it as well.

    So you could generate the column data like this:

    DECLARE @UserUID uniqueidentifier

    DECLARE @Date datetime -- or char(8)

    DECLARE @SeqNo smallint

    DECLARE @GeneratedActivityKey binary(22)

    SET @UserUID = NEWID()

    SET @Date = GETDATE()

    SET @SeqNo = 22

    SET @GeneratedActivityKey = CONVERT(CONVERT(binary(16),@UserUID)+CONVERT(binary(4),@Date)+CONVERT(binary(2),@SeqNo))

    SELECT @GeneratedActivityKey

    will give something like

    0x00171A973D6A436E8536A0788420686C00012ABC0012

    Only taking up 22 bytes and still giving you your clustered pages (by user,date,seqno)

    It would probably be more efficient on selects to use a composite key however.

  • Yes, I've always had it in mind to re-think the key. I like your idea and I'll look into it. We don't use the keys for anything really (meaning no one tracks them). I had thought to move the date portion to YYYYMMDD but never got around to it. One thing I was also thinking of doing was to move the Clustered Key from this PKey to some other columns, but I could never really think of a good key for this.

    Thanks for the input!

  • Another caveat about using GUIDs as keys:

    Beware of using non-sequential GUIDs as a non-clustered PRIMARY KEY on large historical data. If you want to delete a block of data (say all events before '2006-01-01'), even if your clustered index is on your eventdate column, it will have to remove nodes from your non-clustered index for all rows affected by the DELETE statement. If your non-clustered index is on a "random" GUID, then this will hit a high percentage of pages on your index (with large data = very slow!!!). This occurs even if you batch your DELETE (with SET ROWCOUNT and a loop). If the GUID was sequential, then the nodes will be together and the DELETE from index operation would be much faster. If you use the COMB technique described elsewhere in this discussion (I think there was a link earlier) or a sequential key (int, bigint, sequentialguid etc), then this problem is somewhat alleviated.

    Or drop and recreate the index, but that could be expensive too!

  • I also received a question about whether both implementations guaranteed unique. I asked Steve to follow up with MS to get a 'good' answer, and Paul Randal from the storage team was kind enough to provide this answer:

    "These two intrinsics are thread-safe, and multiple concurrent users will get different UUIDs. http://blogs.msdn.com/sqlprogrammability/archive/2006/03/23/559061.aspx gives more detail on this topic."

  • The database that I have inherited is infested with GUIDs. I should probably at this point inform you that I am not a fan of GUIDs, never have been and probably never will be especially after having to work with this schema.

    The original reasoning behind choosing GUIDs as the clustered PKs was reasonably sound, to provide uniqueness to values that are generated at divers locations. However, due to the nature of the system with a high record insertion, the GUIDs are causing performance issues, locking up resources while page splits occur, blocking out waiting processes and ultimately resulting in timeouts.

    Sure we could increase the timeout duration, but, to be honest, the wait time involved is unacceptable in our particular system, and not just from our perspective but most especially from our customers who do not appreciate delays.

    You mention about hot-spots caused by using identity keys, well I've never yet had any issues with those, and I have been able to dramatically improve performance by eliminating the GUIDs in favour of identity. This on SQL 2005 as well.

    However, the article was definitely thought-provoking and I did in fact learn something new, which is always a good thing, and for those of you who like using GUIDs, all power to you, I'm just not about to be a convert.

    Mike

  • Hi Mike,

    Great post. Its good to have some real-world experiences on here. It kind of confirms what I was already thinking as well.

     

    Do you have any stats about this "dramatic improvement"?

     

    -Jamie

     

Viewing 15 posts - 16 through 30 (of 37 total)

You must be logged in to reply to this topic. Login to reply