Primary keys for an OLTP database

  • Comments posted to this topic are about the item Primary keys for an OLTP database

  • Thanks Evgeny. I didn't actually know there was a NEWSEQUENTIALID() function. I tend to use IDENTITY, however, there are times when a GUID would be "better"; so I'll keep that in mind.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thanks for the write up Evgeny!
    😎
    One thought on the test setup, the loop iteration and the execution overhead may skew the results.I've done similar tests and found that IDENTITY is the fastest and NEWID always the slowest. Here is my test harness

    USE TEEST;
    GO
    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.TBL_TEST_NEWSEQUENTIALID') IS NOT NULL DROP TABLE dbo.TBL_TEST_NEWSEQUENTIALID;
    CREATE TABLE dbo.TBL_TEST_NEWSEQUENTIALID
    (
      NSID UNIQUEIDENTIFIER NOT NULL CONSTRAINT DFLT_DBO_TBL_TEST_NEWSEQUENTIALID_NSID DEFAULT (NEWSEQUENTIALID())
                CONSTRAINT PK_DBO_TBL_TEST_NEWSEQUENTIALID_NSID PRIMARY KEY CLUSTERED
     ,INT_VAL INT NOT NULL
    );

    IF OBJECT_ID(N'dbo.TBL_TEST_NEWID')     IS NOT NULL DROP TABLE dbo.TBL_TEST_NEWID;
    CREATE TABLE dbo.TBL_TEST_NEWID
    (
      NSID UNIQUEIDENTIFIER NOT NULL CONSTRAINT DFLT_DBO_TBL_TEST_NEWID_NSID DEFAULT (NEWID())
                CONSTRAINT PK_DBO_TBL_TEST_NEWID_NSID PRIMARY KEY CLUSTERED
     ,INT_VAL INT NOT NULL
    );

    IF OBJECT_ID(N'dbo.TBL_TEST_IDENTITY')    IS NOT NULL DROP TABLE dbo.TBL_TEST_IDENTITY;
    CREATE TABLE dbo.TBL_TEST_IDENTITY
    (
      NSID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_IDENTITY_NSID PRIMARY KEY CLUSTERED
     ,INT_VAL INT NOT NULL
    );

    DECLARE @TIMER TABLE (T_TXT VARCHAR(50) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));
    DECLARE @SAMPLE_SIZE  INT = 1000000;

    ---------------------------------------------------------------------
    -- FIRST RUN
    ---------------------------------------------------------------------

    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 1');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWSEQUENTIALID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 1);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 1');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWSEQUENTIALID') IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWSEQUENTIALID;

    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 1');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 1);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 1');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWID')     IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWID;

    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 1');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_IDENTITY WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 1);
    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 1');
    IF OBJECT_ID(N'dbo.TBL_TEST_IDENTITY')    IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_IDENTITY;

    ---------------------------------------------------------------------
    -- SECOND RUN
    ---------------------------------------------------------------------

    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 2');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWSEQUENTIALID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 2);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 2');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWSEQUENTIALID') IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWSEQUENTIALID;

    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 2');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 2);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 2');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWID')     IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWID;

    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 2');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_IDENTITY WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 2);
    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 2');
    IF OBJECT_ID(N'dbo.TBL_TEST_IDENTITY')    IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_IDENTITY;

    ---------------------------------------------------------------------
    -- THIRD RUN
    ---------------------------------------------------------------------

    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 3');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWSEQUENTIALID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 0);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 3');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWSEQUENTIALID') IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWSEQUENTIALID;

    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 3');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 0);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 3');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWID')     IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWID;

    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 3');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_IDENTITY WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 0);
    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 3');
    IF OBJECT_ID(N'dbo.TBL_TEST_IDENTITY')    IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_IDENTITY;
    ---------------------------------------------------------------------
    -- TIMER RESULTS
    ---------------------------------------------------------------------
    SELECT
      T.T_TXT
     ,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
    FROM @TIMER T
    GROUP BY T.T_TXT
    ORDER BY DURATION ASC;

    Results on my i5 laptop

    T_TXT               DURATION
    ------------------- -----------
    IDENTITY 3              2452339
    IDENTITY 2              2491696
    IDENTITY 1              2881321
    NEWSEQUENTIALID 1       2924338
    NEWSEQUENTIALID 2       2945289
    NEWSEQUENTIALID 3       3414175
    NEWID 1                 4309828
    NEWID 2                 4363634
    NEWID 3                 6505897

    Looking at the sys.dm_db_index_physical_stats

    SELECT
      N'NEWSEQUENTIALID' AS METHOD_NAME
     ,IXPS.index_type_desc
     ,IXPS.alloc_unit_type_desc
     ,IXPS.index_depth
     ,IXPS.avg_fragmentation_in_percent
     ,IXPS.fragment_count
     ,IXPS.avg_fragment_size_in_pages
     ,IXPS.page_count 
    FROM sys.dm_db_index_physical_stats(DB_ID(N'TEEST'),OBJECT_ID(N'dbo.TBL_TEST_NEWSEQUENTIALID'),NULL,NULL,'DETAILED') IXPS
    WHERE IXPS.page_count > 0
    UNION ALL
    SELECT
      N'NEWID' AS METHOD_NAME
     ,IXPS.index_type_desc
     ,IXPS.alloc_unit_type_desc
     ,IXPS.index_depth
     ,IXPS.avg_fragmentation_in_percent
     ,IXPS.fragment_count
     ,IXPS.avg_fragment_size_in_pages
     ,IXPS.page_count 
    FROM sys.dm_db_index_physical_stats(DB_ID(N'TEEST'),OBJECT_ID(N'dbo.TBL_TEST_NEWID'),NULL,NULL,'DETAILED') IXPS
    WHERE IXPS.page_count > 0
    UNION ALL
    SELECT
      N'IDENTITY' AS METHOD_NAME
     ,IXPS.index_type_desc
     ,IXPS.alloc_unit_type_desc
     ,IXPS.index_depth
     ,IXPS.avg_fragmentation_in_percent
     ,IXPS.fragment_count
     ,IXPS.avg_fragment_size_in_pages
     ,IXPS.page_count 
    FROM sys.dm_db_index_physical_stats(DB_ID(N'TEEST'),OBJECT_ID(N'dbo.TBL_TEST_IDENTITY'),NULL,NULL,'DETAILED') IXPS
    WHERE IXPS.page_count > 0;

    The output

    METHOD_NAME     index_type_desc  alloc_unit_type_desc  index_depth avg_fragmentation_in_percent fragment_count avg_fragment_size_in_pages  page_count
    --------------- ---------------- --------------------- ----------- ---------------------------- --------------- -------------------------- -----------
    NEWSEQUENTIALID CLUSTERED INDEX  IN_ROW_DATA           3            0.641562064156206              24           149.375                    3585
    NEWSEQUENTIALID CLUSTERED INDEX  IN_ROW_DATA           3            0                              23             1                          23
    NEWSEQUENTIALID CLUSTERED INDEX  IN_ROW_DATA           3            0                               1             1                           1
    NEWID           CLUSTERED INDEX  IN_ROW_DATA           3           99.1434262948207              5020             1                        5020
    NEWID           CLUSTERED INDEX  IN_ROW_DATA           3           95.2380952380952                21             1                          21
    NEWID           CLUSTERED INDEX  IN_ROW_DATA           3            0                               1             1                           1
    IDENTITY        CLUSTERED INDEX  IN_ROW_DATA           3            0.333174678724417               8           262.625                    2101
    IDENTITY        CLUSTERED INDEX  IN_ROW_DATA           3            0                               7             1                           7
    IDENTITY        CLUSTERED INDEX  IN_ROW_DATA           3            0                               1             1                           1   

  • IDENTITY / SEQUENCE will always be the fastest because the keys are smallest and in sequential order.

    If you must use a GUID use NEWSEQUENTIALID() because the keys will at least continue to be in sequential order until the server restarts.
    At that point the sequence of GUIDS are not guaranteed to continue where they left off.:)

    NEWID is random... the best use for it is to shuffle a deck of cards.  If you use it, it will generate GUIDs all over the place, causing  page splits when the next GUID belongs inside of a clustered index on a full page.

    If you must use NEWID for some awful reason, as the table grows expect for increasingly worse performance.  Consider rebuilding your indexes and leaving plenty of free space... say 20% on each page for new inserts into it, and do this on a regular basis to make day to day work better.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

  • Thanks for the write-up Evgeny. Here's a couple of things we've found with our system related to PKs. They don't apply in every case so as usual, everything depends on the unique setup of your specific app, or in this case, of my specific app.

    • The size of the data can matter a great deal. If I'm reading your test setup correctly, your test harness ends up with 400k rows, which is not nothing, but it's also much smaller than many production tables. We've had some tables with over 1 billion rows in them. When tables are in the 100 million+ rows range and you're using guids as clustered PKs, page splits and fragmentation can become a much bigger problem. We recently went through a data re-architecture on some of our very big tables where we replaced guid PKs with INT PKs and in some cases we saw that inserts were going more than twice as fast without making any other changes.
    • Using identity for sequential keys is not a bad choice, but for some workloads sequences are much faster. The workload you tested was a large amount of independent single row transactions. In your test scenario, since every insert needs to get a lock on the identity so it can increment it, you can end up with lock contention on the identity itself which can slow you down. Some work loads are like that so it's a valid test, but there are other cases where you know ahead of time that you're going to be doing a lot of inserts. If you know you're going to be doing lot of inserts from your application, you can design your table with a bigint PK with a default of next value for [sequence] and then use the sp_sequence_get_range procedure to bulk get a block of IDs into the middle tier and assign them from there. When you do that, you can prevent the lock contention on the identity object because you're bulk getting IDs into the middle tier and making those servers do the work of dolling them out. Because of the default on the PK, you're also covered for cases when you only want to do a single insert and you don't have to bother pre-fetching the ID. We recently went through this exact exercise at my job and it works very well if your workload allows for it.
    • You can still get many of the benefits of the uniqueness of guids when using sequential IDs by using bigint and breaking it up into ranges. Let's use a power of 2 for an example and say that you allocate yourself 65,536 possible difference ID ranges just to be safe. Then you can start each ID with a different offset. So Order ID starts at 1 and increments by 65,536. Order Detail ID starts at 2 and increments by 65,536, etc. If you do that, you won't get collisions with your PKs and you'll still get the benefits of sequential IDs. Taking the max value of bigint and dividing by the number of IDs ranges, you still get over 140 trillion numbers in each range. Are you ever going to have that many rows in a table? It's not very likely, but if so you can always use 32,768 or even 16,384 ranges to get a larger number of identities in each range. 

    Thanks again for the write-up and keep contributing.
  • With integer keys, when you made a mistake with joins you may get a result, but it will be wrong. That is why database developers like uniqueidentifier keys.

     I am a database developer and I definitely do NOT like uniqueidentifier keys.  There is a place for them.  In a scenario where data has to be consolidated from multiple databases into one, they are probably unavoidable.  But otherwise they make everything more difficult.  If it's vital that unique keys be generated throughout the database, then use a single sequence source.  This, as you note, is only possible in SQL Server since the 2012 version. 

    Regarding the first part of your statement, don't make the joins wrong.  That's what testing is for.  The price paid by not using a more appropriate field is not worth preventing this.  And in any case, the use of  a single sequence source would prevent it.

     I deal routinely with a database with GUID keys for PKs.  It makes everything harder and slower.

  • mtassin - Thursday, December 27, 2018 7:15 AM

    IDENTITY / SEQUENCE will always be the fastest because the keys are smallest and in sequential order.

    If you must use a GUID use NEWSEQUENTIALID() because the keys will at least continue to be in sequential order until the server restarts.
    At that point the sequence of GUIDS are not guaranteed to continue where they left off.:)

    NEWID is random... the best use for it is to shuffle a deck of cards.  If you use it, it will generate GUIDs all over the place, causing  page splits when the next GUID belongs inside of a clustered index on a full page.

    If you must use NEWID for some awful reason, as the table grows expect for increasingly worse performance.  Consider rebuilding your indexes and leaving plenty of free space... say 20% on each page for new inserts into it, and do this on a regular basis to make day to day work better.

    While I appreciate the experiences that have made you come to such conclusions (and, to be clear, I used to believe in the same thing), most of that isn't true, although the recommendation for a proper FILL FACTOR and the use of REBUILDs for NEWID usage is absolutely spot on.

    It turns out that the larger (in terms of row count) a NEWID Clustered Index gets, the longer you can go with absolutely ZERO page splits and ZERO fragmentation.  Done properly and maintained correctly (and the current "Best Practices" for index maintenance are absolutely the worst thing to do for index maintenance, especially in this case).  I do have several demonstrable tests that exercise such indexes by adding 1,000 rows per simulated hour for 10 simulate hour per simulated day for 365 simulated days (3.65 Million rows per test and, yeah, it does take a while to run each test) and it demonstrates that you can actually go MONTHs with no page splits (not even good ones) and no fragmentation when it's done properly.  Oddly enough, it turns out the method is a fair bit faster than when using the "Append only" nature of sequential keys because even supposedly "good" page splits write more to the log file than ZERO pages splits do.

    Of course, there are disadvantages to the use of NEWID that we all know but performance of INSERTs, page splits, and fragmentation isn't a part of that (seriously contrary to popular belief).  Well... unless you use REORGANIZE as a part of your index maintenance, which removes free space when you need it the most and, very much like a really bad drug habit, the more you use it the more you need to use it because it actually perpetuates page splits and that perpetuates fragmentation and that perpetuates the need to defragment... over and over and over for the life of the table.

    I'm giving a 2 hour presentation on this and other types of indexes as well as why the current recommendations of supposed "Best Practice" index maintenance is actually causing more harm than good and it's the result of literally hundreds of hours of testing.  The next presentation will be at SQLSaturday in Cleveland on 2 Feb 2019.  If you're in the area, come see it.  I can almost guarantee you've never seen indexes or the harmful effects of current "Best Practice" index maintenance on indexes in the way that I present them.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • RonKyle - Thursday, December 27, 2018 8:04 AM

    With integer keys, when you made a mistake with joins you may get a result, but it will be wrong. That is why database developers like uniqueidentifier keys.

     I am a database developer and I definitely do NOT like uniqueidentifier keys.  There is a place for them.  In a scenario where data has to be consolidated from multiple databases into one, they are probably unavoidable.  But otherwise they make everything more difficult.  If it's vital that unique keys be generated throughout the database, then use a single sequence source.  This, as you note, is only possible in SQL Server since the 2012 version. 

    Regarding the first part of your statement, don't make the joins wrong.  That's what testing is for.  The price paid by not using a more appropriate field is not worth preventing this.  And in any case, the use of  a single sequence source would prevent it.

     I deal routinely with a database with GUID keys for PKs.  It makes everything harder and slower.

    Please don't mistake the following as me taking a swipe at you personally on this.  You just don't know the things I've discovered over the last 2 years of extreme study and testing.  To be sure, I was of the same opinion as you when I started my in-depth studies now more than two years ago on this subject.

    It turns out that random GUID keys (NEWID) for PKs will actually make just about everything easier and faster but, just like everything else having to do with SQL Server, you really have to know how.  It's not complicated but it does go against the nature of everything that you, me, and nearly everyone else in the world has been led to believe, especially when it comes to the maintenance of such indexes.

    Because of me saying that, you might think that I'm recommending the broad-stroke-of-the-pen use of GUIDs but, for some of the other reasons that folks have stated, I do not make such a recommendation.  I DO recommend that if the logical advantages of using GUIDs are essential, there should be no fear of them making things a lot harder and slow because, done correctly and maintained correctly (which is SUPER easy to do), they're actually easier to maintain and faster for inserts.  They can suck pretty badly if you do out-of-order batch-related processing so far as seek/range scans go and that may be a reason to avoid them for certain uses.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Thom A - Thursday, December 27, 2018 3:05 AM

    Thanks Evgeny. I didn't actually know there was a NEWSEQUENTIALID() function. I tend to use IDENTITY, however, there are times when a GUID would be "better"; so I'll keep that in mind.

    NEWSEQUENTIALID() sucks worse than IDENTITY.  It still creates a hotspot and is twice as large as even BIGINT.  If you ever actually need to us a GUID, hum a rock at me and I'll explain how to make it relatively painless and fast.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Insert only is one thing, things change drastically once updates, deletes and reinserts are introduced, without the knowledge of the data life-cycle patterns, it's all guesswork.
    😎

  • Please don't mistake the following as me taking a swipe at you personally on this. You just don't know the things I've discovered over the last 2 years of extreme study and testing. To be sure, I was of the same opinion as you when I started my in-depth studies now more than two years ago on this subject.

     I don't take comments personally provided they are not personally phrased.  But I have no shortage of experience on multiple systems either.  My primary point was that I am a database developer, and that, contrary to the point in the article, I do not like them when they are used for their own sake.  This is what the author seems to me to be saying.  I work with designs that use GUIDS for no apparent reason.  The original designers are long gone and I'm left to deal with the fallout.

  • Evgeny Garaev - Wednesday, December 26, 2018 11:46 PM

    Comments posted to this topic are about the item Primary keys for an OLTP database

    Thanks for this article, Evgeny.  It DOES take a while to put one like that together especially when it comes to proving it with demonstrable code.

    There's a lot of good in the article but there's a fair bit that still relies on existing knowledge and practices.  For example, you've not done any index maintenance during your data insertion loop to simulate real life and you've not created the correct FILL FACTOR for the "insert pattern" of each datatype.  As a result, you've come to the seriously mistaken conclusion that NEWID will cause the worst fragmentation when it can be made to cause ZERO fragmentation for months at a time and it actually gets better the more rows are added to the table.

    Again and as I've said to others in my responses to the posts on this thread, in no way am I taking a swipe at you personally.  It's just that we've all (including me in that before I started my in-depth studies) have been brainwashed by "Best Practices" and inadequate understanding of what's actually happening with indexes of all types and the index maintenance methods people are currently used to maintain them.

    The thing I really did like about your article (other than frequently being correct in many areas) is that you're another person that has actually started to question the status quo on the subject.  With that, I have to take my hat off to you and say "Thank you for your efforts". 

    If you're going to be anywhere near Cleveland on 2 Feb 2019, you should sign up to attend and come see my two sessions on this very subject.  You'll see that you're definitely on the right track and, I believe, you'll leave with a whole new understanding.

    And, yea... I'll be working on a series of articles for SSC now that I finally have some time freeing up.

    Thanks again for taking the time to write on this important subject.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • Eirikur Eiriksson - Thursday, December 27, 2018 4:25 AM

    Thanks for the write up Evgeny!
    😎
    One thought on the test setup, the loop iteration and the execution overhead may skew the results.I've done similar tests and found that IDENTITY is the fastest and NEWID always the slowest. Here is my test harness

    USE TEEST;
    GO
    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.TBL_TEST_NEWSEQUENTIALID') IS NOT NULL DROP TABLE dbo.TBL_TEST_NEWSEQUENTIALID;
    CREATE TABLE dbo.TBL_TEST_NEWSEQUENTIALID
    (
      NSID UNIQUEIDENTIFIER NOT NULL CONSTRAINT DFLT_DBO_TBL_TEST_NEWSEQUENTIALID_NSID DEFAULT (NEWSEQUENTIALID())
                CONSTRAINT PK_DBO_TBL_TEST_NEWSEQUENTIALID_NSID PRIMARY KEY CLUSTERED
     ,INT_VAL INT NOT NULL
    );

    IF OBJECT_ID(N'dbo.TBL_TEST_NEWID')     IS NOT NULL DROP TABLE dbo.TBL_TEST_NEWID;
    CREATE TABLE dbo.TBL_TEST_NEWID
    (
      NSID UNIQUEIDENTIFIER NOT NULL CONSTRAINT DFLT_DBO_TBL_TEST_NEWID_NSID DEFAULT (NEWID())
                CONSTRAINT PK_DBO_TBL_TEST_NEWID_NSID PRIMARY KEY CLUSTERED
     ,INT_VAL INT NOT NULL
    );

    IF OBJECT_ID(N'dbo.TBL_TEST_IDENTITY')    IS NOT NULL DROP TABLE dbo.TBL_TEST_IDENTITY;
    CREATE TABLE dbo.TBL_TEST_IDENTITY
    (
      NSID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_TEST_IDENTITY_NSID PRIMARY KEY CLUSTERED
     ,INT_VAL INT NOT NULL
    );

    DECLARE @TIMER TABLE (T_TXT VARCHAR(50) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT (SYSDATETIME()));
    DECLARE @SAMPLE_SIZE  INT = 1000000;

    ---------------------------------------------------------------------
    -- FIRST RUN
    ---------------------------------------------------------------------

    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 1');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWSEQUENTIALID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 1);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 1');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWSEQUENTIALID') IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWSEQUENTIALID;

    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 1');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 1);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 1');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWID')     IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWID;

    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 1');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_IDENTITY WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 1);
    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 1');
    IF OBJECT_ID(N'dbo.TBL_TEST_IDENTITY')    IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_IDENTITY;

    ---------------------------------------------------------------------
    -- SECOND RUN
    ---------------------------------------------------------------------

    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 2');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWSEQUENTIALID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 2);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 2');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWSEQUENTIALID') IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWSEQUENTIALID;

    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 2');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 2);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 2');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWID')     IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWID;

    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 2');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_IDENTITY WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 2);
    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 2');
    IF OBJECT_ID(N'dbo.TBL_TEST_IDENTITY')    IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_IDENTITY;

    ---------------------------------------------------------------------
    -- THIRD RUN
    ---------------------------------------------------------------------

    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 3');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWSEQUENTIALID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 0);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWSEQUENTIALID 3');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWSEQUENTIALID') IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWSEQUENTIALID;

    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 3');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_NEWID WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 0);
    INSERT INTO @TIMER(T_TXT) VALUES('NEWID 3');
    IF OBJECT_ID(N'dbo.TBL_TEST_NEWID')     IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_NEWID;

    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 3');
    ;WITH T(N) AS (SELECT X.N FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) X(N))
    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY @@VERSION) AS N
         FROM T T1,T T2,T T3,T T4,T T5,T T6)
    INSERT INTO dbo.TBL_TEST_IDENTITY WITH (TABLOCKX) (INT_VAL)
    SELECT
      NM.N
    FROM  NUMS  NM
    OPTION (MAXDOP 0);
    INSERT INTO @TIMER(T_TXT) VALUES('IDENTITY 3');
    IF OBJECT_ID(N'dbo.TBL_TEST_IDENTITY')    IS NOT NULL TRUNCATE TABLE dbo.TBL_TEST_IDENTITY;
    ---------------------------------------------------------------------
    -- TIMER RESULTS
    ---------------------------------------------------------------------
    SELECT
      T.T_TXT
     ,DATEDIFF(MICROSECOND,MIN(T.T_TS),MAX(T.T_TS)) AS DURATION
    FROM @TIMER T
    GROUP BY T.T_TXT
    ORDER BY DURATION ASC;

    Results on my i5 laptop

    T_TXT               DURATION
    ------------------- -----------
    IDENTITY 3              2452339
    IDENTITY 2              2491696
    IDENTITY 1              2881321
    NEWSEQUENTIALID 1       2924338
    NEWSEQUENTIALID 2       2945289
    NEWSEQUENTIALID 3       3414175
    NEWID 1                 4309828
    NEWID 2                 4363634
    NEWID 3                 6505897

    Looking at the sys.dm_db_index_physical_stats

    SELECT
      N'NEWSEQUENTIALID' AS METHOD_NAME
     ,IXPS.index_type_desc
     ,IXPS.alloc_unit_type_desc
     ,IXPS.index_depth
     ,IXPS.avg_fragmentation_in_percent
     ,IXPS.fragment_count
     ,IXPS.avg_fragment_size_in_pages
     ,IXPS.page_count 
    FROM sys.dm_db_index_physical_stats(DB_ID(N'TEEST'),OBJECT_ID(N'dbo.TBL_TEST_NEWSEQUENTIALID'),NULL,NULL,'DETAILED') IXPS
    WHERE IXPS.page_count > 0
    UNION ALL
    SELECT
      N'NEWID' AS METHOD_NAME
     ,IXPS.index_type_desc
     ,IXPS.alloc_unit_type_desc
     ,IXPS.index_depth
     ,IXPS.avg_fragmentation_in_percent
     ,IXPS.fragment_count
     ,IXPS.avg_fragment_size_in_pages
     ,IXPS.page_count 
    FROM sys.dm_db_index_physical_stats(DB_ID(N'TEEST'),OBJECT_ID(N'dbo.TBL_TEST_NEWID'),NULL,NULL,'DETAILED') IXPS
    WHERE IXPS.page_count > 0
    UNION ALL
    SELECT
      N'IDENTITY' AS METHOD_NAME
     ,IXPS.index_type_desc
     ,IXPS.alloc_unit_type_desc
     ,IXPS.index_depth
     ,IXPS.avg_fragmentation_in_percent
     ,IXPS.fragment_count
     ,IXPS.avg_fragment_size_in_pages
     ,IXPS.page_count 
    FROM sys.dm_db_index_physical_stats(DB_ID(N'TEEST'),OBJECT_ID(N'dbo.TBL_TEST_IDENTITY'),NULL,NULL,'DETAILED') IXPS
    WHERE IXPS.page_count > 0;

    The output

    METHOD_NAME     index_type_desc  alloc_unit_type_desc  index_depth avg_fragmentation_in_percent fragment_count avg_fragment_size_in_pages  page_count
    --------------- ---------------- --------------------- ----------- ---------------------------- --------------- -------------------------- -----------
    NEWSEQUENTIALID CLUSTERED INDEX  IN_ROW_DATA           3            0.641562064156206              24           149.375                    3585
    NEWSEQUENTIALID CLUSTERED INDEX  IN_ROW_DATA           3            0                              23             1                          23
    NEWSEQUENTIALID CLUSTERED INDEX  IN_ROW_DATA           3            0                               1             1                           1
    NEWID           CLUSTERED INDEX  IN_ROW_DATA           3           99.1434262948207              5020             1                        5020
    NEWID           CLUSTERED INDEX  IN_ROW_DATA           3           95.2380952380952                21             1                          21
    NEWID           CLUSTERED INDEX  IN_ROW_DATA           3            0                               1             1                           1
    IDENTITY        CLUSTERED INDEX  IN_ROW_DATA           3            0.333174678724417               8           262.625                    2101
    IDENTITY        CLUSTERED INDEX  IN_ROW_DATA           3            0                               7             1                           7
    IDENTITY        CLUSTERED INDEX  IN_ROW_DATA           3            0                               1             1                           1   

    You may have differerent results if you insert rows in separated batch and sessions instead of one shot insert.

  • RonKyle - Thursday, December 27, 2018 9:18 AM

    Please don't mistake the following as me taking a swipe at you personally on this. You just don't know the things I've discovered over the last 2 years of extreme study and testing. To be sure, I was of the same opinion as you when I started my in-depth studies now more than two years ago on this subject.

     I don't take comments personally provided they are not personally phrased.  But I have no shortage of experience on multiple systems either.  My primary point was that I am a database developer, and that, contrary to the point in the article, I do not like them when they are used for their own sake.  This is what the author seems to me to be saying.  I work with designs that use GUIDS for no apparent reason.  The original designers are long gone and I'm left to deal with the fallout.

    Hi Ron,

    I'll say it again... I'm right there with you on the multiple suck factors that GUIDs bring to a database, especially when they are introduced for no reason (actually worked for a company where they used GUIDs for everything and it was horrible.  I'd love to exchange war stories with you there).  There's no question there.

    But, you did mention that they were slow and so let me ask... do you ever use REORGANIZE to maintain indexes that have a GUID as a leading column?  If so, then, experienced or not, you've fallen into the same trap that most all of us have (and I'm definitely including myself in that) even though we may have literally decades of experience with SQL Server.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
    Intro to Tally Tables and Functions

  • That is very unlikely that in 2018 you will be using SQL Server 2008, but I have decided to mention this fact.

    Don't be surprised to find shops that are still support SQL Server 2008 R2 and older instances of SQL Server.  That makes it a good thing to mention what isn't available in older versions.

Viewing 15 posts - 1 through 15 (of 55 total)

You must be logged in to reply to this topic. Login to reply