Primary Key Indexes

  • I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table.  The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table.  However after population the fragmentation on the primary key index is over 96 percent.  How can that be ?

  • paul.farnell - Wednesday, May 31, 2017 4:02 AM

    I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table.  The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table.  However after population the fragmentation on the primary key index is over 96 percent.  How can that be ?

    How could we know? Could you provide some details about the environment? Things like the table structures and the insert statements, approximate row counts etc...

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • paul.farnell - Wednesday, May 31, 2017 4:02 AM

    I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table.  The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table.  However after population the fragmentation on the primary key index is over 96 percent.  How can that be ?

    How big is the table, in pages?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Sean Lange - Wednesday, May 31, 2017 8:35 AM

    paul.farnell - Wednesday, May 31, 2017 4:02 AM

    I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table.  The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table.  However after population the fragmentation on the primary key index is over 96 percent.  How can that be ?

    How could we know? Could you provide some details about the environment? Things like the table structures and the insert statements, approximate row counts etc...

    SQL Server 2016, 38000 rows.  Select is from a different database from 2 tables using an inner join.  1 of these tables carries a SQL identity column which is the field being inserted into my primary key on the destination table.  This primary key is the only index on the table.

  • GilaMonster - Wednesday, May 31, 2017 8:35 AM

    paul.farnell - Wednesday, May 31, 2017 4:02 AM

    I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table.  The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table.  However after population the fragmentation on the primary key index is over 96 percent.  How can that be ?

    How big is the table, in pages?

    2905

  • paul.farnell - Wednesday, May 31, 2017 8:38 AM

    Sean Lange - Wednesday, May 31, 2017 8:35 AM

    paul.farnell - Wednesday, May 31, 2017 4:02 AM

    I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table.  The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table.  However after population the fragmentation on the primary key index is over 96 percent.  How can that be ?

    How could we know? Could you provide some details about the environment? Things like the table structures and the insert statements, approximate row counts etc...

    SQL Server 2016, 38000 rows.  Select is from a different database from 2 tables using an inner join.  1 of these tables carries a SQL identity column which is the field being inserted into my primary key on the destination table.  This primary key is the only index on the table.

    Row counts is helpful. How wide is this table? Again....seeing the destination table definition would help.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange - Wednesday, May 31, 2017 8:41 AM

    paul.farnell - Wednesday, May 31, 2017 8:38 AM

    Sean Lange - Wednesday, May 31, 2017 8:35 AM

    paul.farnell - Wednesday, May 31, 2017 4:02 AM
    CREATE TABLE [dbo].[Customers](
        [CustomersRID] [int] NOT NULL,
        [Company] [int] NOT NULL,
        [AccountNo] [varchar](10) NOT NULL,
        [GroupAccountNo] [varchar](10) NULL,
        [CustomerName] [varchar](50) NULL,
        [AccountStatus] [varchar](5) NULL,
        [CustomerGroup] [varchar](10) NULL,
        [Intrastat] [varchar](5) NULL,
        [SalesPerson] [int] NULL,
        [Postcode] [varchar](20) NULL,
        [Address1] [varchar](50) NULL,
        [Address2] [varchar](50) NULL,
        [Address3] [varchar](50) NULL,
        [Address4] [varchar](50) NULL,
        [Address5] [varchar](50) NULL,
        [Address6] [varchar](50) NULL,
        [PhoneNo] [varchar](30) NULL,
        [FaxNo] [varchar](30) NULL,
        [SalesContact] [varchar](50) NULL,
        [AnalysisCode3] [varchar](50) NULL,
        [CountryCode] [int] NULL,
        [Outlet] [tinyint] NULL,
        [ContractNo] [varchar](250) NULL,
        [PaymentTerms] [varchar](10) NULL,
        [MasterGroupAccount] [varchar](10) NULL,
        [Ledger] [varchar](10) NULL,
        [AccountCreationDate] [date] NULL,
        [HomeDepot] [tinyint] NULL,
        [SugarCrmCustomerID] [char](36) NULL,
        [LastInvoice] [date] NULL,
        [LastPaid] [date] NULL,
        [SettlementDiscountCode] [varchar](30) NULL,
        [AccountBalance] [money] NULL,
        [CreditLimit] [money] NULL,
        [DateInserted] [smalldatetime] NULL CONSTRAINT [DF_Customers_DateInserted] DEFAULT (getdate()),
        [TelesalesPerson] [varchar](25) NULL,
        [TradingGroup] [varchar](25) NULL CONSTRAINT [DF_Customers_RetailGroup] DEFAULT ('Not Assigned'),
        [RetailGroup] [varchar](1) NULL CONSTRAINT [DF_Customers_RetailGroup_1] DEFAULT ('N'),
        [Statement] [varchar](10) NULL,
        [PaymentMethod] [varchar](10) NULL,
        [ChangingCustomerStatus] [varchar](10) NULL CONSTRAINT [DF_Customers_ChangingCustomerStatus] DEFAULT ('Unchanged'),
        [CustomerPMR] [varchar](50) NULL CONSTRAINT [DF_Customers_CustomerPMR] DEFAULT ('Not Assigned'),
    CONSTRAINT [PK_Customers_1] PRIMARY KEY CLUSTERED
    (
        [CustomersRID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]

  • what is the fill factor of the index?
    are there any gaps in the identity data?
    are there any triggers or procedures that might be running and modifying data at the same time?

  • Not an explanation of your problem, but in our warehouse we tend to drop indexes then truncate/load then recreate the indexes. It is generally a lot quicker on low-spec servers and (I'm assuming here) the indexes should be pristine at the end.

  • Not an explanation of your problem, but in our warehouse we tend to drop indexes then truncate/load then recreate the indexes. It is generally a lot quicker on low-spec servers and (I'm assuming here) the indexes should be pristine at the end.

    Completely agree.  Drop the PK, load the table, add the PK back.  You might even consider dropping and recreating the entire table minus the PK, load the table, add the PK.

  • I know this is a 10 month old post but (and especially since SQL Server 2008), as with all else in SQL Server, I'll have to say "It Depends"...

    If you're required to use the FULL Recovery Model for such a "Replace Everything" load, then there's only a 17% performance advantage while there's a 200% disadvantage for disk/memory space usage and a 200% disadvantage for how much log file is used if you load the HEAP and then build the Clustered PK as compared to populating an empty table with the PK in place.  On a million row load of a table with an IDENTITY property column being preserved as an IDENTITY column with rows that have a row size of 1,037 bytes each, there's only a difference of 24 seconds.  Here's the run stats on the tests I've done.
    --===== Final size of the databases and test durations
         -- Full Recovery, IDENTITY Property Preserved
    LogicalName                    SizeMB      RunDuration  RecoveryModel
    ------------------------------ ----------- ------------ -------------
    PKPopulationTest_WithoutPK     2300        00:02:18:483 FULL
    PKPopulationTest_WithoutPK_log 2200                    
    PKPopulationTest_WithPK        1200        00:02:42:137 FULL
    PKPopulationTest_WithPK_log    1100                    

    Note that both test databases started out with both a 100MB MDF and 100MB LDF with a growth for each of 100MB.

    Things change quite a bit if there's no requirement to preserve the IDENTITY property of the Integer based Clustered PK.  Populating the empty table WITH the Clustered PK in place blows the doors off populating a HEAP and then adding the Clustered PK in all aspects.
    --===== Final size of the databases and test durations
         -- Full Recovery, IDENTITY Property Preserved
    LogicalName                    SizeMB      RunDuration  RecoveryModel
    ------------------------------ ----------- ------------ -------------
    PKPopulationTest_WithoutPK     2300        00:01:45:747 FULL
    PKPopulationTest_WithoutPK_log 2100                    
    PKPopulationTest_WithPK        1200        00:00:58:477 FULL
    PKPopulationTest_WithPK_log    1100                    

    Changing the first test (IDENTITY Property preserved) to the Bulk_Logged recovery produces faster but similar results in that the in-place Clustered PK method is slower than the HEAP load and build the Clustered CI method.  Again, the IDENTITY Property on the Clustered PK was preserved.
    --===== Final size of the databases and test durations
         -- Bulk_Logged Recovery, IDENTITY Property Preserved
    LogicalName                    SizeMB      RunDuration  RecoveryModel
    ------------------------------ ----------- ------------ -------------
    PKPopulationTest_WithoutPK     2300        00:01:01:340 BULK_LOGGED
    PKPopulationTest_WithoutPK_log 100                     
    PKPopulationTest_WithPK        1200        00:01:27:590 BULK_LOGGED
    PKPopulationTest_WithPK_log    100                     

    Since this is a full replacement of all the data in a table and it's for a data warehouse,  preserving the IDENTITY Property may not be necessary.  And, since it's a data warehouse, use of the Bulk_Logged Recovery Model (at least during the loads like this and with the understand of what any bulk logged operation can have on point-in-time restores),   If you can settle for that, then here's the type of performance you can get especially from the in-place Clustered PK method... comparatively NASTY FAST and low resource usage for the in-place PK method.
    --===== Final size of the databases and test durations
         -- Bulk_Logged Recovery, IDENTITY Property NOT Preserved
    LogicalName                    SizeMB      RunDuration  RecoveryModel
    ------------------------------ ----------- ------------ -------------
    PKPopulationTest_WithoutPK     2300        00:00:56:997 BULK_LOGGED
    PKPopulationTest_WithoutPK_log 100                     
    PKPopulationTest_WithPK        1200        00:00:37:507 BULK_LOGGED
    PKPopulationTest_WithPK_log    100                     

    As for the original question of where there was 96% Fragmentation after the load... it's generally not possible to have that much fragmentation on a "Full Replace" load like this.  All the code above used parallelism during the copy and the fragmentation was very near zero, so it's not parallelism that would have caused it.  This is especially true if the first key column is "ever increasing" but a well placed ORDER BY like that used in the original post will guarantee the correct order.

    About the only thing that I can think of that would cause the fragmentation would be an "after" trigger that cause an "expAnsive" update that cause rows to grow.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply