Primary Key Indexes

Question

Primary Key Indexes

paul.farnell

SSC Eights!

Points: 806
More actions
May 31, 2017 at 4:02 am

#343033

I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286573 More actions · Answer 1

paul.farnell - Wednesday, May 31, 2017 4:02 AM
I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?

How could we know? Could you provide some details about the environment? Things like the table structures and the insert statements, approximate row counts etc...

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 2

paul.farnell - Wednesday, May 31, 2017 4:02 AM
I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?

How big is the table, in pages?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

paul.farnell SSC Eights! Points: 806 More actions · Answer 3

Sean Lange - Wednesday, May 31, 2017 8:35 AM
paul.farnell - Wednesday, May 31, 2017 4:02 AM
I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?
How could we know? Could you provide some details about the environment? Things like the table structures and the insert statements, approximate row counts etc...

SQL Server 2016, 38000 rows. Select is from a different database from 2 tables using an inner join. 1 of these tables carries a SQL identity column which is the field being inserted into my primary key on the destination table. This primary key is the only index on the table.

paul.farnell SSC Eights! Points: 806 More actions · Answer 4

GilaMonster - Wednesday, May 31, 2017 8:35 AM
paul.farnell - Wednesday, May 31, 2017 4:02 AM
I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?
How big is the table, in pages?

2905

Sean Lange SSC Guru Points: 286573 More actions · Answer 5

paul.farnell - Wednesday, May 31, 2017 8:38 AM
Sean Lange - Wednesday, May 31, 2017 8:35 AM
paul.farnell - Wednesday, May 31, 2017 4:02 AM
I have a table in a database (it's a data warehouse) which is populated once a day by a truncate table and is then repopulated from another database. The table being populated has a primary key of an integer field which is the identity field from the source table. The insert statement has an order by statement as part of the code so the data is definitely going into the table in the order of the primary key of the destination table. However after population the fragmentation on the primary key index is over 96 percent. How can that be ?
How could we know? Could you provide some details about the environment? Things like the table structures and the insert statements, approximate row counts etc...
SQL Server 2016, 38000 rows. Select is from a different database from 2 tables using an inner join. 1 of these tables carries a SQL identity column which is the field being inserted into my primary key on the destination table. This primary key is the only index on the table.

Row counts is helpful. How wide is this table? Again....seeing the destination table definition would help.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

paul.farnell SSC Eights! Points: 806 More actions · Answer 6

Sean Lange - Wednesday, May 31, 2017 8:41 AM

paul.farnell - Wednesday, May 31, 2017 8:38 AM

Sean Lange - Wednesday, May 31, 2017 8:35 AM

paul.farnell - Wednesday, May 31, 2017 4:02 AM

CREATE TABLE [dbo].[Customers](
    [CustomersRID] [int] NOT NULL,
    [Company] [int] NOT NULL,
    [AccountNo] [varchar](10) NOT NULL,
    [GroupAccountNo] [varchar](10) NULL,
    [CustomerName] [varchar](50) NULL,
    [AccountStatus] [varchar](5) NULL,
    [CustomerGroup] [varchar](10) NULL,
    [Intrastat] [varchar](5) NULL,
    [SalesPerson] [int] NULL,
    [Postcode] [varchar](20) NULL,
    [Address1] [varchar](50) NULL,
    [Address2] [varchar](50) NULL,
    [Address3] [varchar](50) NULL,
    [Address4] [varchar](50) NULL,
    [Address5] [varchar](50) NULL,
    [Address6] [varchar](50) NULL,
    [PhoneNo] [varchar](30) NULL,
    [FaxNo] [varchar](30) NULL,
    [SalesContact] [varchar](50) NULL,
    [AnalysisCode3] [varchar](50) NULL,
    [CountryCode] [int] NULL,
    [Outlet] [tinyint] NULL,
    [ContractNo] [varchar](250) NULL,
    [PaymentTerms] [varchar](10) NULL,
    [MasterGroupAccount] [varchar](10) NULL,
    [Ledger] [varchar](10) NULL,
    [AccountCreationDate] [date] NULL,
    [HomeDepot] [tinyint] NULL,
    [SugarCrmCustomerID] [char](36) NULL,
    [LastInvoice] [date] NULL,
    [LastPaid] [date] NULL,
    [SettlementDiscountCode] [varchar](30) NULL,
    [AccountBalance] [money] NULL,
    [CreditLimit] [money] NULL,
    [DateInserted] [smalldatetime] NULL CONSTRAINT [DF_Customers_DateInserted] DEFAULT (getdate()),
    [TelesalesPerson] [varchar](25) NULL,
    [TradingGroup] [varchar](25) NULL CONSTRAINT [DF_Customers_RetailGroup] DEFAULT ('Not Assigned'),
    [RetailGroup] [varchar](1) NULL CONSTRAINT [DF_Customers_RetailGroup_1] DEFAULT ('N'),
    [Statement] [varchar](10) NULL,
    [PaymentMethod] [varchar](10) NULL,
    [ChangingCustomerStatus] [varchar](10) NULL CONSTRAINT [DF_Customers_ChangingCustomerStatus] DEFAULT ('Unchanged'),
    [CustomerPMR] [varchar](50) NULL CONSTRAINT [DF_Customers_CustomerPMR] DEFAULT ('Not Assigned'),
CONSTRAINT [PK_Customers_1] PRIMARY KEY CLUSTERED
(
    [CustomersRID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

tdroche Old Hand Points: 352 More actions · Answer 7

what is the fill factor of the index?
are there any gaps in the identity data?
are there any triggers or procedures that might be running and modifying data at the same time?

Chris Wooding SSCarpal Tunnel Points: 4530 More actions · Answer 8

Not an explanation of your problem, but in our warehouse we tend to drop indexes then truncate/load then recreate the indexes. It is generally a lot quicker on low-spec servers and (I'm assuming here) the indexes should be pristine at the end.

RonKyle SSC-Dedicated Points: 31537 More actions · Answer 9

Not an explanation of your problem, but in our warehouse we tend to drop indexes then truncate/load then recreate the indexes. It is generally a lot quicker on low-spec servers and (I'm assuming here) the indexes should be pristine at the end.

Completely agree. Drop the PK, load the table, add the PK back. You might even consider dropping and recreating the entire table minus the PK, load the table, add the PK.

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 10

I know this is a 10 month old post but (and especially since SQL Server 2008), as with all else in SQL Server, I'll have to say "It Depends"...

If you're required to use the FULL Recovery Model for such a "Replace Everything" load, then there's only a 17% performance advantage while there's a 200% disadvantage for disk/memory space usage and a 200% disadvantage for how much log file is used if you load the HEAP and then build the Clustered PK as compared to populating an empty table with the PK in place. On a million row load of a table with an IDENTITY property column being preserved as an IDENTITY column with rows that have a row size of 1,037 bytes each, there's only a difference of 24 seconds. Here's the run stats on the tests I've done.
--===== Final size of the databases and test durations -- Full Recovery, IDENTITY Property Preserved LogicalName SizeMB RunDuration RecoveryModel ------------------------------ ----------- ------------ ------------- PKPopulationTest_WithoutPK 2300 00:02:18:483 FULL PKPopulationTest_WithoutPK_log 2200 PKPopulationTest_WithPK 1200 00:02:42:137 FULL PKPopulationTest_WithPK_log 1100
Note that both test databases started out with both a 100MB MDF and 100MB LDF with a growth for each of 100MB.

Things change quite a bit if there's no requirement to preserve the IDENTITY property of the Integer based Clustered PK. Populating the empty table WITH the Clustered PK in place blows the doors off populating a HEAP and then adding the Clustered PK in all aspects.
--===== Final size of the databases and test durations      -- Full Recovery, IDENTITY Property Preserved LogicalName                    SizeMB      RunDuration RecoveryModel ------------------------------ ----------- ------------ ------------- PKPopulationTest_WithoutPK     2300        00:01:45:747 FULL PKPopulationTest_WithoutPK_log 2100                     PKPopulationTest_WithPK        1200        00:00:58:477 FULL PKPopulationTest_WithPK_log    1100
Changing the first test (IDENTITY Property preserved) to the Bulk_Logged recovery produces faster but similar results in that the in-place Clustered PK method is slower than the HEAP load and build the Clustered CI method. Again, the IDENTITY Property on the Clustered PK was preserved.
--===== Final size of the databases and test durations      -- Bulk_Logged Recovery, IDENTITY Property Preserved LogicalName                    SizeMB      RunDuration RecoveryModel ------------------------------ ----------- ------------ ------------- PKPopulationTest_WithoutPK     2300        00:01:01:340 BULK_LOGGED PKPopulationTest_WithoutPK_log 100                      PKPopulationTest_WithPK        1200        00:01:27:590 BULK_LOGGED PKPopulationTest_WithPK_log    100
Since this is a full replacement of all the data in a table and it's for a data warehouse, preserving the IDENTITY Property may not be necessary. And, since it's a data warehouse, use of the Bulk_Logged Recovery Model (at least during the loads like this and with the understand of what any bulk logged operation can have on point-in-time restores),   If you can settle for that, then here's the type of performance you can get especially from the in-place Clustered PK method... comparatively NASTY FAST and low resource usage for the in-place PK method.
--===== Final size of the databases and test durations      -- Bulk_Logged Recovery, IDENTITY Property NOT Preserved LogicalName                    SizeMB      RunDuration RecoveryModel ------------------------------ ----------- ------------ ------------- PKPopulationTest_WithoutPK     2300        00:00:56:997 BULK_LOGGED PKPopulationTest_WithoutPK_log 100                      PKPopulationTest_WithPK        1200        00:00:37:507 BULK_LOGGED PKPopulationTest_WithPK_log    100

As for the original question of where there was 96% Fragmentation after the load... it's generally not possible to have that much fragmentation on a "Full Replace" load like this. All the code above used parallelism during the copy and the fragmentation was very near zero, so it's not parallelism that would have caused it. This is especially true if the first key column is "ever increasing" but a well placed ORDER BY like that used in the original post will guarantee the correct order.

About the only thing that I can think of that would cause the fragmentation would be an "after" trigger that cause an "expAnsive" update that cause rows to grow.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)