How Bad are Bad Page Splits – The Rest of the Story

,

In Part 1 of this article we looked at a specific use case (probably exaggerated) that gave us an idea about how CPU and IO performance might be affected by a bad page split. We continue this analysis looking at what really happens to the data on the leaf pages of a clustered index with a single insert into an already filled page (fill factor = 100%).

Warning: the dialogue below is well into the geeky intellect containing both T-SQL code and numbers. Side effects might include a nap or grabbing another BEvERage.

Approach

In this analysis, I used the Sales.SalesOrderDetail table from AdventureWorks2017. One advantage of using this table is that the clustered index is based on SalesOrderID and SalesOrderDetailID where SalesOrderDetailID has the identity property and in this database starts at 1 and ends at 121,317 with no gaps (rowcount = 121,317). The full script for this analysis is attached and well documented. Portions of the script are shown below for easier reading. You may have different page numbers for your installation, but the analysis should be correct.

First, I made sure that the clustered index (PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID) completely filled all the index’s leaf pages for FillFactor = 100 with the following code:

    ALTER INDEX PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID 
        ON Sales.SalesOrderDetail REBUILD
        WITH(DATA_COMPRESSION = NONE, FILLFACTOR = 100)

I then backed the database up to ensure each time it was restored starting with a “fully-packed” Clustered Index on Sales.SalesOrderDetail. I wanted to find a leaf page with a row (not in the middle) that I could subsequently delete, then rebuild the index, then add that row back into Sales.SalesOrderDetail.

Lines 96 to 179 show my search for a row to delete. I arbitrarily picked the 3rd-row boundary at the root level and the intermediate level in the B-Tree descent. Then I selected a row about 2/3rds within the row set on that page (16778).

--first look at  B-Tree for Primary Key, Clustered Index one in the same
SELECT [name],indid,reserved,used,rowcnt
    FROM sys.sysindexes
    WHERE [Name] IN ('PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID');
GO
--indid = 1            CI always has an IndID = 1
-- note rowcnt; normally will be accurate if statistics are up to date
DECLARE @IndID    INT = 1
SELECT OBJECT_NAME(object_id), index_id, partition_id,allocated_page_page_id, is_iam_page, page_type
        ,page_type_desc, page_level,next_page_file_id,next_page_page_id,previous_page_file_id
        ,previous_page_page_id, is_page_compressed
    FROM sys.dm_db_database_page_allocations (DB_ID(), OBJECT_ID('Sales.SalesOrderDetail'), @IndID,1,'Detailed')
    WHERE page_level IS NOT NULL
    ORDER BY (page_level+page_type) DESC,allocated_page_page_id
GO
-- note order, IAM_Page, INDEX_Page (One (root, 16384) at Page_Level 2 
--        and 7 (intermediate) pages at Page_Level 1)
--        and 1237 leaf level pages at level 0 
-- IAM: Index Allocation Map (and that is about all we are going to talk about) 
--    if you want more info on IAM go to Paul Randals blog 
--        on https://www.sqlskills.com/blogs/paul/inside-the-storage-engine-iam-pages-iam-chains-and-allocation-units/
--    Note page_type_desc:    IAM_Page
--                            Index_Page
--                            Data_Page*
--                            *Only have Data_Page in Clustered Index/Heaps
--
-- Level 2 (root( at page 16384)), root always highest level; 
--        Leaf Level always Level 0
    --almost same results as above SELECT using
--DBCC IND('AdventureWorks2017','Sales.SalesOrderDetail',1)
--GO
--    So the query has 1237 data pages (level 0) dedicated to index data; 
--    thus 121,317/1237 = ~98.1 rows/data_page
--first page in index 1 (IAM Page)
/*
--DBCC TRACEON(3604)                 -- trace 3604 needed to get results to SSMS; not for TABLERESULTS
DBCC PAGE('AdventureWorks2017',1,6928,3) WITH TABLERESULTS        
--DBCC TRACEOFF(3604)
GO
--    One result set, lots of stuff, but nothing we need for this presentation
*/
--    So now look at the root page for the Clustered Index (PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID)
--    Level 2 in index 1 (Index Page, --16384), root level
DBCC PAGE('AdventureWorks2017',1,16384,3) WITH TABLERESULTS    
                         --note 2nd result set (intermediate level); 3rd row, Child PageID = 17032
                         --    also note only 7 partition boundaries in root node (i.e. page not full)
GO
--two result sets; for today we will ignore first result set (metadata)
--    mainly metadata in first result set; for this presentation only look at the 2nd result set.
--    look at 2nd result set; SalesOrderID (key), SalesOrderDetailID (key), ChildPageID.
--    Also note Row Size (at root and intermediate level = # of bytes for key(s) plus 7 (metadata); 
--                        leaf level = # of bytes for keys + 4 (non-clustered)
--So now try to find a row going down the third page at the root level (3rd row is arbitrary)
--Level 1 in index 1 (Index Page, --17032 (from DBCC Page 16884)); intermediate level
DBCC PAGE('AdventureWorks2017',1,17032,3) WITH TABLERESULTS        
GO
--look at 2nd result set, 3rd row, now lower key boundary (53016,45671) and upper boundary is now 
--    (< 45790). Number of rows in this page is 120
--this is still an intermediate level result; again now go down the third page shown from the intermediate level
--Level 1 in index 1 (Index Page, --16778 (leaf level); note all data/row here!
DBCC PAGE('AdventureWorks2017',1,16778,3) WITH TABLERESULTS        --
GO
--first row of data is SalesOrderID = 53016 with SalesOrderDetailID = 45671
--last row of data is SalesOrderID = 53062 with SalesOrderDetailID = 45789
--so there are 119 rows on this data page
--now delete the 79th row (2/3rds of the SalesOrderDetailSpan)
--SalesOrderID = 53049; SalesOrderDetailID = 45749 (line 1467)
--now go back and look at the page preceeding (16777) and the page after (16779)
DBCC PAGE('AdventureWorks2017',1,16777,3) WITH TABLERESULTS        --
--    first row SalesOrderDetailID = 45552; last row SalesOrderDetailID = 45670; 119 rows
GO
DBCC PAGE('AdventureWorks2017',1,16779,3) WITH TABLERESULTS        --
--    first row SalesOrderDetailID = 45790; last row SalesOrderDetailID = 45908; 119 rows
GO

Lines 188-195 is the code to find and save row with SalesOrderDetailID = 45749 (for later insertion back into Sales.SalesOrderDetail) and then delete that row.

--first save row then delete
IF OBJECT_ID(N'tempdb..#temp') IS NOT NULL DROP TABLE #temp;
SELECT *
    INTO #temp
    FROM Sales.SalesOrderDetail
    WHERE SalesOrderDetailID = 45749;
DELETE Sales.SalesOrderDetail
    WHERE SalesOrderDetailID = 45749;

The code in Lines 198-225 shows that the page where the row with SalesOrderDetailID = 45749 is now gone and there is one less row on that page.

--Now go back and climb down the B-Tree to look again at this page
DECLARE @IndID    INT = 1
SELECT OBJECT_NAME(object_id), index_id, partition_id,allocated_page_page_id, is_iam_page, page_type
        ,page_type_desc, page_level,next_page_file_id,next_page_page_id,previous_page_file_id
        ,previous_page_page_id, is_page_compressed
    FROM sys.dm_db_database_page_allocations (DB_ID(), OBJECT_ID('Sales.SalesOrderDetail'), @IndID,1,'Detailed')
    WHERE page_level IS NOT NULL
    ORDER BY (page_level+page_type) DESC,allocated_page_page_id
GO
--page 16384 is still root
DBCC PAGE('AdventureWorks2017',1,16384,3) WITH TABLERESULTS    
--OK, still looking for the page that used to have SalesOrderDetailID = 45749

--still 3rd page in result set; lower boundary: SalesOrderID = 53131, SalesOrderDetailID = 45955
--now look at level 1
DBCC PAGE('AdventureWorks2017',1,17032,3) WITH TABLERESULTS;    
--third row shows the page that used to have SalesOrderDetailID = 45749
DBCC PAGE('AdventureWorks2017',1,16778,3) WITH TABLERESULTS;
--looking through result set at line 1449 we find SalesOrderDetailID = 45748
--and the next row has SalesOrderDetailID = 45750 (line 1467)
--    with 118 rows on this page.
--so row is actually gone.

To ensure a bad page split we rebuilt the index using code displayed below (lines 227-229).

--now rebuild index to ensure data pages are fully packed
ALTER INDEX PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID ON Sales.SalesOrderDetail REBUILD
    WITH(ONLINE = ON, DATA_COMPRESSION = NONE, FILLFACTOR = 100)

Before doing the bad page split I wanted to recheck the clustered index to ensure the appropriate page was full (and not having the missing row) in Lines 232-258.

--traverse the b-tree again
DECLARE @IndID    INT = 1
SELECT OBJECT_NAME(object_id), index_id, partition_id,allocated_page_page_id, is_iam_page, page_type
        ,page_type_desc, page_level,next_page_file_id,next_page_page_id,previous_page_file_id
        ,previous_page_page_id, is_page_compressed
    FROM sys.dm_db_database_page_allocations (DB_ID(), OBJECT_ID('Sales.SalesOrderDetail'), @IndID,1,'Detailed')
    WHERE page_level IS NOT NULL
    ORDER BY (page_level+page_type) DESC,allocated_page_page_id
GO
--aha!  root page is now 30328
DBCC PAGE('AdventureWorks2017',1,30328,3) WITH TABLERESULTS    
--result set still had 1 pages at level 1
--looking for SalesOrderDetailID = 45749; it is in the third row where SalesOrderDetailID = 46036
--now look at level 1 where SalesOrderDetailID = 45749
-- again 3rd row beginning with SalesOrderID = 51751 and SalesOrderDetailID = 40425, page = 30592
DBCC PAGE('AdventureWorks2017',1,30592,3) WITH TABLERESULTS    
--looking page where SalesOrderDetailID = 45749, 10th row (page 29114) has lower boundary (52620,44739) 
--    and upper boundarary where SalesOrderDetailID < 45766
--now looking at leaf level
DBCC PAGE('AdventureWorks2017',1,29114,3) WITH TABLERESULTS    
--line 1719 has SalesOrderDetailID = 45748
--line 1737 has SalesOrderDetailID = 45750
--    with 120 rows on this page.

Every time I ran the rebuild above, the page numbers changed. This is because the rebuild causes a new root page in another extent that may (or may not) have differing pages already in use. The pages in the leaf level are also different because of a possible reordering of the data and a copy from the old leaf-level pages to the new leaf level pages. In this particular rebuild the new root page is 30328. What I found interesting in all these rebuilds was that the number of rows in the intermediate level (level 1) changed. In this instance there were 13 intermediate level pages (7 originally), but looking at the result set the row we are interested in is still in the third row (page 30592). Looking at that page we find line 1719 had SalesOrderDetailID = 45748 and the next row with SalesOrderDetailID = 45750 with 120 rows on that page.

Now let’s put the missing row back into Sales.SalesOrderDetail as shown in Lines 260-271:

--now for the page split; now put back in the row where SalesOrderDetailID = 45749
SET STATISTICS IO,TIME ON;
SET IDENTITY_INSERT Sales.SalesOrderDetail ON;
INSERT Sales.SalesOrderDetail ([SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber]
                            , [OrderQty], [ProductID], [SpecialOfferID], [UnitPrice]
                            , [UnitPriceDiscount], [rowguid], [ModifiedDate])
    SELECT [SalesOrderID], [SalesOrderDetailID], [CarrierTrackingNumber], [OrderQty]
            , [ProductID], [SpecialOfferID], [UnitPrice], [UnitPriceDiscount]
            , [rowguid], [ModifiedDate]
        FROM #temp;
SET IDENTITY_INSERT Sales.SalesOrderDetail OFF;
SET STATISTICS IO,TIME OFF;

Remember that when you turn IDENTITY_INSERT ON, you must turn it back off because if you don’t, nobody else can do another IDENTITY_INSERT.

Let’s now traverse the B-Tree again for PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID as shown in Lines 274-315:

--so now traverse the B-Tree again for the CI/PK
DECLARE @IndID    INT = 1
SELECT OBJECT_NAME(object_id), index_id, partition_id,allocated_page_page_id, is_iam_page, page_type
        ,page_type_desc, page_level,next_page_file_id,next_page_page_id,previous_page_file_id
        ,previous_page_page_id, is_page_compressed
    FROM sys.dm_db_database_page_allocations (DB_ID(), OBJECT_ID('Sales.SalesOrderDetail'), @IndID,1,'Detailed')
    WHERE page_level IS NOT NULL
    ORDER BY (page_level+page_type) DESC,allocated_page_page_id
GO
--aha!  root page is still 30328
DBCC PAGE('AdventureWorks2017',1,30328,3) WITH TABLERESULTS    
--now looking for SalesOrderDetailID = 45749; the pointer for it is in the 3nd row (page 30592, lower boundary = 44739)
--intermediate level (1)
DBCC PAGE('AdventureWorks2017',1,30592,3) WITH TABLERESULTS    
-- now pointer for 45749 is 11th row, page 30377 (note out of sequence page)
--so now look at pages 29114, 30377, and 29115
DBCC PAGE('AdventureWorks2017',1,29114,3) WITH TABLERESULTS    
--59 rows, lower boundary = 45656, upper boundary = 45714; note pointers lines 37,38 (below)
/**********************************************
ParentObject	Object						Field		VALUE
PAGE HEADER:	Page @0x000001B445AD2000	m_prevPage	(1:29113)
PAGE HEADER:	Page @0x000001B445AD2000	m_nextPage	(1:30377)
***********************************************/
DBCC PAGE('AdventureWorks2017',1,30377,3) WITH TABLERESULTS    
--61 rows, lower boundry = 45715, upper boundary = 45775; note pointers lines 37, 38
/***********************************************
ParentObject	Object	Field	VALUE
PAGE HEADER:	Page @0x000001B458C62000	m_prevPage	(1:29114)
PAGE HEADER:	Page @0x000001B458C62000	m_nextPage	(1:29115)
************************************************/
DBCC PAGE('AdventureWorks2017',1,29115,3) WITH TABLERESULTS    
--119 rows, lower boundry = 45776, upper boundary = 45894; note pointers lines 37, 38
/***********************************************
ParentObject	Object	Field	VALUE
PAGE HEADER:	Page @0x000001B458C62000	m_prevPage	(1:30377)
PAGE HEADER:	Page @0x000001B458C62000	m_nextPage	(1:29116)
************************************************/

The root page remains 30328 and looking for the inserted row it should lie on a new page (30377) not in sequence with the prior (29114) and following page (29115). Also, note page 29114 is only half full and the new page with the inserted row is also about half full. So not only do we have two pages that are only half full (more IO???), but also a page out of sequence (and in a new extent) (and also more IO??? for range seeks or scans) as shown in the figure below:

As geeky as this paper has been so far, it would only be more so if we go back and rebuild the index and see where the inserted row lies and how close we are to the first version we started with. So with lines 319-321:

--And now if we do one final rebuild now that the table is complete
ALTER INDEX PK_SalesOrderDetail_SalesOrderID_SalesOrderDetailID ON Sales.SalesOrderDetail REBUILD
    WITH(ONLINE = ON, DATA_COMPRESSION = NONE, FILLFACTOR = 100)
--so now traverse the B-Tree again for the CI/PK
DECLARE @IndID    INT = 1
SELECT OBJECT_NAME(object_id), index_id, partition_id,allocated_page_page_id, is_iam_page, page_type
        ,page_type_desc, page_level,next_page_file_id,next_page_page_id,previous_page_file_id
        ,previous_page_page_id, is_page_compressed
    FROM sys.dm_db_database_page_allocations (DB_ID(), OBJECT_ID('Sales.SalesOrderDetail'), @IndID,1,'Detailed')
    WHERE page_level IS NOT NULL
    ORDER BY (page_level+page_type) DESC,allocated_page_page_id
GO
--aha!  root page is now 29904
DBCC PAGE('AdventureWorks2017',1,29904,3) WITH TABLERESULTS    
--now looking for SalesOrderDetailID = 45749; the pointer for it is in the 2nd row (page 30260, lower boundary = 40507)

DBCC PAGE('AdventureWorks2017',1,30260,3) WITH TABLERESULTS    
-- now pointer for 45749 is 54th row, page 16835 (observe pages back in order by extent) 

DBCC PAGE('AdventureWorks2017',1,16835,3) WITH TABLERESULTS    
--row for 45749 is line 1251 and there are 119 rows on this page!
/***********************************************
ParentObject	Object	Field	VALUE
PAGE HEADER:	Page @0x000001B458C62000	m_prevPage	(1:16834)
PAGE HEADER:	Page @0x000001B458C62000	m_nextPage	(1:16836)
************************************************/

Summary

A few notes from this analysis.

  • The word of mouth (WOM) about bad page splits is true – half the data from the original page plus the new row is written into a new page (most likely not in sequence and not in the same extent).
    • In this example, we saw 119 rows in the page before the split and then 59 after the page split with 61 rows in the new page (out of sequence).
  • A new page not in the same extent is now part of the index (out of order)
    • When SQL Server is scanning and index or performing an index range seek, SQL Server has to scan/seek for a page out of order (external fragmentation).
    • Old index page and new index page after a Bad Page Split now have internal fragmentation.
    • Pointers were readjusted in the intermediate (and possible root) levels of the B-Tree.
    • Rebuilding the index will remove the bad page splits.

Caveats:

  • It may be easier to have this document in one screen and the script in SSMS with line numbers turned on in another screen to follow article dialogue.
  • Page numbers will change with every rebuild and every install. This may cause you difficulty trying this script on your machine, but overall results should still be valid.

Comments and questions are welcome.

 

Resources

Rate

4 (3)

Share

Share

Rate

4 (3)