Explanation of Page Splits

  • stevec883

    Old Hand

    Points: 362

    Q1 -- I have read quite a few articles on page splits and am looking for clarification. Am I correct in thinking that page splits happen because regular pages can't be added into the middle of a table? In other words, if I have a phone book and need to insert a lot of last names starting with "C", I need to split the pages in order to add pages in the middle (ie: because I am not adding to "Z's" which would be at the end, so simply add new pages).

    Q2 -- I realized I might have the incorrect notion, because I read about Fill Factors to create empty space on the Index pages. If I create an index and specify a Fill Factor of 60% causing 40% empty space, wouldn't this decrease the number of page splits later  when updates or inserts occur? I read an article that seemed to imply page splits would increase later when doing inserts. By creating empty space, this would allow updates to use that space instead of splitting(?).

  • Mr. Brian Gale

    SSC-Insane

    Points: 22954

    this page has a good writeup on page splits:

    https://blog.waynesheffield.com/wayne/archive/2016/10/smart-page-split/

    but if you think of it like a phone book, with each page in the phone book holding 8 KB of data, a page split occurs when data is added or changed on a page that causes it to go over the 8 KB.  So, in your example, inserting into the C's may not cause a page split or it might even though it is in the middle of the table.  If the C's page has 5 KB of data on that page and you are inserting 1 KB of data, it won't cause a page split.  If it has 7 KB of data and you are inserting 2 KB of data, you will get a page split.  If the row has 1 KB of data in it and the page has 8 KB of data in it and you update the data in the row to be 2 KB, page split.

    Paul Randal has a good writeup on good page splits and bad page splits too.  But basically, a good one is when it happens at the end of the index because you just put the data onto the end and no shuffling required.  Bad ones are when it happens in the middle of the index because you need to shuffle the whole table around.  LINK - https://www.sqlskills.com/blogs/paul/tracking-page-splits-using-the-transaction-log/

    Fill factor only comes into play when the index is NEW or REBUILT.  Basically, it is just telling SQL that the data is either changing frequently or the inserts happen at random locations in the index so save some space on the page for these changes.  Microsoft says "When an index is created or rebuilt, the fill-factor value determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth. For example, specifying a fill-factor value of 80 means that 20 percent of each leaf-level page will be left empty, providing space for index expansion as data is added to the underlying table. The empty space is reserved between the index rows rather than at the end of the index." and I think this covers it fairly well. LINK - https://docs.microsoft.com/en-us/sql/relational-databases/indexes/specify-fill-factor-for-an-index?view=sql-server-ver15

    The TL;DR version:

    Page splits happen when a page is filled and a new page needs to be added to the index.  Doesn't matter where you are inserting the data; just as long as a new page needs to be added.

    Fill factor is used when creating or rebuilding an index to define how full the pages should be and to allow (or not allow) growth on a page. Page splits should generally decrease if you have a proper fill factor set, but it can result in unused space in the database.

    If you think of it like a notebook, if you write until the page is 50% full, you have half a page left for notes if you need to go back and make changes or additions.  But if you never need to make changes or additions, half of your notebook is unused.  On the other hand, if you fill up every page 100%, and you need to go back and make a change or addition, you have no room for additions or changes.  YOu would need to add a new page into the middle of the notebook.  Which is also a good analogy for page splits.  Page splits are when the page gets full and you need a new page.  It is much easier to add a page to the end of your notebook than it is to add one in the middle.

  • Jeff Moden

    SSC Guru

    Points: 996689

    Mr. Brian Gale wrote:

    Fill factor only comes into play when the index is NEW or REBUILT.  Basically, it is just telling SQL that the data is either changing frequently or the inserts happen at random locations in the index so save some space on the page for these changes.  Microsoft says "When an index is created or rebuilt, the fill-factor value determines the percentage of space on each leaf-level page to be filled with data, reserving the remainder on each page as free space for future growth. For example, specifying a fill-factor value of 80 means that 20 percent of each leaf-level page will be left empty, providing space for index expansion as data is added to the underlying table. The empty space is reserved between the index rows rather than at the end of the index." and I think this covers it fairly well. LINK - https://docs.microsoft.com/en-us/sql/relational-databases/indexes/specify-fill-factor-for-an-index?view=sql-server-ver15

    Just to be sure, that also means that if you do INSERTs on an "ever-increasing-keyed" index, the pages will try to fill to 100% and that also means that even setting the FILL FACTOR lower will do absolutely nothing to prevent page splits if you turn right around and update those new rows in an "ExpAnsive" fashion.  Worse yet, it'll cost you dearly in both disk space and the footprint in memory.

    You also have to be wicked smart about when and when not to use REORGANIZE.  REORGANIZE will not ADD pages like REBUILD does and so REORGANIZE usually ends up removing pages by compressing other pages UP TO the FILL FACTOR without making extra space above the Fill Factor and that ends up actually making both page splits and the ensuing fragmentation much worse.  It's like a bad drug habit, the more you use it, the more you need to use it.

    p.s.  Even supposed "good" page splits are kind of "bad" especially on "ever-incresing-keyed" indexes where they can create quite the hot spot of contention and the resulting blocking. 😀

    p.p.s.  And, also to be sure, bad page splits don't cause the "whole table to be shuffled around" during the split.  They only affect the data in the page being split and the new target page.  Those two pages and the one that was linked to the original also have to have their "doubly linked" references updated.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • stevec883

    Old Hand

    Points: 362

    Excellent thanks. This clarifies a lot and I look forward to those articles.

    It sounds like, after reading both posts (Jeff and Brian), that inserting data in the middle of the table causes splits (assuming it causes pages to go over 8k) and would create pages only applicable to those pages (and not the whole table). So in other words, if using the phone book analogy, if I were to insert a bunch of "C's" for last name and they are on page 4, then I would get (after 8k is exceeded)  page 4.1, page 4.2, and page 4.3, but pages 5, 6 and 7 on so forth would remain unchanged?

    • This reply was modified 1 month, 1 week ago by  stevec883.
  • Jeff Moden

    SSC Guru

    Points: 996689

    stevec883 wrote:

    Excellent thanks. This clarifies a lot and I look forward to those articles.

    It sounds like, after reading both posts (Jeff and Brian), that inserting data in the middle of the table causes splits (assuming it causes pages to go over 8k) and would create pages only applicable to those pages (and not the whole table). So in other words, if using the phone book analogy, if I were to insert a bunch of "C's" for last name and they are on page 4, then I would get (after 8k is exceeded)  page 4.1, page 4.2, and page 4.3, but pages 5, 6 and 7 on so forth would remain unchanged?

    "It Depends".  To continue that phone book example, if all the data on page 4 started with "Ca" and only went up to "Ch" and all the data you wanted to insert fell between those two, then what you described above is correct.  Specifically, the only change page 5 would see is the "Previous Page" number in its header and nothing else and certainly no page splits.  Pages 6 and 7 would not change at all.

    Pages 4.1, 4.2, and 4.3 may or (most likely) may NOT be in the same relative physical order in the MDF file as the logical order.  That's "Logical Fragmentation".  It's also not likely that pages 4 thru 4.3 would be filled to the max because they did split.  There's usually a pretty good chance (depending on the row width) that they'll be about 50% full... sometimes more... sometimes less... depending on the "cascade" of the data.  Sometimes, you can get multiple page splits just by inserting one long row and I'm only talking about the leaf level of the index.  The B-Tree pages may also split oddly or maybe not at all.

    The fact that the new pages after a :"bad" type of split will usually only be about 50% full is what some people refer to as "physical fragmentation".  I just refer to it ass "Page Density" to remove all doubt.

    Also be aware that some people have taken to calling these two types of fragmentation "Internal" and "External" fragmentation.  Personally, I think that's confusing as hell, especially for newbies.  When I say "Logical Fragmentation" and "Page Density", it seems like even newbies understand.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden

    SSC Guru

    Points: 996689

    @stevec883 and anyone else interested...

    I've extracted a section out of my "Black Arts" Index Maintenance #1 PowerPoint presentation and extracted it as a "slide show" so you don't actually need PowerPoint to run through it.  I think you find it (even as brief as it is) very informative as to how pages splits (good or bad) work.  Let me know what you think.

    Edit... Crud... it wouldn't let me upload it.  I'll try something else to upload it... perhaps a ZIP file.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden

    SSC Guru

    Points: 996689

    Ok... trying again as a ZIP file...

    Edit... looks like that did it.  Just double click on the link and an Explorer window should open up.  Double click on the .ppsx file (PowerPoint Slide Show file)  and the presentation snippet should work.  If it says it's starting PowerPoint, just hit cancel on that.  The presentation advances when you click on the screen or (depending on the type of mouse you have) roll the mouse wheel toward you.

    Attachments:
    You must be logged in to view attached files.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply