The Challenges of Splitting a Table

  • Comments posted to this topic are about the item The Challenges of Splitting a Table

  • I think the first step towards "vertical partitioning" is actually "proper normalization of a table".

    I also think that people that have the following 3 columns (as identified in the Reddit post you linked to) need to be shot...

    DeleteBy

    DeletedOn

    IsDeleted

    First of all, such rows don't belong in the main table... they should be squirreled away in an "archive" table somewhere.

    Second, it's not often that I'll use the word but it's stupid to have a separate "IsDeleted" column.  The presence of a date in the "DeletedOn" column should be all that you need.  If a date isn't present in that column, the row hasn't been deleted.

    I'll also state that a ModifiedBy column (which was also present in the linked post) is a huge source of "ExpAnsive Updates" that causes rampant page splits and the resulting heavy load on the log file and the resulting high levels of logical fragmentation.  There are multiple methods to handle this (the best way is with INT identifiers) but they do need to be handled especially when it comes to "Ever Increasing Keyed Indexes" where INSERTs are followed by UPDATEs that occur before the next index rebuild.  For tables that have existing data, INSERTs will always try to fill pages to 100% which also means that any expansive updates that occur will immediately and certainly cause page splits.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I've had to vertically split a dimension table on a data warehouse. It had over 500 columns, almost any update was causing a page split so splitting the table into two, one table with the most frequently used columns the other with less used columns, made a massive performance improvement.

  • Moving those frequent v rarely queried columns can make a huge difference. It's almost like a covering index for a variety of queries using those columns.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply