A Tale of Partition Schemes and Disappearing Dragons

  • Comments posted to this topic are about the item A Tale of Partition Schemes and Disappearing Dragons

  • Interesting tale of dragons.

  • I loved the implications of being "fired". 🙂

    Great article that explains one of the better reasons to partition a table and well done on saving yourself by keeping the old data until you absolutely know the nightly process completed successfully. It helps you "urn" 😛 your pay according to your boss.

    I do have a question, though. Do you find that the "nonclustered columnstore index" actually helps performance that much or could you actually do without it with a couple of tweaks to the other indexing on the table?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Another approach is to use views to alternate source tables by updating the view at the end of the load procedure to point at the last updated table. What would be some pros/cons to using the written partition approach vs views?

  • nicolas_johnson 52979 (11/2/2015)


    Another approach is to use views to alternate source tables by updating the view at the end of the load procedure to point at the last updated table. What would be some pros/cons to using the written partition approach vs views?

    Simplest reasons: partitioned views are on the way out, and a partitioned table gives you fewer objects to manage.

    Gerald Britton, Pluralsight courses

  • Excellent post! It was informative and entertaining. Thank you!

  • g.britton (11/2/2015)


    nicolas_johnson 52979 (11/2/2015)


    Another approach is to use views to alternate source tables by updating the view at the end of the load procedure to point at the last updated table. What would be some pros/cons to using the written partition approach vs views?

    Simplest reasons: partitioned views are on the way out, and a partitioned table gives you fewer objects to manage.

    Ah... don't write off Partitioned Views just because of non-descriptive paragraph that represents a bloated Microsoft opinion to try to drive more sales to the more expensive Enterprise Edition. Partitioned views will be around as long as the Standard Edition is around and MS doesn't see fit to update it for Partitioned Tables.

    I'll also say that there are advantages to Partitioned View that many haven't had the need to consider. For example, I can index the current month partition in Partitioned Views differently than older months without having to worry about such things as whether or not an index is aligned or not, especially when the older months have had their file groups set to Read-Only. Also, as Kimberly Tripp has pointed out in her MCM video series, it's sometimes beneficial to have both types of partitioning associated with a "single" logical table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • We had a similar need to do a quick switch between a staging table and a production table, but simply used three table renames to accomplish this. The renames were very fast. I am not sure if there would be an advantage to using the partition switching technique instead.

  • Constant Coder (11/2/2015)


    We had a similar need to do a quick switch between a staging table and a production table, but simply used three table renames to accomplish this. The renames were very fast. I am not sure if there would be an advantage to using the partition switching technique instead.

    Good point. The table partition switching isn't necessary (and it's a bit of a PITA too boot) if you're replacing the whole table and your system can tolerate the incredibly brief period of the renames. It did that on an SQL based IVR system with 1.2 million customers and the renames are indeed fast and very effective.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • synonyms are also a nice tool, and you can avoid renames

    Gerald Britton, Pluralsight courses

  • entertaining and informative. I have used rename and partition switch. I think rename is good if its a one time load. Partition switch is something that can be created as a solution for production jobs. great article

    ----------------------------------------------------------------------------------------------------------------------------------------------------
    Roshan Joe

    Jeff Moden -Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • One of a very few times I found myself smiling while reading technical article. Good information. Thank you!

  • Happy dragons!. Nice article.

  • This was a nice demonstration of using partition switching to solve the problem of "loosing" the previous version of the data.  I suppose this solution would make a bit more sense if instead of doing a TRUNCATE they were only removing and rebuilding say the current month or year of data instead of truncating the entire table.

    I think the real thing this story highlights though is that data cleansing IS THE MOST IMPORTANT and often least respected part of building a data warehouse: "To make a long story short, it turns out one of our suppliers fed us an invalid date value, which we failed to handle correctly, causing the ETL to crash."  This kind of situation is usually handled by having an "Unknown" value in your dimension table, so that you always have something to link to your fact record, and then have some process, even if it is manual, to clean up the "Unknown" records later.

    Of course, if you want to improve the life expectancy of your dragons, don't take them North of the wall where they could be easily attacked by the white walkers.

  • To achieve a high availability of the Data Warehouse on functional level there is the concept of StagingArea (the kitchen / workarea to process the source data into Star Data models) and DataMart (the presentation layer). The DataMart is updated after processing the source data in the StagingArea, which will cause a limited unavailability of a Fact during the synchronization activity

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply