DML performance : Unpartioned Table with Index vs Partitioned table with Index

  • Hi

    My take on Partitioning is, it facilitates large chunks of data into/out of a main table, limiting its downtime, due to locking.  Other than that, its of no specific use. It may also facilitate occasional DBA maintenance activities like compression etc.  But, its of little use in terms of day to day DML operations, especially gauged against Indexing.

    For example, if a table is partitioned on date, the partition feature shines when the entire particular dated rows are to be deleted.  However, if within a given date, only certain specific rows are to be deleted, then, its actually not Partition policy, but the indexing mechanism is what shines.   In cases of selective deletes,  the performance is same both on a Partitioned table with Index and an Unpartitioned table with Index.   Is this correct?

    How are other variants of DML operations get influenced in specific scenarios by partition feature?

    many thanks

  • I may be an exception because of my limited use of partitioning (couple of serious tests for a couple of forum posts a long time ago and a 1TB call recording archive table) but I've yet to see real DML performance improvements that are directly attributable to partitioning itself.  Yes, I've see what people claim to be improvements but that seems to have always been due to the proper indexing required by partitioning rather than the partitioning itself.  If people had done the same proper indexing to the monolithic tables, they wouldn't have actually needed partitioning.

    In fact, despite what MS says in some of their documentation, my observations have been that partitioning actually slows down well written code with the right indexes because of the extra B-Tree work that partitioning has to do, not to mention a shedload of other caveats (heh... try adding a column to a partitioned table that has Read_Only partitions whether you use Partitioned VIEWs or Partitioned TABLEs (yeah... I drank the Purple Kool-Aid when it came to Partitioned TABLEs for my call recording archive... It should have been a Partitioned VIEW and I'm working on corrected that mistake).

    On the flip-side of the coin, partitioning of huge tables is a real life saver when it comes to backups (static partitions {1 file per filegroup for me } can be set to Read_Only so they never need to be backed up again and allow piece-meal restores to get "Back-in-Business" quickly in DR situations or partial disk failures) and seriously reduced index maintenance (same reason as backups).

    There is the thought that different indexes can be made on partitions because older static partitions can benefit from being indexed differently that the currently non-static partitions but, especially for Partitioned TABLEs, that would require non-aligned indexes, which disallows some of the functionality (like SWITCH) that may have been a justification to partition to begin with.

    My Bottom Line on Partitioning

    I'll also state than I'm NOT an exception when it comes to thoughts on "partitioning for performance"...

    1. If the "performance" you're looking for is for general DML, forget it.  It's actually going to do nothing there and will usually make it a little worse and force a shedload of caveats on you.
    2. If the "performance" is for loading or deleting of entire months of data, that might be a decent reason depending on other factors.  Again, think about the caveats.
    3. If the "performance" is to seriously reduce backups and index maintenance by ignoring data that will no longer change, that's usually one of the better reasons to partition, although you DO have to consider the multiple restrictions and caveats.
    4. If the "performance" is to make "getting-back-in-bussiness" quicker during a DR situation, partitioning can help and might be a good reason to partition but, again, there are a shedload of requirements and caveats.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • After tons and tons of testing with partitioning, I've completely tossed it as a performance enhancement tool. You can, in exceedingly rare circumstances, see actual performance enhancement through partition elimination and load distribution. However, this is very uncommon and requires a nearly flawless hardware configuration combined with a nearly perfect data distribution across partitions with precise statistics and code written in such a way that it works perfectly with all the rest. In short, for the overriding majority of people who have less than stellar hardware indifferently configured, bad data distribution, sloppy if not down right bad statistics, and frequently, oh, so very frequently, utterly crap code, this will not work.

    If you need partitioning for data management, great. It can save your bacon. On the other hand, you must be prepared for ensuring that your code is written such that it can do partition elimination. Not because this will improve performance. It will not. But because the partition elimination will ensure that the drop in performance you do get, is not as bad as it could be if you're scanning across partitions. If you can't guarantee partition elimination (which means very stringent coding requirements) don't do this.

    To directly answer your question, no, performance will not be the same in partitioned data when you're scanning the partition to eliminate specific rows, it will be worse.

    Sorry, Jeff has largely answered it, but I see so many of these "Hey partitioning isn't fixing my performance problem" questions that I can't help but pile on.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • We found modest performance gains with one of our tables following partitioning. This was before my time so it's anecdotal. Where we found significant gains was with batch loading tons of rows. Here's some details of the table:

    16,289,327,283 rows

    704 GB data

    685 GB index

    85 partitions, one per month, about 200M rows each

    Indexes are partition aligned.

    About 50 million rows loaded per week in one batch. 95% of those new rows belong in the "current" partition.

    SQL Server 2012, Enterprise edition, 768GB RAM but only one storage area serviced by one channel of one controller.

     

    It used to take about 8-10 hours to load those 50 million rows each Saturday. I changed the process a few months ago so it uses partition switching and it now takes about 1.5-2 hours.

    Partition switching also made it much easier to introduce data compression to the table.

    The most significant perf improvement followed the creation of an aligned columnstore index.

     

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Soooo.... it wasn't actually the partitioning but column store that made the improvement?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Soooo.... it wasn't actually the partitioning but column store that made the improvement?

    I wasn't around when the table was converted to partitioned but the folks who were, claimed that there was "some improvement".  There was "more improvement" when indexing was tweaked to fit commonest and hardest-working queries. Later still, the columnstore index improved some queries (which use data from hundreds of millions of rows) by up to 20x. So yes. The biggest improvement was from the columnstore index.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • That pretty much supports what I've found.  It's actually people finally paying attention to the data and doing some decent indexing that makes the code run faster.  Because they're finally paying attention because they trying to do the partitioning, they claim the performance is due to partitioning.  I've just not found partitioning to be beneficial even because of so called "partition elimination" when they actually have to change the code to get "partition elimination"... which would have actually been faster if they changed the code and index(es) the same way as they did for partitioning but left out the partitioning part.

    I built a demo many years ago that proved this during a food fight about partitioning on one of the forum posts.  If I ever get the time, I'll try to write an article about the misconception that partitioning is going to really help performance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • And of course this is what you, Gail and Grant have been saying for years - except the fine detail of changing queries and indexing to work better with partitioning, which would have improved the original non-partitioned table to the same extent, or possibly better.

    If I ever get the darned server I've been looking for, I'm interested in writing up how to work with switching 'cos once you've got the hang of it, it's proper slick. You can disable indexes for a switched partition, do your changes, rebuild and switch back (essential with the readonly 2012 columnstore indexes).

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • If I'm reading you correctly, I totally agree.  SWITCH is a great reason to partition.  That's not the "performance" thing that most folks talk about and that I tend to rail against, though.  Heh... and you don't actually need TABLE partitioning to do the equivalent of SWITCH... and you don't need the other headaches (for me anyway) that TABLE partitioning has.  You can use PARTITIONED VIEWs (which have a huge number of advantages over PARTITIONED TABLES for me) but, as with all else in SQL Server, you get nothing for free... there's always caveats that will make one do major face-palms and the occasional but profound head-desk. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The biggest advantage to me of partitioning is being able to do separate compression, reorg, etc., on each partition.

    But, I too did have some cases where partition elimination has made a difference.

    Overall, though, I've seen much bigger performance gains from columnstore and even just regular (page) compression than from just partitioning.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Was it actually due to the partitioning itself or was it because of the preparations (and related indexes) that were actually the key?  I'm asking because in that thread I was talking about, proper indexing of the monolithic structure beat the pants off of the partitioned table even though partition elimination did come into play.

    And, no, not arguing.  Just explaining what I've seen before and have to ask the question.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply