Table partitioning best practice

  • I've inherited a couple of rather large databases from my ex-colleague when I join this company.

    Today, a developer reached out to me to inform me that in that databases there's a couple of large tables which has partition & they would want to partition ALL of the tables in the DB. Bear in mind this is a huge 1TB database.

    Upon some deep dive, I notice the partition function's date range are out of date, the last values is back in 2018. So all of the data after Dec 2018 are now kept in the last partition. Seems like this defeats the actual purpose of partitioning. I get back to the developer and mention my findings & inform them that to partition the whole database will take huge effort and resources thus he needs to discuss with his manager first.

    My question is this, in what situation will partitioning actually beneficial? My basic understanding is partitioning is good for data maintenance & performance if they frequently needs to query data based on the date ranges. Is there any other scenario where partitioning is beneficial? Appreciate if someone could advice in laymen term.

    Furthermore, if there's no housekeeping being done towards data then partitioning will still be meaningless right? As data keeps growing and the number of partitioning will need to be increase all the time.

  • https://www.brentozar.com/archive/2012/03/how-decide-if-should-use-table-partitioning/

    I'm quite hesitant to use partition because it requires maintance and good queries that support it. Maybe the odd partition swap.

    As Brent usually mentions, what is the developer trying to fix?

    You can also use partitioned views in some cases https://sqlsunday.com/2014/08/31/partitioned-views/

     

     

  • Partitioning, first and foremost, is a data management tool, not a performance tool. We must be up front and clear with this. If you partition any tables, let alone every table (that is without a doubt probably not needed) in order to not suffer negative impacts to performance (your performance will get very bad), you must be able to guarantee that every query will filter by the partitioning key. If you can't guarantee filtering on the partitioning key, you will see scans across partitions, and that is going to hurt performance, a lot.

    I would only partition if I had a lot of data, i could guarantee partition elimination, and, most importantly, I needed to regularly remove data, and/or move it around. Rolling dates is a good example. You only have to keep the previous twelve months of data, so once a month, you drop the oldest partition and create a new one.

    Partitioning for performance? Nope. Just almost never have I done that. It's frequently just too hard to guarantee the right kinds of queries.

     

     

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Jeff will no doubt add something here - do heed his advise.

     

    but as mentioned (and as you yourself raised as an issue) partitioning should only be used when most of the following are met

    • high volumes of data (100 Million or more is what I would consider as bare minimum)
    • there is a need to archive older data frequently
    • old data (partitions) can be moved to slower and/or read only filegroups
    • Majority of code uses the partition column to filter partitions
    • there is a need to do index rebuilds/stats and this can be done only on recent partitions (although this needs is subject to other discussions)

    in majority of cases that people decide on using partitioning is because A - they don't really understand how it works, and B their code is bad and they think partitioning will help performance (It WONT and it will most times make it worst)

  • Actually, partitioning came sometimes accidentally (or coincidentally) help performance.  For example, the data was originally clustered by id, but to partition you cluster it first by a date.  IF you happen to do the big majority of your lookups by date, then the new partitioning could indeed help performance.

    However, you could get the same performance gain simply by properly clustering the data without doing the partitioning at all.  Often, that makes it look as though "partitioning helped performance" when in fact it was really just the re-clustering that helped performance.

    The single most important factor for best overall performance is best clustering the table.  So focus initially on that, and ignore any phony rules such as "always cluster by id."  Clustering is far too important to be controlled by trite rule(s).  Instead, carefully select each table's clustering based on that table's actual needs and usage.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • As Joe said, quoting Brent, what problem are the developers trying to fix.

    The first thing in any troubleshooting situation is to articulate the problem in neutral terms. If it is described in terms of a potential solution then almost certainly the underlying problem is not understood and the wrong solution will be applied.

    Speak to your manager, speak to the developers, and speak to the developer's manager. The aim should be to get the problem defined, so that potential solutions can be proposed and a cost-effective plan 0ut together.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • All of the other folks have brought up great points on this thread and there's really nothing to add.  To emphasize, though...

    1. Partitioning does not improve query performance and will almost always (I say "almost" only because I don't want to say "always" alone, but I've never see it do otherwise) slow queries down.  The act of selecting a better Clustered Index to support the partitioning can improve performance even without the actual partitioning
    2. Partitioning every table will slow everything down.  As others have pointed out, there are reasons to partition a table and most "small tables" (people's definitions of "small", "large", and "huge" tables/databases vary quite a bit) simply don't qualify.
    3. For partitioning of large tables with legacy rows that will never change, consider the use of Read_Only FileGroups to prevent the need for having to continuously backup data that will never change.  Make sure to test "Piecemeal Restores", as well.  On big tables, they can be a life saver.
    4. Make sure that you understand that the partitioning column will be added to the to the Clustered Index if you partition it.  If it's a unique Clustered Index, the single column will become a 2 column index when you add the Clustering Key to it, meaning that the single column will no longer necessarily be unique anymore.  That could present issues with creating FKs.  You may have to create another separate unique index on the column that you want to be unique in order to use it as an FK.
    5. If you intend to use SWITCH IN or OUT, you need to make your all your non-clustered indexes are aligned.
    6. As another sidebar, if the table is truly a log table and all columns are being written to and you don't use any kind of automatic numbering or computed columns, Partitioned Views might also do the trick.  The really cool part about this in that the indexes don't need to be aligned and you can make the style of the indexes different on the older data than it is on the new data.

    The bottom line is that most (again... can't bring myself to say "ALL" and be able to guarantee it) partitioned tables are slower than a properly indexed monolith.  They can be great for index maintenance, backups, and restores but, like everyone here has been stressing, you have to know what problem you're trying to solve because they have a lot of gazintas and special requirements that frequently make them not worth it.  That's probably why the predecessor of the person on this thread did even try to maintain them for, apparently, the last 6 years.

    So, back to the original post... it partitioning going to actually be worth it?  Wouldn't it just be better to archive some old stuff by moving it to another database that doesn't require nightly full and transaction log backups?

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If you really want to see the differences between proper cluster/non-cluster indexing, and Partitioning data management processes, you need a test platform to check the performance indicators for alternative solutions.   As one person stated, there were design/implementation  flaws causing problems for many years.

    DBASupport

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply