Partition Help

  • Greetings,

    I'm trying to built a Partition function and scheme that will allow me to partition on the Month portion of a date column - e.g. if the date is 07.21.2016 I want to place those rows in a partition that contains all of July, regardless of the year (more on that in a second).

    The reason why I want to do this by month is to A) Control the total number of files and filegroups required (12 + 1), and B) we only load data into the tables that this would apply to at the end of our fiscal month.

    This:

    DATEPART(MONTH, SNAP_DATE)

    (where SNAP_DATE is the date column on the tables in question) will give me the numeric month, but I can't use DATEPART when I'm creating the table and assigning the partition scheme.

    Short of adding an additional column to the tables - computed or otherwise - to hold that value, is there another way to do this?

  • N.B. (7/30/2016)


    Greetings,

    I'm trying to built a Partition function and scheme that will allow me to partition on the Month portion of a date column - e.g. if the date is 07.21.2016 I want to place those rows in a partition that contains all of July, regardless of the year (more on that in a second).

    The reason why I want to do this by month is to A) Control the total number of files and filegroups required (12 + 1), and B) we only load data into the tables that this would apply to at the end of our fiscal month.

    This:

    DATEPART(MONTH, SNAP_DATE)

    (where SNAP_DATE is the date column on the tables in question) will give me the numeric month, but I can't use DATEPART when I'm creating the table and assigning the partition scheme.

    Short of adding an additional column to the tables - computed or otherwise - to hold that value, is there another way to do this?

    Don't partition, then. It's doing nothing for you. It's not going to help performance and it's not going to help with your night maintenance. All it's going to do is complicate things for you with absolutely no benefit.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/30/2016)

    Don't partition, then. It's doing nothing for you. It's not going to help performance and it's not going to help with your night maintenance. All it's going to do is complicate things for you with absolutely no benefit.

    How will it not help? These tables all together currently hold just over 6 Billion rows - two tables alone have 1.5 and 2.5 Billion each. We add several hundred million rows every month.

    The SNAP_DATE column in the tables will be part of the primary key on each table, and used when joining those tables to each other for historical reporting. Everything indicates that partitioning would help significantly by allowing the engine to skip to the relevant partition and have to only look at that portion of the clustered index.

  • N.B. (7/30/2016)


    The SNAP_DATE column in the tables will be part of the primary key on each table, and used when joining those tables to each other for historical reporting. Everything indicates that partitioning would help significantly by allowing the engine to skip to the relevant partition and have to only look at that portion of the clustered index.

    Except that, if the clustered index is on that date column already, then 'skipping non-relevant data' is exactly what an index seek does already. (and if the cluster isn't on that column, then putting it there will almost certainly get you the performance improvements you are expecting from partitioning, without having to partition.

    If you insist on partitioning, then adding another column is what you will need to do. It may be possible with a computed column, not sure, test and see. You'll also have to ensure that the queries filter on that column (as in, queries have <MonthNumberColumn name> = 7 in the WHERE clause), or you'll get no partition elimination at all and probably far worse performance than you have.

    Partitioning is for fast data loads/deleted (which won't be possible with your partitioning scheme), compression on older parts of the table (ditto) and index maintenance on just the changing parts of the table (slight benefit), not for performance gains.

    https://www.simple-talk.com/sql/database-administration/gail-shaws-sql-server-howlers/ and http://www.sqlservercentral.com/articles/Performance+and+Tuning/126532/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • N.B. (7/30/2016)


    Jeff Moden (7/30/2016)

    Don't partition, then. It's doing nothing for you. It's not going to help performance and it's not going to help with your night maintenance. All it's going to do is complicate things for you with absolutely no benefit.

    How will it not help? These tables all together currently hold just over 6 Billion rows - two tables alone have 1.5 and 2.5 Billion each. We add several hundred million rows every month.

    The SNAP_DATE column in the tables will be part of the primary key on each table, and used when joining those tables to each other for historical reporting. Everything indicates that partitioning would help significantly by allowing the engine to skip to the relevant partition and have to only look at that portion of the clustered index.

    You're not saving your data into partitions that are small enough. You're only saving by month... 12 partitions. You're also not setting up so that you could change older partitions to Read Only so that you can stop backing them up and stop doing index maintenance on them. And, partitioning does NOT help performance. It give the elusion that it does if you have crap code because of supposed "partition elimination" but, if you change the code to take advantage of "partition elimination" but do it on a monolithic table with good code, such code will almost always be faster on the monolithic table.

    Also, your primary keys will no longer be entirely valid because if you just change the date, you can have multiple copies of what used to be primary key data in the table because the addition of the partitioning column will now allow it.

    Partitioning is neither a panacea of performance (and frequently makes performance worse because of the multiple B-Trees it has to go through) nor a panacea of maintenance. If you don't carefully consider ALL of the ramifications (including things like aligned indexes and the proper partitioning key, which you haven't actually selected), then you'll end up with much slower code with much more maintenance and many more headaches. You could even make it much more difficult to do a DR restore, if you're not careful.

    I strongly recommend that you hit the books on partitioning, especially since you think that partitioning will somehow help performance.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply