partitioning a table

  • Hi Team ,

    I want to partition a vey huge table . It has approximately  2109955648 rows . I am planning to partition the tables based on day_id as this column is used to filter the data by all the queries .

    Day_id is nothing but the month in which we had performed the calculation . New day_id will added to the table every month or quarter .

    Please find the attached file which contains the No of rows in the table for each day_id . We have already splitted the database files into multiple drives .

    My question is how many partitions should i create ?

    Should i create one partition per year ?

     

     

     

     

     

     

     

     

     

     

     

    Attachments:
    You must be logged in to view attached files.
  • Sorry, I can't help you with this question, but I do admire the accuracy of your approximation!

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • My first bit of advice on this is to move all but the last 13 months of data to an "Archive" database for the current database.  Keep it in the SIMPLE recovery model and only back it up when you add a month to it.  Used page compression there and not in the main database if the table suffers from expansive updates.

    I'd also use the first of the month for dates rather than the end of the month but that's up to you and the people you have to work with.

    As I've done in the past, I'd create 1 partition per month.  It makes reindexing a lot less impactful especially if only the last couple of months suffer any form of fragmentation be it logical or physical.  It'll also make it easier to build the 13 month "roll off" because you can SWITCH out an entire month and more easily move it.

    Again, up to you but I make one filegroup per month and one file per month for the same reasons.

    Note that any form of partitioning is likely going to slow all but a small handful of queries down but it's well worth it because you won't be backing up or reindexing data that won't ever change after a month or two.  The fact that you'll only have 13 months online may help improve the performance of the queries that did slow down due to the partitioning.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • p.s.  You should also post the CREATE TABLE code for the table, its keys, and the indexes so we can see if you have possibilities going on such as "Trapped Short Rows" due to In-Row Lobs or lengthy VARCHAR/NVARCHAR columns.  If fragmentation is occurring on the Clustered Index, then it's also likely page splits are occurring and it's likely that's due to "Expansive Updates", which we might also be able to help you fix without changing any managed code, stored procedures, etc.  It may require you to change the order of the columns (people don't know about the problem with NULL VARCHAR/NVARCHAR columns going from requiring 0 space to suddenly requiring 2 bytes of space, which would also add to the problem of "Expansive Updates").

    Also, table partitioning has some issues that can be solved by using partitioned views instead and vice versa.  For example, if you have to do a full restore, you have to restore all partitions before you can do a backup.  That means that new data after a "get back in business" restore and then restoring the other, older, not-esessential-to-business partitions is paramount prior to doing a first backup after the restore.

    Both methods will allow you to set filegroups to READ_ONLY so you can do a single backup on the filegroups that contain data that never changes to seriously decrease your backup times without putting the data at risk by skipping the READ_ONLY file groups.

    And, to be sure, you're not going to get all the information you need to effectively and correctly plan your partitioning.  I'm just bringing some thing up to compel you to read and clearly understand the documentation (which is spread all over hell's half acre) and then compel you to do some serious testing to make sure you're not getting a serious dose of the proverbial "Purple Kool-Aid".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Are you partitioning the table solely to improve SELECT performance?

    How is the table clustered?  That is, what is the clustered index on the table now.  If the clus index starts with day_id, then you don't need to partition the table because of SELECTs.  If you're doing it for other table maintenance considerations, as noted by Jeff, then you may need to partition the table even if it is clustered first on day_id.

     

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Hi All,

    I have updated the table create script and indexes on the table for reference.

    Attachments:
    You must be logged in to view attached files.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply