Can you create a partition on an existing Fact table?

  • Hi All,

    I have a Fact table that has been in use for 2 years now, and I am getting to the point where the table is beginning to get big. It is currently sitting at roughly 40 million records. I have BO reports that are built off this table, and a nightly ETL schedule repopulates this table every night (along with other staging, dimension, etc. tables).

    What I am looking to do is improve the performance of recreating this fact table. The fact table already has about 6 years' worth of data in it (education data). What I want to do (because I THINK this would help performance) is delete only the current records and then reload the current records each night, leaving all historical data in this fact table untouched. (If it's not changing, why delete and reload it every night?)

    When testing this process, I found that deleting 5 million records from this table was taking an hour. After researching a little, I felt that building partitions on this fact table might greatly help me achieve what I want to do. The fact table is already loosely broken into chunks: State Tests, Course Grades, Attendance, Incidents, etc.

    Based on this, can I create partitions on this already existing FACT table, or would I have to build a brand-new fact table with partitions and then transfer the data over?

    Please let me know your opinions.

    Thanks for the help.

  • You can partition an existing table: not by adding partitions to it, but by rebuilding the clustered index onto a partition scheme. This will take as long as rebuilding the clustered index usually takes; it won't be instant.

    Once partitioned, you can switch out a partition of data (swap a partition with an empty table) to do fast 'deletes'.
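For a concrete picture of what that setup looks like, here is a minimal sketch. Every object name (pf_SchoolYear, ps_SchoolYear, and so on) is made up for illustration; the thread doesn't give real table or column names.

```sql
-- Sketch only: a partition function and scheme for a fact table split
-- by school year. All names are placeholders, not from the original post.

-- One boundary value per school year; RANGE RIGHT means each boundary
-- value belongs to the partition on its right.
CREATE PARTITION FUNCTION pf_SchoolYear (int)
AS RANGE RIGHT FOR VALUES (2009, 2010, 2011, 2012, 2013, 2014);

-- Map every partition to the PRIMARY filegroup for simplicity; in
-- production you might spread partitions across multiple filegroups.
CREATE PARTITION SCHEME ps_SchoolYear
AS PARTITION pf_SchoolYear ALL TO ([PRIMARY]);
```

The clustered index is then rebuilt onto ps_SchoolYear rather than onto a filegroup, which is what physically partitions the table.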

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (5/14/2014)


    You can partition an existing table: not by adding partitions to it, but by rebuilding the clustered index onto a partition scheme. This will take as long as rebuilding the clustered index usually takes; it won't be instant.

    Once partitioned, you can switch out a partition of data (swap a partition with an empty table) to do fast 'deletes'.

    Thanks for the response, GilaMonster. Would you by any chance know of an article that could walk me through this process? I have read about creating the partition function and scheme.

    When rebuilding the clustered index, can I just drop it and rebuild it so that it points to the newly created partition scheme?

    As for switching out a partition of data (which sounds like what I am ultimately looking for), I don't know anything about this. Any info would be greatly appreciated.

    Thanks

  • Well, my dreams may have been crushed. I tried to create a partition function, and when I hit the execute button I got the following message:

    Partition function can only be created in Enterprise edition of SQL Server. Only Enterprise edition of SQL Server supports partitioning.

    Is there any workaround for this?

  • skaggs.andrew (5/14/2014)


    Thanks for the response, GilaMonster. Would you by any chance know of an article that could walk me through this process? I have read about creating the partition function and scheme.

    Start with Books Online; there's a lot in there.

    When rebuilding the clustered index, can I just drop it and rebuild it so that it points to the newly created partition scheme?

    No, you don't want to do that. Just do a CREATE ... WITH DROP_EXISTING and specify the partition scheme where the filegroup would usually go.
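As a sketch of that statement, assuming a partition scheme named ps_SchoolYear already exists and the table is clustered on a SchoolYearID column (the index, table, column, and scheme names here are all hypothetical):

```sql
-- Rebuild the existing clustered index onto the partition scheme in one
-- operation. DROP_EXISTING = ON drops and recreates the index in a single
-- statement, avoiding a separate DROP INDEX plus CREATE INDEX, each of
-- which would sort the whole table.
CREATE CLUSTERED INDEX CIX_FactStudent
    ON dbo.FactStudent (SchoolYearID)
    WITH (DROP_EXISTING = ON)
    ON ps_SchoolYear (SchoolYearID);
```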

    As for switching out a partition of data (which sounds like that is what I am ultimately looking for) I dont know anything about this. Any info would be greatly appreciated.

    Again, Books Online has a lot on the subject.
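For reference, the switch-out pattern mentioned above looks roughly like this (all object names are hypothetical; SWITCH requires the target table to be empty, identical in structure, and on the same filegroup as the source partition):

```sql
-- Fast 'delete': move the current year's partition into an empty staging
-- table, then truncate the staging table. Both steps are metadata-only
-- operations and run in moments regardless of row count.
ALTER TABLE dbo.FactStudent
    SWITCH PARTITION $PARTITION.pf_SchoolYear(2014)
    TO dbo.FactStudent_Switch;

TRUNCATE TABLE dbo.FactStudent_Switch;
```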

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Poor man's partitioning (though, this was the way we all used to have to do it 😀 ) -> Partitioned Views. Details from Microsoft here.

    Steve.

  • stevefromOZ (5/14/2014)


    Poor man's partitioning (though, this was the way we all used to have to do it 😀 ) -> Partitioned Views. Details from Microsoft here.

    Is this essentially physically creating smaller tables and then unioning them all together in a view? What are the downsides of doing it this way vs. having the capability of using partitioning?

    Would it be better to try and get the Enterprise Edition?

  • skaggs.andrew (5/14/2014)


    Hi All,

    I have a Fact table that has been in use for 2 years now, and I am getting to the point where the table is beginning to get big. It is currently sitting at roughly 40 million records. I have BO reports that are built off this table, and a nightly ETL schedule repopulates this table every night (along with other staging, dimension, etc. tables).

    What I am looking to do is improve the performance of recreating this fact table. The fact table already has about 6 years' worth of data in it (education data). What I want to do (because I THINK this would help performance) is delete only the current records and then reload the current records each night, leaving all historical data in this fact table untouched. (If it's not changing, why delete and reload it every night?)

    When testing this process, I found that deleting 5 million records from this table was taking an hour. After researching a little, I felt that building partitions on this fact table might greatly help me achieve what I want to do. The fact table is already loosely broken into chunks: State Tests, Course Grades, Attendance, Incidents, etc.

    Based on this, can I create partitions on this already existing FACT table, or would I have to build a brand-new fact table with partitions and then transfer the data over?

    Please let me know your opinions.

    Thanks for the help.

    Yes. Have a look at "Partitioned Views" in Books Online. They have a different set of caveats but can be just as effective as "Partitioned Tables".
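A minimal local partitioned-view sketch, with made-up names and a two-year split (the CHECK constraint on the partitioning column is what lets the optimizer skip tables that cannot match a query's predicate):

```sql
-- One physical table per year; the CHECK constraint defines the 'partition'.
-- All table and column names are illustrative, not from the thread.
CREATE TABLE dbo.FactStudent_2013 (
    SchoolYearID int NOT NULL CHECK (SchoolYearID = 2013),
    StudentKey   int NOT NULL,
    Score        decimal(9, 2) NULL
);

CREATE TABLE dbo.FactStudent_2014 (
    SchoolYearID int NOT NULL CHECK (SchoolYearID = 2014),
    StudentKey   int NOT NULL,
    Score        decimal(9, 2) NULL
);
GO

-- The view unions the yearly tables; the nightly load then only has to
-- truncate and reload the current year's base table.
CREATE VIEW dbo.vFactStudent
AS
SELECT SchoolYearID, StudentKey, Score FROM dbo.FactStudent_2013
UNION ALL
SELECT SchoolYearID, StudentKey, Score FROM dbo.FactStudent_2014;
```

Reports would query dbo.vFactStudent exactly as they queried the single table before.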

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks, Jeff. I will read up on this. It sounds like my only option right now, and it would be better than just using this standalone table. Thanks, all.

  • It's not show-stopping, but the management is a bit of a PITA. You would be well served to create some management procs that do things like dropping and recreating the view as/when things change (i.e., not as easy as just partition switching in/out; you can't just take a referenced table out from under the view).

    Check out what your price would be for Enterprise, but if you're "off the street" without any pricing/discounts at your disposal, the price differential is going to be pretty large.

    Steve.

  • Thanks, Steve. Last question: let's say partitioning is not an option and I don't want to use a partitioned view. Are there any other things I can do to the fact table to help me out?

  • Does the table have a key or other indexes? A surrogate key? If so, you might be able to get your deletes to run faster, but (and it's a BIG but) this also depends a lot on your setup, as in *why* you are having to rebuild/reload the entire table every load. The reasons behind that may limit what your options are. 40 million rows isn't 'huge', but loading it every day could get to be a pain.
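One technique worth noting here (an editorial addition, not suggested in the thread): even without partitioning, deleting in small batches keeps each transaction short, keeps the log from ballooning, and works on any edition. All names below are placeholders:

```sql
-- Batched delete sketch: remove the current year's rows in chunks
-- instead of one giant hour-long transaction.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (50000)
    FROM dbo.FactStudent
    WHERE SchoolYearID = 2014;   -- the slice being reloaded tonight

    SET @rows = @@ROWCOUNT;      -- loop ends when a batch deletes nothing
END;
```

Batch size is a tuning knob; smaller batches mean less blocking but more iterations.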

    Steve.

  • As of now, there are several nonclustered indices on the fact table (these were built to help improve the querying of the BO reports). We also have several surrogate keys in the table. We rebuild this fact table (along with other dimension, staging, etc. tables) every night because the data in the source tables is updated on a daily basis, and rebuilding is how we get the updated data into the BO reports.

    The entire process of deleting records, dropping indices, loading data, and rebuilding indices takes roughly 3 1/2 hours each night. I would really like to cut this in half.
