• pnauta (5/28/2013)


    Database partitioning is not my first thought, and I think partitioning didn't work in SQL 2000 the way we're used to now. I have cleaned out a SQL 2005 database of around 350GB, and what matters much more is that indexes are present to support the delete.

    Database partitioning is high maintenance, both to set up and to maintain. I have created and updated scripts to create these partitions, and we have yet to delete any partitions.

    Negative aspects:

    - your indexes need to be partition aligned (say, if you split partitions by month, the index needs to be partitioned on the same datetime column)

    - if they're not, any move or remove may render your database blocked for extended periods of time during which data is moved and indexes are updated

    - you need to understand it thoroughly before accepting the challenge

    - applying that to an existing database, where the problem you want to solve is the speed of data movement, is not recommended. You can kill a script that deletes records; you cannot kill the data movement.

    I have a 4.3TB database which is partitioned by month. That's a valid case. I feel 350GB is not a valid case and not worthwhile. Start Profiler, run the result through the tuning wizard and apply the indexes. I did that, and whereas before I could only just keep up with deleting the daily turnover, the script I used now deletes a day of history in about an hour. So my script takes an hour a day to delete that same day from 3 months ago. It should be possible.

    Partitioning works just fine in SQL Server 2000. Look up "Partitioned Views" and you'll see.

    You're correct about the initial setup being a bit complex, especially if you have an IDENTITY column in the table, but it's worth the effort even on a 350GB database, especially when you consider things such as backup and restore times. Consider the following... will you ever update an audit table? If it's truly an audit table, the answer should be "NO". That means that you have (in this particular case) 6 months of data that won't ever change. Only the "current month" will change. If you use a "Partitioned View" of the table (unlike a "Partitioned Table"), you can store those older, static months in a separate database, which has several advantages. First, that database can be set to the "SIMPLE" recovery model so that there's no need to back up its log. That also decreases the log file backup time on the "main" database where the "current month" is stored. It also means that you only have to back up the separate database once per month, when you add a new month's worth of data to it, and that a restore would take much less time. During a DR restore, you're not so much concerned with old log data as you are with getting the database back up and running so you can return to normal business. Properly indexing such a thing is a trivial matter.
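As a rough illustration of the layout described above, here is a minimal T-SQL sketch. Every object name in it (AuditArchive, AuditMain, Audit_201304, Audit_Current, the column list) is hypothetical and chosen only for the example; the real audit table's columns and the month boundaries would differ.

```sql
-- Hypothetical names throughout; adjust to the real audit table.
-- The archive database holds only static months, so SIMPLE recovery
-- means there are no log backups to take for it.
ALTER DATABASE AuditArchive SET RECOVERY SIMPLE;
GO
USE AuditArchive;
GO
-- One member table per closed month. The CHECK constraint on the
-- partitioning column states which date range the table can hold.
CREATE TABLE dbo.Audit_201304
(
    AuditID   INT           NOT NULL,
    AuditDate DATETIME      NOT NULL
        CONSTRAINT CK_Audit_201304_AuditDate
        CHECK (AuditDate >= '20130401' AND AuditDate < '20130501'),
    TableName VARCHAR(128)  NOT NULL,
    Details   VARCHAR(4000) NULL,
    CONSTRAINT PK_Audit_201304 PRIMARY KEY CLUSTERED (AuditDate, AuditID)
);
GO
USE AuditMain;
GO
-- The "current month" table lives in the main database and is the only
-- member that still receives new rows.
CREATE TABLE dbo.Audit_Current
(
    AuditID   INT IDENTITY(1,1) NOT NULL,
    AuditDate DATETIME          NOT NULL
        CONSTRAINT CK_Audit_Current_AuditDate
        CHECK (AuditDate >= '20130501'),
    TableName VARCHAR(128)      NOT NULL,
    Details   VARCHAR(4000)     NULL,
    CONSTRAINT PK_Audit_Current PRIMARY KEY CLUSTERED (AuditDate, AuditID)
);
GO
-- The partitioned view: a UNION ALL over the member tables. This assumes
-- the original monolithic table has been renamed so the view can take
-- over its name.
CREATE VIEW dbo.Audit
AS
SELECT AuditID, AuditDate, TableName, Details
FROM AuditArchive.dbo.Audit_201304
UNION ALL
SELECT AuditID, AuditDate, TableName, Details
FROM dbo.Audit_Current;
GO
```

With trusted CHECK constraints in place, a query such as `SELECT ... FROM dbo.Audit WHERE AuditDate >= '20130410' AND AuditDate < '20130411'` gives the optimizer enough information to skip member tables whose range cannot match.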

    As a bit of a sidebar, remember that if the "current month" table has an IDENTITY column, you won't be able to do a direct insert into the view. Again, that's trivial because you can insert directly into the Current Month table instead of trying to insert into the "Partitioned View". It means changing the target of the audit triggers that are currently in place but, again, that's a trivial change compared to the benefits.
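A sketch of what retargeting one audit trigger might look like. The audited table dbo.Orders, its key OrderID, the trigger name, and the current-month table dbo.Audit_Current with its columns are all hypothetical; the actual triggers will capture whatever the existing audit design captures.

```sql
-- Hypothetical trigger and table names; only the INSERT target matters here.
-- The trigger writes straight to the current-month table instead of the
-- partitioned view, so the IDENTITY column on that table is not a problem.
ALTER TRIGGER dbo.trg_Orders_Audit
ON dbo.Orders
AFTER UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.Audit_Current (AuditDate, TableName, Details)
    SELECT GETDATE(),
           'Orders',
           'OrderID=' + CAST(d.OrderID AS VARCHAR(12))
    FROM deleted AS d;   -- one audit row per updated row, set-based
END;
GO
```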

    The code to do all of the above is relatively trivial and can be done as a "set it and forget it" scheduled job.
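To give a feel for the month-end step such a job would perform, here is a simplified sketch for a single, hard-coded month. It assumes a main database AuditMain holding a current-month table dbo.Audit_Current and a view dbo.Audit, plus an archive database AuditArchive that already holds dbo.Audit_201304; all of these names and columns are hypothetical. A real job would build the month names dynamically and, as noted below, needs testing and error checking before it goes anywhere near production.

```sql
-- Hypothetical names; month boundaries are hard-coded for illustration.
USE AuditArchive;
GO
-- 1. Create the member table for the month that just closed.
CREATE TABLE dbo.Audit_201305
(
    AuditID   INT           NOT NULL,
    AuditDate DATETIME      NOT NULL
        CONSTRAINT CK_Audit_201305_AuditDate
        CHECK (AuditDate >= '20130501' AND AuditDate < '20130601'),
    TableName VARCHAR(128)  NOT NULL,
    Details   VARCHAR(4000) NULL,
    CONSTRAINT PK_Audit_201305 PRIMARY KEY CLUSTERED (AuditDate, AuditID)
);
GO
USE AuditMain;
GO
-- 2. Move the closed month in one transaction. XACT_ABORT makes any
--    run-time error roll back the whole move instead of continuing.
SET XACT_ABORT ON;
BEGIN TRANSACTION;

    INSERT INTO AuditArchive.dbo.Audit_201305
           (AuditID, AuditDate, TableName, Details)
    SELECT AuditID, AuditDate, TableName, Details
    FROM dbo.Audit_Current
    WHERE AuditDate >= '20130501' AND AuditDate < '20130601';

    DELETE FROM dbo.Audit_Current
    WHERE AuditDate >= '20130501' AND AuditDate < '20130601';

COMMIT TRANSACTION;
GO
-- 3. Advance the CHECK constraint on the current-month table.
ALTER TABLE dbo.Audit_Current
    DROP CONSTRAINT CK_Audit_Current_AuditDate;
ALTER TABLE dbo.Audit_Current WITH CHECK
    ADD CONSTRAINT CK_Audit_Current_AuditDate
    CHECK (AuditDate >= '20130601');
GO
-- 4. Add the new member table to the view.
ALTER VIEW dbo.Audit
AS
SELECT AuditID, AuditDate, TableName, Details
FROM AuditArchive.dbo.Audit_201304
UNION ALL
SELECT AuditID, AuditDate, TableName, Details
FROM AuditArchive.dbo.Audit_201305
UNION ALL
SELECT AuditID, AuditDate, TableName, Details
FROM dbo.Audit_Current;
GO
```

After the move, the archive database needs its once-a-month backup, since it has just changed.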

    So far as your comment on being able to stop deletes but not data movement, I'd say that it's just not going to matter once you set up the system to maintain the partitioning. You are, in fact, supposed to test such things before you implement them to ensure that they not only work correctly but also have the necessary error checking to prevent any data loss.

    So far as "350GB is not a valid case and not worthwhile", that's likely an incorrect assumption, but it does need to be evaluated, especially when it comes to backup space and the amount of time needed to do a DR restore. If the largest database a company has is only 350GB, they're probably not set up with huge reserves of disk and tape backup space. Most companies just won't spend the money on it. For example, buying and powering up 4TB of disk space and a relatively large number of backup tapes doesn't make sense for a 350GB database because technology changes (no sense buying lots of hardware that will go out of date without ever being used). All of that usually makes such partitioning very worthwhile.

    To summarize, the investigation and planning needed to pull off partitioning in the manner that I've just recommended should take only a day or two and yield a rock-solid plan. The coding for the automated monthly partition "moves" is trivial. The coding to use the new partitioned view is also trivial because, if you plan it correctly, the view will be named the same as the original monolithic audit table and no front-end changes will be required. Only the triggers that put the audit data into the "current month" table would need to be changed and that's also a trivial change.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.

