Partitioning strategies for an 18TB+ database

  • My database has three tables that will be around 5-6 TB, plus another 1 TB across the other 12 tables. Combined, the tables grow by 550 GB per month. The primary key is a GUID and is used to look up the records.

    Content is continuously added to all tables in the current month, and possibly the previous month, depending on the content and the date received.

    Table A: 2 TB. There's a 0.001% chance of an update within 60 days of a row being inserted; no other access is needed once inserted.

    Table B: 4 TB, mainly read. There's a chance of an update within 90 days, and I'm obligated to show 3 years of data. The lookup is by GUID over the most recent 3 years of data using a date range; this will fetch anywhere from one to many records over the three-year period (hundreds, possibly thousands).

    Table C: 1 TB. There's a 0.001% chance of an update within 60 days of being inserted; no other access is needed at this time.

    Table D: 1 TB. There's a 0.001% chance of an update within 60 days of being inserted; no other access is needed at this time.

    I'm thinking of using partitioning as follows: for each table, a separate partition scheme by month, with separate filegroups and files for each. Table D actually represents 12 tables; I'll probably put them in a separate filegroup spread over several files, unless one of them shows a need for partitioning.

    I'd like feedback on my approach. I can't imagine using the GUI, so I'm planning on scripting everything. If anyone has a neat script for generating the scripts, I'd be interested in trying it.

    We have 36 months of history that we will start feeding in through an ETL process once this is implemented.

    1. Should we define partitions for the full 7 years + 1 month now and just create small filegroups/files for the future?

    2. Adding filegroups is simple, but is it easy enough to add the next range of partitions?

    3. After 36 months we need to keep the data available for audit; after 7 years we can drop it. What strategy can best meet my needs?

    4. What are the best methods for backup and restore to a remote location for 20 TB+?

    Considerations:

    • No extended outages; planned maintenance is 1-2 hours once weekly or less.

    • All data must be kept for 7 years.

    • No clustering to be used (not my choice); I have concerns around patching.

    • We will locally mirror the data (once it's loaded).

    • The database will then be copied to a remote location. This is still open; we're looking at possibly doing SAN replication.

    • The majority of the data (90%) is loaded through ETL; the rest comes through BizTalk.


    John Zacharkan

  • zach_john (4/28/2011)



    As far as partitioning goes, I'd suggest partitioning by date, with each partition being a month. Set up partitions for as far back as your data goes; don't worry about whether that's more than 7 years, since as you go forward it'll end up being far more than 7 years anyway.
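
    As a rough sketch of what that could look like, a monthly RANGE RIGHT partition function and scheme might be created like this (the function, scheme, and filegroup names here are just placeholders, not anything from your post):

    -- Monthly partition function: RANGE RIGHT means each boundary date belongs
    -- to the partition on its right, so every partition holds one calendar month.
    CREATE PARTITION FUNCTION pf_Monthly (datetime2(0))
    AS RANGE RIGHT FOR VALUES
        ('2008-05-01', '2008-06-01', '2008-07-01' /* ...one boundary per month... */);
    GO

    -- Map each partition to its own filegroup. There is always one more
    -- partition than there are boundary values, hence the extra leading filegroup.
    CREATE PARTITION SCHEME ps_Monthly
    AS PARTITION pf_Monthly
    TO (FG_PreHistory, FG_2008_05, FG_2008_06, FG_2008_07 /* ...one per partition... */);
    GO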

    Adding a new partition is easy; don't worry about that.
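
    To give you an idea (again, hypothetical names carried over from the sketch above), adding the next month is just a filegroup, a file, and a split:

    -- Add storage for the new month.
    ALTER DATABASE MyBigDB ADD FILEGROUP FG_2011_06;
    ALTER DATABASE MyBigDB ADD FILE
        (NAME = F_2011_06, FILENAME = 'E:\Data\F_2011_06.ndf', SIZE = 100MB)
        TO FILEGROUP FG_2011_06;

    -- Tell the scheme which filegroup the next new partition should use,
    -- then split the function to create the new monthly boundary.
    ALTER PARTITION SCHEME ps_Monthly NEXT USED FG_2011_06;
    ALTER PARTITION FUNCTION pf_Monthly() SPLIT RANGE ('2011-06-01');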

    Are you planning on using archive tables? That's the best way to move the older data. Once the data is in the archive tables you can move it to different filegroups without worrying about your active tables.
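
    If you do go that way, the move itself can be a partition switch, which is metadata-only and nearly instant. A minimal sketch, assuming an archive table (here called TableB_Archive) with identical columns, indexes, and constraints, partitioned on the same scheme so both partitions sit on the same filegroup:

    -- Find which partition holds May 2008, then switch it out of the active table.
    DECLARE @p int = $PARTITION.pf_Monthly('2008-05-15');

    ALTER TABLE dbo.TableB
        SWITCH PARTITION @p TO dbo.TableB_Archive PARTITION @p;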

    Are you planning on creating a single partition scheme and basing all the tables' partitions on that? That's what I'd suggest, as they all seem like they'd be partitioned on the same logic.

    Remember that if you're going to partition on a date column, that column has to be part of your primary key (and of any other unique index) for the indexes to stay aligned with the partitions. Make sure you propagate that to any tables where it is used as a foreign key. You'll also have to update all your queries to join on both columns.
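
    To make that concrete, a hypothetical version of one of your tables (column names invented here) would look something like this, with the date column pulled into the clustered primary key so the index can be created on the partition scheme:

    CREATE TABLE dbo.TableB
    (
        RecordId     uniqueidentifier NOT NULL,   -- the GUID used for lookups
        ReceivedDate datetime2(0)     NOT NULL,   -- the partitioning column
        Payload      varchar(max)     NULL,
        CONSTRAINT PK_TableB PRIMARY KEY CLUSTERED (RecordId, ReceivedDate)
    )
    ON ps_Monthly (ReceivedDate);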

    If you want to actively use 36 months' worth of data, I'd normally suggest keeping 36 months in your primary tables with the rest of the 7 years in archive tables. However, since you say your tables grow at 550 GB/month, that means you'd have almost 20 TB in your active tables. Figure out how far back your most-used queries go and set up your main tables for that date period. Have a second set of actively accessed tables that hold data older than that, going back to 36 months. Then use a third set of tables for the data between 36 months and 7 years.
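
    One option for keeping that three-tier split invisible to the application (just an idea, not something from your setup) is a UNION ALL view over the tiers, reusing the hypothetical column names from above:

    CREATE VIEW dbo.TableB_All
    AS
    SELECT RecordId, ReceivedDate, Payload FROM dbo.TableB            -- hot period
    UNION ALL
    SELECT RecordId, ReceivedDate, Payload FROM dbo.TableB_Recent     -- back to 36 months
    UNION ALL
    SELECT RecordId, ReceivedDate, Payload FROM dbo.TableB_Archive;   -- 36 months to 7 years
    GO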

    For the filegroups: once a time period's data is no longer being updated, it can go onto other filegroups, and then you don't have to back it up as often. I'd suggest still backing it up monthly, as that's how often you'll be shifting data around. Base how many files you want on how often you'll be removing data and keeping it only as a backup file.
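
    Filegroup-level backups are one way to keep that manageable at this size. A minimal sketch, assuming the placeholder filegroup names from earlier:

    -- Back up the current, still-changing filegroup frequently...
    BACKUP DATABASE MyBigDB
        FILEGROUP = 'PRIMARY', FILEGROUP = 'FG_2011_05'
        TO DISK = 'Z:\Backups\MyBigDB_Current.bak';

    -- ...and back up each older, static filegroup once after the monthly data shuffle.
    BACKUP DATABASE MyBigDB
        FILEGROUP = 'FG_2008_05'
        TO DISK = 'Z:\Backups\MyBigDB_FG_2008_05.bak';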

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

