Archiving

mah_j

Hi,

I want to archive the data of a 400 GB table. What is the best way of archiving it?

The table has a clustered index on two fields (int, datetime). We only select from the last two years and insert into this table; the old data is selected rarely, perhaps five times a month, and never inserted into.

I want to create a second table, move the old data into it, and create indexes on both tables. Is that a pointless approach?

How can I archive the old data?
pietlinden

Big topic. You might want to start by looking up partitioning. Do you need to keep some of the records in the table "active"/editable while others are not? What kind of hardware are you working on? A single drive? Multiple drives?
PaulB-TheOneAndOnly

mah_j (5/4/2013)
I want to archive the data of a 400 GB table. What is the best way of archiving it?


What's the reason behind "archiving" if, as I understand it, all of the data would stay online all the time?

Either way, I agree with the previous poster: this might be a case for table partitioning, with a range partition by date, perhaps one partition per year.
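
As a rough illustration of yearly range partitioning, here is a minimal sketch; the table, column, and filegroup placement are hypothetical, not from this thread:

-- Partition function and scheme splitting rows by year (boundary values are examples).
CREATE PARTITION FUNCTION pf_SalesByYear (datetime)
    AS RANGE RIGHT FOR VALUES ('20110101', '20120101', '20130101');

CREATE PARTITION SCHEME ps_SalesByYear
    AS PARTITION pf_SalesByYear ALL TO ([PRIMARY]);

-- Clustering on the partitioning column keeps each year's rows physically together.
CREATE TABLE dbo.Sales_History
(
    Sale_ID      int      NOT NULL,
    Date_Of_Sale datetime NOT NULL,
    CONSTRAINT PK_Sales_History PRIMARY KEY CLUSTERED (Date_Of_Sale, Sale_ID)
) ON ps_SalesByYear (Date_Of_Sale);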

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are given. Since the advisor explicitly encourages testing any and all suggestions in a non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
Jeff Moden

PaulB-TheOneAndOnly (5/4/2013)
What's the reason behind "archiving" if, as I understand it, all of the data would stay online all the time?


Four words: "backups", "restores", and "index maintenance".

I'm going through this right now. We have a telephone system database where we're required to keep even several-years-old data and make it available online all the time. It's not a huge database (only 200 GB), but the SLA to get it back online is much less than what a restore currently takes (yep, I test these things). The SLA also states (thanks to me pounding on people) that the system can be brought up with only 3 months' worth of history within the current recovery SLA, and that the rest of the data can be added in an almost leisurely fashion (I'll likely do it by month).

Table partitioning (Enterprise Edition) would, of course, help a whole lot with index maintenance, but it won't do me any good for backups and restores because it requires that all of the partitions live in the same database.

Sooooo... to make a much longer story shorter, I'm going to use similar archiving techniques to move data out of the "active" database and into an "archive" database a month at a time (one table per month). In this case, "archive" simply means "not active" and "read only". Since it's in a different database (it might be more than one; one database per year with one table per month seems logical for backup and restore purposes), I'll use the ol' partitioned view technique to make it all look like a single table, so I don't actually have to change the apps that are pointing at the current table.
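
For anyone who hasn't seen a partitioned view, a minimal sketch of the idea follows; the database, table, and column names are made up for illustration, not taken from this thread:

-- One table per period; the view stitches active and archived data back together.
CREATE VIEW dbo.CallHistory_All
AS
SELECT Call_ID, Call_Date, Duration_Sec
FROM   ActiveDB.dbo.CallHistory              -- current, writable data
UNION ALL
SELECT Call_ID, Call_Date, Duration_Sec
FROM   ArchiveDB.dbo.CallHistory_201212      -- read-only archive, one table per month
UNION ALL
SELECT Call_ID, Call_Date, Duration_Sec
FROM   ArchiveDB.dbo.CallHistory_201211;

Point the apps at dbo.CallHistory_All instead of the base table. With CHECK constraints on Call_Date in each member table, the optimizer can skip members whose date range a query cannot touch.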

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Eric M Russell

mah_j (5/4/2013)
I want to archive the data of a 400 GB table. What is the best way of archiving it?

The table has a clustered index on two fields (int, datetime). We only select from the last two years and insert into this table; the old data is selected rarely, perhaps five times a month, and never inserted into.

I want to create a second table, move the old data into it, and create indexes on both tables. Is that a pointless approach?

How can I archive the old data?

Sometimes physically partitioning the rows into separate tables or partitions makes sense. However, indexing is also a form of partitioning. This problem could be solved by effective clustering of the rows in the table and by indexing on the transaction date.

Selecting a range of rows from a table filtered by the clustered key is generally very fast, even with 100 million+ rows, unless the table is heavily fragmented. Check the level of fragmentation on the table. From what you've described, the table should be clustered on a sequential ID or the insert date/time for optimal querying and minimal fragmentation.

Also, experiment with a filtered index on whatever column is used for the transaction date. You can even periodically drop and recreate this index using a different cutoff date.
For example:
CREATE INDEX ix_current ON Sales_History (Date_Of_Sale, Product_ID) WHERE Date_Of_Sale >= '20120101';
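
To sketch the periodic cutoff change mentioned above, the same index can be rebuilt in place with a new filter (same hypothetical names as before):

-- Roll the filter forward a year; DROP_EXISTING rebuilds ix_current with the new predicate.
CREATE INDEX ix_current ON Sales_History (Date_Of_Sale, Product_ID)
    WHERE Date_Of_Sale >= '20130101'
    WITH (DROP_EXISTING = ON);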


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
Jeff Moden

Eric M Russell (5/6/2013)
... From what you've described, the table should be clustered on a sequential ID or the insert date/time for optimal querying and minimal fragmentation. ... Also, experiment with a filtered index on whatever column is used for the transaction date.

To wit, the current clustered index on (int, datetime) should be reversed to (datetime, int), and it should be made UNIQUE as well, to get rid of the 8-byte uniquifier overhead that will be added without it.
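
A sketch of that rebuild, reusing the hypothetical Sales_History names from the example above (the real table and column names aren't given in this thread):

-- Rebuild the clustered index with the datetime column leading and declared UNIQUE.
-- This assumes the (Date_Of_Sale, Sale_ID) pair really is unique; if the existing
-- clustered index backs a PRIMARY KEY constraint, drop and recreate the constraint
-- instead of using DROP_EXISTING.
CREATE UNIQUE CLUSTERED INDEX CIX_Sales_History
    ON Sales_History (Date_Of_Sale, Sale_ID)
    WITH (DROP_EXISTING = ON);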

Jared Karney

Jeff Moden (5/6/2013)
To wit, the current clustered index on (int, datetime) should be reversed to (datetime, int), and it should be made UNIQUE as well, to get rid of the 8-byte uniquifier overhead that will be added without it.

And yet another addition... If you decide to move all of the data to its own database and make it read only, don't move it all at once. It will make your log very, very big very, very fast :-) Do it in batches that make sense for the size of your log. I always start with batches of 10,000 rows for these things and do it in a loop (yes, one of the times a loop is helpful). We actually have our archives set up in 100 GB databases on one server, where we simply create a new database at night, make it the "active" database, and then set the previous one to READ_ONLY. I can explain more if you are interested in this, but I don't believe it is a good fit for most scenarios.
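
A minimal sketch of that batched move, with hypothetical table and column names and a made-up cutoff date; size the batch to whatever your log can absorb:

-- Move rows older than the cutoff out of the active table, 10,000 at a time,
-- committing each batch so the transaction log can be reused between batches.
-- The archive table must already exist with the same columns; it could also live
-- in a separate archive database on the same instance.
DECLARE @Cutoff datetime = '20110101';
DECLARE @Rows   int      = 1;

WHILE @Rows > 0
BEGIN
    BEGIN TRANSACTION;

    DELETE TOP (10000) src
    OUTPUT deleted.*
    INTO   dbo.Sales_History_Archive
    FROM   dbo.Sales_History AS src
    WHERE  src.Date_Of_Sale < @Cutoff;

    SET @Rows = @@ROWCOUNT;   -- capture the count before COMMIT

    COMMIT TRANSACTION;

    -- In FULL recovery, take log backups between batches to keep the log from growing.
END;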

Thanks,
Jared
PFE - Microsoft
SQL Know-It-All
How to post data/code on a forum to get the best help - Jeff Moden