Archiving
Posted Saturday, May 4, 2013 7:17 AM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Saturday, September 27, 2014 3:29 AM
Points: 108, Visits: 1,098
Hi


I want to archive the data of a 400 GB table. What is the best way to archive it?

My table has a clustered index on two fields (int, datetime). We only select from and insert into the last two years of data; the old data is selected rarely (roughly 5 times a month) and never inserted into.

I want to create a second table, move the old data into it, and create indexes on both tables. Is that a pointless approach?

How can I archive the old data?
Post #1449420
Posted Saturday, May 4, 2013 5:33 PM
SSC Eights!


Group: General Forum Members
Last Login: Today @ 11:19 PM
Points: 812, Visits: 5,165
Big topic. You might want to start by looking up partitioning. Do you need to keep some of the records in the table "active"/editable while others are not? What kind of hardware are you working on? A single drive? Multiple drives?
Post #1449478
Posted Saturday, May 4, 2013 5:38 PM


Hall of Fame


Group: General Forum Members
Last Login: Tuesday, January 28, 2014 8:15 AM
Points: 3,065, Visits: 4,639
mah_j (5/4/2013)
I want to archive the data of a 400 GB table. What is the best way to archive it?


What's the reason behind "archiving" if, as I understand it, all data would remain online all the time?

Either way, I agree with the previous poster - this might be a case for table partitioning: range-partition by date, perhaps one partition per year.
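A minimal sketch of that yearly range-partitioning idea. All object, table, and column names here are hypothetical, not from the thread - adjust to the real table:

```sql
-- One boundary per year; RANGE RIGHT puts each Jan 1 into the newer partition.
CREATE PARTITION FUNCTION pf_YearRange (datetime)
AS RANGE RIGHT FOR VALUES ('20100101', '20110101', '20120101', '20130101');

CREATE PARTITION SCHEME ps_YearRange
AS PARTITION pf_YearRange ALL TO ([PRIMARY]);  -- or map to separate filegroups

-- Rebuilding the clustered index on the scheme physically partitions the rows.
CREATE UNIQUE CLUSTERED INDEX cix_Sales_History
ON dbo.Sales_History (Date_Of_Sale, Sale_ID)
WITH (DROP_EXISTING = ON)   -- assumes an existing clustered index of this name
ON ps_YearRange (Date_Of_Sale);
```

Keep in mind that table partitioning requires Enterprise Edition on the SQL Server versions current as of this thread.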


_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are given. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
Post #1449479
Posted Sunday, May 5, 2013 11:00 AM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 11:18 PM
Points: 35,267, Visits: 31,759
PaulB-TheOneAndOnly (5/4/2013)
What's the reason behind "archiving" if, as I understand it, all data would remain online all the time?


4 words. "Backups", "Restores", and "index maintenance".

I'm going through this right now. We have a telephone system database where we're required to keep even several-years-old data and make it available online all the time. It's not a huge database (only 200 GB), but the SLA to get it back online is much less than what a restore currently takes (yep, I test these things). The SLA also states (thanks to me pounding on people) that the system can be brought up with only 3 months' worth of history within the current recovery SLA, and that the rest of the data can be added in an almost leisurely fashion (I'll likely do it by month).

Table Partitioning (Enterprise Edition) would, of course, help a whole lot with index maintenance, but it won't do me any good for backups and restores because it requires that all of the partitions be in the same database.

Sooooo... to make a much longer story shorter, I'm going to use similar archiving techniques to move data out of the "active" database and into an "archive" database a month at a time (1 table per month). In this case, "archive" simply means "not active" and "read only". Since it's in a different database (there might be more than one; 1 database per year with 1 table per month seems logical for backup and restore purposes), I'll use the ol' partitioned view technique to make it all seem like a single table, so I don't actually have to change the apps that are pointing at the current table.
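A minimal sketch of that partitioned-view setup. The database, table, and column names below are hypothetical, not from the actual system:

```sql
-- Each monthly archive table should carry a CHECK constraint on its date
-- range so the optimizer can skip tables that can't match a query's
-- WHERE clause.
CREATE VIEW dbo.CallHistory
AS
SELECT Call_ID, Call_Date, Duration_Sec FROM ActiveDB.dbo.CallHistory_Current
UNION ALL
SELECT Call_ID, Call_Date, Duration_Sec FROM Archive2012.dbo.CallHistory_201212
UNION ALL
SELECT Call_ID, Call_Date, Duration_Sec FROM Archive2012.dbo.CallHistory_201211;
```

Existing apps keep querying dbo.CallHistory as if it were the original single table, while each underlying database can be backed up, restored, or set read-only on its own schedule.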


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1449527
Posted Monday, May 6, 2013 12:32 PM


SSCommitted


Group: General Forum Members
Last Login: Today @ 2:56 PM
Points: 1,677, Visits: 4,785
mah_j (5/4/2013)
Hi


I want to archive the data of a 400 GB table. What is the best way to archive it?

My table has a clustered index on two fields (int, datetime). We only select from and insert into the last two years of data; the old data is selected rarely (roughly 5 times a month) and never inserted into.

I want to create a second table, move the old data into it, and create indexes on both tables. Is that a pointless approach?

How can I archive the old data?

Sometimes physically partitioning the rows into separate tables or partitions makes sense. However, indexing is also a form of partitioning. This problem could be solved by effective clustering of the rows in the table and indexing on the transaction date.

Selecting a range of rows from a table filtered by the clustered key is generally very fast, even with 100 million+ rows, unless the table is heavily fragmented. Check the level of fragmentation on the table. From what you've described, the table should be clustered on a sequential ID or insert date/time for optimal querying and minimal fragmentation.

Also, experiment with a filtered index on whatever column is used for the transaction date. You can even periodically drop / recreate this index using a different cutoff date.
For example:
create index ix_current on Sales_History ( Date_Of_Sale, Product_ID ) where Date_Of_Sale >= '20120101';
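Rolling that cutoff forward later is just a drop and recreate. The names below follow the hypothetical example above; the unambiguous yyyymmdd literal format avoids language- and dateformat-setting surprises:

```sql
DROP INDEX ix_current ON Sales_History;

CREATE INDEX ix_current
    ON Sales_History (Date_Of_Sale, Product_ID)
    WHERE Date_Of_Sale >= '20130101';  -- new cutoff date
```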
Post #1449838
Posted Monday, May 6, 2013 2:07 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 11:18 PM
Points: 35,267, Visits: 31,759
Eric M Russell (5/6/2013)
mah_j (5/4/2013)
Hi


I want to archive the data of a 400 GB table. What is the best way to archive it?

My table has a clustered index on two fields (int, datetime). We only select from and insert into the last two years of data; the old data is selected rarely (roughly 5 times a month) and never inserted into.

I want to create a second table, move the old data into it, and create indexes on both tables. Is that a pointless approach?

How can I archive the old data?

Sometimes physically partitioning the rows into separate tables or partitions makes sense. However, indexing is also a form of partitioning. This problem could be solved by effective clustering of the rows in the table and indexing on the transaction date.

Selecting a range of rows from a table filtered by the clustered key is generally very fast, even with 100 million+ rows, unless the table is heavily fragmented. Check the level of fragmentation on the table. From what you've described, the table should be clustered on a sequential ID or insert date/time for optimal querying and minimal fragmentation.

Also, experiment with a filtered index on whatever column is used for the transaction date. You can even periodically drop / recreate this index using a different cutoff date.
For example:
create index ix_current on Sales_History ( Date_Of_Sale, Product_ID ) where Date_Of_Sale >= '20120101';


To wit, the current clustered index on (int, datetime) should be reversed to (datetime, int), and it should be UNIQUE as well, to get rid of the 8-byte uniquifier that will be added without it.
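In terms of the hypothetical Sales_History example above, that reversal might look like this, with Sale_ID standing in for the existing int key column:

```sql
-- Hypothetical names; DROP_EXISTING assumes the old clustered index
-- had this same name, so the rebuild happens in a single operation.
CREATE UNIQUE CLUSTERED INDEX cix_Sales_History
    ON dbo.Sales_History (Date_Of_Sale, Sale_ID)
    WITH (DROP_EXISTING = ON);
```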


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1449871
Posted Monday, May 6, 2013 2:50 PM


SSCrazy


Group: General Forum Members
Last Login: Today @ 5:57 AM
Points: 2,695, Visits: 3,395
Jeff Moden (5/6/2013)
Eric M Russell (5/6/2013)
mah_j (5/4/2013)
Hi


I want to archive the data of a 400 GB table. What is the best way to archive it?

My table has a clustered index on two fields (int, datetime). We only select from and insert into the last two years of data; the old data is selected rarely (roughly 5 times a month) and never inserted into.

I want to create a second table, move the old data into it, and create indexes on both tables. Is that a pointless approach?

How can I archive the old data?

Sometimes physically partitioning the rows into separate tables or partitions makes sense. However, indexing is also a form of partitioning. This problem could be solved by effective clustering of the rows in the table and indexing on the transaction date.

Selecting a range of rows from a table filtered by the clustered key is generally very fast, even with 100 million+ rows, unless the table is heavily fragmented. Check the level of fragmentation on the table. From what you've described, the table should be clustered on a sequential ID or insert date/time for optimal querying and minimal fragmentation.

Also, experiment with a filtered index on whatever column is used for the transaction date. You can even periodically drop / recreate this index using a different cutoff date.
For example:
create index ix_current on Sales_History ( Date_Of_Sale, Product_ID ) where Date_Of_Sale >= '20120101';


To wit, the current clustered index on (int, datetime) should be reversed to (datetime, int), and it should be UNIQUE as well, to get rid of the 8-byte uniquifier that will be added without it.
And yet another addition... If you decide to move all of the data to its own database and make it read only, don't move all of the data at once. It will make your log very, very big, very, very fast. Do it in batches that make sense for the size of your log. I always start with batches of 10,000 for these things and do it in a loop (yes, one of the times a loop is helpful).

We actually have our archives set up in 100 GB databases on one server, where we simply create a new database at night, make it the "active" database, and then make the previous one read_only. I can explain more if you are interested in this, but I don't believe it is a good fit for most scenarios.
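A minimal sketch of that batched approach, using the same hypothetical names as earlier in the thread; the 10,000 batch size and the cutoff date are illustrative:

```sql
DECLARE @batch int = 10000;

WHILE 1 = 1
BEGIN
    -- Delete one small batch and route the deleted rows into the archive
    -- table in the same statement, keeping each transaction (and its log
    -- usage) small.
    DELETE TOP (@batch)
    FROM dbo.Sales_History
    OUTPUT deleted.* INTO ArchiveDB.dbo.Sales_History_Old
    WHERE Date_Of_Sale < '20110101';

    IF @@ROWCOUNT < @batch BREAK;  -- last (partial) batch has been moved
END;
```

In FULL recovery, frequent log backups between batches keep the log from growing; in SIMPLE recovery, the log space is reused automatically after each batch commits.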


Thanks,

Jared
SQL Know-It-All

How to post data/code on a forum to get the best help - Jeff Moden
Post #1449889