April 9, 2010 at 1:08 pm
We have a "zero" down time policy, yea i know, it's crazy, which I am attempting to conform to. Essentially, they never believed storing screen scrapes for Audit activity would become a problem. So now there is half a billion rows in an activity table and over a terabyte in the table storing images.
My task, whether I want it or not, is to keep 6 months of data in the live system and archive out the excess. I've written a variety of stored procedures to tackle the task from various angles, and now I'd like your opinion.
Plan: restore a week-old backup of production to the archive server. Remove all records 6 months old or newer from the archive. (No production impact at this point.)
Use sql="Delete From <Prod> WHERE EXISTS
(select top 1 <archive>.id from <archive> WHERE <archive>.id = <Prod>.id)"
This will cycle through all records in the prod table (50,000 in dev), check them against the archive, and delete the 14,000 which exist in the archive.
Unfortunately, it takes 42 seconds for this operation in our sandbox environment, and our production environment has 180 million records to check. This procedure could cause resource issues on the prod server as well... not a great idea, in my opinion.
So another option is to use a cursor to collect the list of IDs in the archive and attempt to delete the record with each ID from production. I could run it in batches, and since it deletes only one record at a time, it should prevent locking issues.
This also will take a long time.
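Roughly, the cursor version would look like this (all object and database names below are placeholders, not our real ones):

    DECLARE @id int;

    DECLARE archive_ids CURSOR LOCAL FAST_FORWARD FOR
        SELECT id FROM ArchiveDb.dbo.AuditActivity;

    OPEN archive_ids;
    FETCH NEXT FROM archive_ids INTO @id;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        DELETE FROM dbo.AuditActivity WHERE id = @id; -- one row, one tiny transaction
        FETCH NEXT FROM archive_ids INTO @id;
    END;

    CLOSE archive_ids;
    DEALLOCATE archive_ids;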
Question: if you delete a record with a column of type Image, will the log file contain a copy of the image? I'm looking to delete 800GB of pictures from prod and don't necessarily want to create 800GB of logs.
I have no issue with writing the T-SQL to implement pretty much any plan; I'm just not sure which plan I should chase.
Another plan is to insert the 300GB of "current" records into a mirror of production and then swap the database names. I've been "discouraged" by management from this alternative.
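The swap itself would be something like this (made-up database names; each rename needs exclusive access to the database, so it isn't quite "zero" downtime either):

    ALTER DATABASE ProdDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    ALTER DATABASE ProdDb MODIFY NAME = ProdDb_Old;
    ALTER DATABASE ProdDb_Trimmed MODIFY NAME = ProdDb;
    ALTER DATABASE ProdDb_Old SET MULTI_USER;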
Help would be appreciated.
April 9, 2010 at 1:30 pm
Another option would be to copy the images you need into a parallel table, then swap the table names and drop the old table. Use a backup of the database, from before that, as your archive. If there are any inserts into the old table between when you create the new table and when you're ready to swap the table names, you can always back-fill those into the new table once the names are swapped and before you drop the old one.
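The swap itself is just a pair of renames, along these lines (made-up table names):

    BEGIN TRANSACTION;
        EXEC sp_rename 'dbo.AuditActivity', 'AuditActivity_Old';
        EXEC sp_rename 'dbo.AuditActivity_New', 'AuditActivity';
    COMMIT TRANSACTION;

That holds only brief schema locks; no data moves during the swap.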
You may not need to delete newer files from the archive. After all, they'll just end up being moved in there in 6 months anyway. Might save yourself one step that way.
If that idea won't work, then the cursor idea is probably what you need to do.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
April 9, 2010 at 2:24 pm
Thanks GSquared.
April 9, 2010 at 3:36 pm
You might also implement something using SET ROWCOUNT 50 or SET ROWCOUNT 100.
It should be similar to the cursor in that it wouldn't cause long-term locking, but doing it in small batches of 50 rather than batches of 1 might be significantly faster.
Perhaps you were already talking about doing that; I couldn't tell for sure from your wording.
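For example, something like this (a rough sketch; the table names are placeholders):

    SET ROWCOUNT 50;

    WHILE 1 = 1
    BEGIN
        DELETE FROM dbo.AuditActivity
        WHERE EXISTS (SELECT 1 FROM ArchiveDb.dbo.AuditActivity a
                      WHERE a.id = dbo.AuditActivity.id);

        IF @@ROWCOUNT = 0 BREAK;
    END;

    SET ROWCOUNT 0; -- always reset, or it caps later statements too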
April 9, 2010 at 10:09 pm
GSquared (4/9/2010)
Another option would be to copy the images you need into a parallel table, then swap the table names and drop the old table. Use a backup of the database, from before that, as your archive. If there are any inserts into the old table between when you create the new table and when you're ready to swap the table names, you can always back-fill those into the new table once the names are swapped and before you drop the old one.
That's the way I'd do it.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 11, 2010 at 1:31 am
Garadin (4/9/2010)
You might also implement something using SET ROWCOUNT 50 or SET ROWCOUNT 100.
If forced to do it this way, I would use DELETE TOP (n), rather than SET ROWCOUNT, since its use in data modification statements is deprecated.
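A minimal sketch of that pattern (table names are placeholders):

    WHILE 1 = 1
    BEGIN
        DELETE TOP (100)
        FROM dbo.AuditActivity
        WHERE EXISTS (SELECT 1 FROM ArchiveDb.dbo.AuditActivity a
                      WHERE a.id = dbo.AuditActivity.id);

        IF @@ROWCOUNT = 0 BREAK;
    END;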
See Lynn Pettis' article for implementation details:
April 11, 2010 at 1:41 am
GSquared (4/9/2010)
Another option would be to copy the images you need into a parallel table, then swap the table names and drop the old table...
To minimize logging (which would otherwise be substantially larger than the size of the data!), be sure to use bulk copy to export the 'rows to keep' to a file, and then bulk load the data from that file into the new table.
Be sure to meet the Prerequisites for Minimal Logging in Bulk Import.
Export the data in clustered index sorted order, and load into an empty table with just the cluster defined, specifying the ordered hint. Create any non-clustered indexes after loading the data, unless you plan on using trace flag 610.
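Something along these lines, with placeholder server, table, and file names, and assuming the clustered key is id with a date column defining the 'rows to keep'. From the command line:

    bcp "SELECT * FROM ProdDb.dbo.AuditActivity WHERE ActivityDate >= DATEADD(MONTH, -6, GETDATE()) ORDER BY id" queryout D:\keep.dat -n -S MyServer -T

Then load:

    BULK INSERT dbo.AuditActivity_New
    FROM 'D:\keep.dat'
    WITH (DATAFILETYPE = 'native', TABLOCK, ORDER (id ASC));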
If you are using Enterprise Edition, ensure that the new table is partitioned, to make future archiving more efficient. If you do not have Enterprise Edition, consider using a local partitioned view instead - that will allow you to quickly drop one 'partition' (table) of data using TRUNCATE.
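A bare-bones illustration of the partitioned view idea (every name here is made up, and the real tables would carry the image column as well):

    CREATE TABLE dbo.AuditActivity_2010_03
    (
        id           int      NOT NULL,
        ActivityDate datetime NOT NULL
            CONSTRAINT CK_AuditActivity_2010_03
            CHECK (ActivityDate >= '20100301' AND ActivityDate < '20100401'),
        CONSTRAINT PK_AuditActivity_2010_03 PRIMARY KEY (id, ActivityDate)
    );

    CREATE TABLE dbo.AuditActivity_2010_04
    (
        id           int      NOT NULL,
        ActivityDate datetime NOT NULL
            CONSTRAINT CK_AuditActivity_2010_04
            CHECK (ActivityDate >= '20100401' AND ActivityDate < '20100501'),
        CONSTRAINT PK_AuditActivity_2010_04 PRIMARY KEY (id, ActivityDate)
    );
    GO

    CREATE VIEW dbo.AuditActivityAll
    AS
    SELECT id, ActivityDate FROM dbo.AuditActivity_2010_03
    UNION ALL
    SELECT id, ActivityDate FROM dbo.AuditActivity_2010_04;
    GO

    -- Retiring the oldest month is then effectively instant:
    TRUNCATE TABLE dbo.AuditActivity_2010_03;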
Essentially, the important thing here is to avoid fully-logged operations - either when copying or deleting.