Copying varchar(max) fields is slow

  • Hi

    I'm archiving some data in a two-step process:

    1. Copy the old data from each table in LiveDB to the same table in ArchiveDB.

    2. Delete the data from each table in LiveDB that is now in ArchiveDB.

    Both DBs are in SIMPLE recovery mode.

    Each table has a clustered PK on a single int column, in both DBs.

    The tables with varchar(max) columns are taking a very long time to copy over.

    Is there anything I can change in the ArchiveDB to make it run faster?

    It is the insert that is taking the time. I've tried dropping the clustered PKs on the ArchiveDB tables and then rebuilding them afterwards, but it has not made any difference. After all, I am adding data to the ArchiveDB in clustered index order, so I wouldn't have expected it to.

    Any tips? I can change the Archive DB but cannot touch the schema/settings of the Live DB.

    Thanks

  • terry999 (3/13/2015)

    How much data are we talking about here? Do the varchar(max) columns hold a lot of data too? Since this is just data being archived, why such a concern about speed? You could do this in batches, which would ease the pressure on the transaction log and would likely improve performance.
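
    A rough sketch of what the batched copy could look like, assuming an int clustered PK and identical table definitions in both DBs (BigTable, ID, and @OldestKeep are placeholder names):

    DECLARE @rows int = 1;
    DECLARE @OldestKeep int = 1000000;  -- hypothetical boundary: IDs below this are "old"

    WHILE @rows > 0
    BEGIN
        INSERT INTO ArchiveDB.dbo.BigTable
        SELECT TOP (100000) src.*          -- assumes identical column lists in both DBs
        FROM LiveDB.dbo.BigTable AS src
        WHERE src.ID < @OldestKeep
          AND NOT EXISTS (SELECT 1
                          FROM ArchiveDB.dbo.BigTable AS arc
                          WHERE arc.ID = src.ID)
        ORDER BY src.ID;                   -- take the lowest remaining IDs first

        SET @rows = @@ROWCOUNT;            -- stop when no old rows are left to copy
    END;

    (If the PK is an IDENTITY column, the insert would also need an explicit column list and SET IDENTITY_INSERT ArchiveDB.dbo.BigTable ON/OFF around it.)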

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Hi Sean, thanks for replying.

    It's an Express edition that is getting close to its per-database size limit. The varchar(max) fields mostly contain data, and a character length > 15,000 is common.

    I have split the large tables into batches.

    I was hoping there might be some setting for the ArchiveDB that would improve insert performance even if it degraded select performance. The archive is just for reference.

  • Sorry, I forgot: total data movement is about 5GB, and 90% of that 5GB is in the 2 tables with the varchar(max) columns.

    Moving this data takes >95% of the time; I haven't deleted it yet, and again I will delete in batches.

    My machine is slow, but it still takes an hour to move the data.

  • terry999 (3/13/2015)

    Move it in batches. That is 5GB+ in your transaction log if you are doing it all at once.


  • What Sean said, but also: if you need to move a large amount, pre-grow your log file so the growth doesn't have to happen during the copy.
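
    For example, for the ArchiveDB (the logical file name here is a guess; the first query shows the real one):

    -- Find the logical name and current size of the log file
    SELECT name, size, growth
    FROM ArchiveDB.sys.database_files
    WHERE type_desc = 'LOG';

    -- Pre-grow it once, rather than letting auto-growth happen mid-copy
    ALTER DATABASE ArchiveDB
    MODIFY FILE (NAME = ArchiveDB_log, SIZE = 2GB);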

  • Thanks for replying.

    I sized the Archive DB to 5GB to start with and gave it a large tran log.

    I notice that, even though I'm doing it in batches:

    1. The tran log of the DB having data inserted into it is OK, so it doesn't need to grow.

    2. The tran log of the DB being deleted from grows by approximately 5GB, even though I split the deletes on the big tables (>90% of the data) into batches. I can see the deletes running as batches (PRINT statements).

    Should I be issuing manual CHECKPOINTs?
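
    For example, would something like this after each delete batch help? (Placeholder names; in SIMPLE recovery the log is only truncated at a checkpoint.)

    DELETE TOP (100000) lv
    FROM LiveDB.dbo.BigTable AS lv
    WHERE EXISTS (SELECT 1
                  FROM ArchiveDB.dbo.BigTable AS arc
                  WHERE arc.ID = lv.ID);

    CHECKPOINT;  -- lets the inactive part of the log be reused before the next batch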

    Thanks Terry

  • What are the initial size and growth settings for the DB and the related log file?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff

    I set the Archive DB to 5GB to start off with and it doesn't grow; its tran log is 1GB and it also does not need to grow.

    The Live DB is the problematic one (log set to 10% auto-grow, starting at 10MB).

    I plan to increase this before archiving. I thought 1GB would be fine for each table/batch, since the Archive DB tran log copes with that. However, the Live DB log (the one where the deletes are happening) grows to >5GB!

    I must have something wrong with my batch processing on the deletes; I will look at it tomorrow.

    Thanks for your posts. I'd concentrated on the ArchiveDB log etc., which I created, and forgot about the LiveDB.

  • Thanks, Terry. Sounds like you've done it right on the archive side.

    On the DELETEs, remember that they affect ALL indexes and ALL FKs pointing at the table. If there are any INDEXED VIEWS, those will be affected, as well. We have one super-wide (137 column) table at work (not my doing) that takes nearly 60 seconds to delete a single row from because of all that, not to mention that it has more than 1 blob datatype in it.
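
    A quick way to see everything a DELETE on a table has to touch is to check the catalog views; a rough sketch, run in the Live DB, with dbo.BigTable as a stand-in name:

    USE LiveDB;
    GO

    -- Indexes the DELETE has to maintain on the table
    SELECT i.name, i.type_desc
    FROM sys.indexes AS i
    WHERE i.object_id = OBJECT_ID('dbo.BigTable');

    -- Foreign keys referencing the table (each is checked for every deleted row)
    SELECT fk.name, OBJECT_NAME(fk.parent_object_id) AS referencing_table
    FROM sys.foreign_keys AS fk
    WHERE fk.referenced_object_id = OBJECT_ID('dbo.BigTable');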

    --Jeff Moden



  • Jeff,

    My fault, I was doing this:

    use ArchiveDB
    GO

    DELETE FROM LiveDB.dbo.tblCust
    WHERE EXISTS (
        SELECT TOP (100000) tArchive.CustID
        FROM dbo.a_tblCust AS tArchive
        WHERE LiveDB.dbo.tblCust.custID = tArchive.custID
    )

    The TOP inside the EXISTS subquery doesn't limit the outer DELETE, so that statement deleted all the matching rows in one go. This worked correctly:

    DELETE TOP (100000) LiveDB.dbo.tblCust
    FROM a_tblCust AS tArchive
    WHERE LiveDB.dbo.tblCust.custid = tArchive.custid

    I'm joining the archive to the live DB: a cross-database join on 2 clustered indexes, i.e. the PKs on CustID as an integer.

    I'm doing this N times, where N is the number of batches.

    If I were to select just the CustIDs into a #Temp table, would it be quicker joining to tempdb rather than to LiveDB? I wouldn't think so, since it is still a cross-database join. Or is there something special about tempdb? Just curious.
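
    For reference, one batch of that #Temp variant might look something like this (no claim that it is faster; the delete still crosses databases):

    -- Grab a batch of archived CustIDs that still exist in the live table
    SELECT TOP (100000) tArchive.CustID
    INTO #ids
    FROM dbo.a_tblCust AS tArchive
    WHERE EXISTS (SELECT 1
                  FROM LiveDB.dbo.tblCust AS lv
                  WHERE lv.custID = tArchive.CustID);

    -- Delete by joining the live table to the temp table
    DELETE lv
    FROM LiveDB.dbo.tblCust AS lv
    INNER JOIN #ids AS i
        ON i.CustID = lv.custID;

    DROP TABLE #ids;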

    Thanks Terry
