Best way to handle moving data in a varchar(max) field?

  • Any ideas how to efficiently move a lot of rows from a table in the OLTP database to an archive on another server when that table contains a varchar(max) column? My SSIS package just spins and spins on these, even after setting Rows per batch and Maximum insert commit size fairly small to keep the batches small.

    The archive database is set to the simple recovery model; would a bulk load be better?

    Is BCP worth looking at?

  • ManicStar - Tuesday, February 13, 2018 9:07 AM

    What is the actual size of the data (datalength)?
    😎

  • Eirikur Eiriksson - Tuesday, February 13, 2018 9:25 AM

    Close to a meg for the largest one.  They are comments and call notes, so some of them are very long.

  • ManicStar - Tuesday, February 13, 2018 9:31 AM

    Not certain why SSIS isn't working properly; those are not large chunks of data. What volume (number of entries) do you need to transfer, and over what time?
    One option is to offload the content into a local staging table before the transfer, if the source table is busy or suffering from lock escalation and the like.
    😎
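
    Roughly like this, as a sketch (the table and column names are hypothetical):

        -- Copy the rows to archive into a local staging table in small batches,
        -- so the busy OLTP table only sees short, cheap reads.
        DECLARE @batch int = 5000;

        WHILE 1 = 1
        BEGIN
            INSERT INTO dbo.CallNotes_Staging (Id, CreatedOn, Notes)
            SELECT TOP (@batch) s.Id, s.CreatedOn, s.Notes
            FROM dbo.CallNotes AS s
            WHERE s.CreatedOn < DATEADD(YEAR, -1, GETDATE())
              AND NOT EXISTS (SELECT 1 FROM dbo.CallNotes_Staging AS t
                              WHERE t.Id = s.Id);

            IF @@ROWCOUNT < @batch BREAK;
        END;

        -- Then point the SSIS data flow at dbo.CallNotes_Staging instead of
        -- the live table.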

  • ManicStar - Tuesday, February 13, 2018 9:31 AM

    You could try inserting the rows with the varchar(max) column as NULL first and updating them with the actual data later; that way the initial transfer finishes quickly.
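
    A rough sketch of that two-pass idea over a linked server (the ARCHIVESRV linked server and the table names are hypothetical):

        -- Pass 1: move the narrow columns, leaving the LOB column NULL.
        INSERT INTO ARCHIVESRV.Archive.dbo.CallNotes (Id, CreatedOn, Notes)
        SELECT Id, CreatedOn, NULL
        FROM dbo.CallNotes
        WHERE CreatedOn < DATEADD(YEAR, -1, GETDATE());

        -- Pass 2: backfill the varchar(max) column in small batches.
        WHILE 1 = 1
        BEGIN
            UPDATE TOP (1000) a
            SET a.Notes = s.Notes
            FROM ARCHIVESRV.Archive.dbo.CallNotes AS a
            INNER JOIN dbo.CallNotes AS s ON s.Id = a.Id
            WHERE a.Notes IS NULL AND s.Notes IS NOT NULL;

            IF @@ROWCOUNT = 0 BREAK;
        END;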

  • Don't use SSIS for that, as it sucks with LOB types.

    PowerShell or C# (which can live inside an SSIS package script task if you prefer) with a SqlDataReader and a SqlBulkCopy will be the best option.

    It can still be slow to copy, but not as slow as SSIS, which saves each blob to local disk as part of the process.

    Small example: https://www.codeproject.com/Articles/18418/Transferring-Data-Using-SqlBulkCopy
    Do remove the NotifyAfter and SqlRowsCopied bits, as they only slow things down.
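
    A minimal C# sketch of that reader-plus-SqlBulkCopy pattern (the connection strings, table, and column names are made up for illustration):

        using System.Data;
        using System.Data.SqlClient;

        class ArchiveCopy
        {
            static void Main()
            {
                // Hypothetical connection strings; substitute your own servers.
                const string src = "Server=OLTPSRV;Database=Oltp;Integrated Security=true;";
                const string dst = "Server=ARCHIVESRV;Database=Archive;Integrated Security=true;";

                using (var source = new SqlConnection(src))
                using (var dest = new SqlConnection(dst))
                {
                    source.Open();
                    dest.Open();

                    // SequentialAccess streams each varchar(max) value instead
                    // of buffering whole rows in memory.
                    var cmd = new SqlCommand(
                        "SELECT Id, CreatedOn, Notes FROM dbo.CallNotes", source);
                    using (var reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
                    using (var bulk = new SqlBulkCopy(dest))
                    {
                        bulk.DestinationTableName = "dbo.CallNotesArchive";
                        bulk.BatchSize = 5000;       // keep each commit small
                        bulk.BulkCopyTimeout = 0;    // no timeout on a long copy
                        bulk.EnableStreaming = true; // stream the LOBs end to end
                        // Note: no NotifyAfter / SqlRowsCopied handler, per the advice above.
                        bulk.WriteToServer(reader);
                    }
                }
            }
        }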

  • Heh... so what's wrong with a simple INSERT/SELECT???
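
    For anyone who wants the shape of it, over a linked server it would be something like this (ARCHIVESRV and the table names are made up):

        -- One fully set-based statement; ARCHIVESRV is a linked server.
        INSERT INTO ARCHIVESRV.Archive.dbo.CallNotes (Id, CreatedOn, Notes)
        SELECT Id, CreatedOn, Notes
        FROM dbo.CallNotes
        WHERE CreatedOn < DATEADD(YEAR, -1, GETDATE());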

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Thursday, March 1, 2018 7:38 PM

    Heh... so what's wrong with a simple INSERT/SELECT???

    Didn't mention this one as this is across servers, and linked servers can behave poorly with blobs.

  • frederico_fonseca - Friday, March 2, 2018 2:29 AM

    Didn't mention this one as this is across servers, and linked servers can behave poorly with blobs.

    Hmmmm... looking back, I don't know if it does or not because I've not had the need, yet, and so haven't actually tried.  I'll have to give that a try at work.  Thanks for the tip.

    Shifting gears a bit, there are even more ways to skin the same cat.  For example (and as the OP asked in the original post),  you could also use a "native" BCP export on one side and either a BULK INSERT or "native" BCP import on the other side.  It's already a "written" bit of functionality and it could be called from an SSIS task.
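
    As a sketch, the export/import pair might look like this (server names, paths, and table names are illustrative):

        -- On the source server, export in native format from a command prompt:
        --   bcp Oltp.dbo.CallNotes out D:\xfer\CallNotes.dat -n -S OLTPSRV -T
        -- Then, on the archive server, bulk load the file:
        BULK INSERT Archive.dbo.CallNotes
        FROM 'D:\xfer\CallNotes.dat'
        WITH (DATAFILETYPE = 'native', TABLOCK, BATCHSIZE = 50000);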

    Another possibility is to set things up as a temporally based partitioned view where each partition is actually a separate small database. If the small database files share common storage between the two servers (not uncommon where I work), it would be incredibly fast to rebuild the view at the source to exclude the partition to be moved and then detach the small database. Then prompt the server on the other end to attach the small database and rebuild its partitioned view to include it. No data movement would actually occur, which would make it comparatively lightning fast.
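
    A rough outline of that shuffle (database, file, and view names are hypothetical):

        -- Source server: rebuild the view without the partition being moved...
        ALTER VIEW dbo.CallNotesAll
        AS
        SELECT * FROM CallNotes2017.dbo.CallNotes
        UNION ALL
        SELECT * FROM CallNotes2018.dbo.CallNotes;  -- CallNotes2016 dropped out
        GO
        -- ...then detach the small database that holds it.
        EXEC sp_detach_db @dbname = N'CallNotes2016';

        -- Archive server (same storage): attach the files and extend its view.
        CREATE DATABASE CallNotes2016
            ON (FILENAME = N'S:\Shared\CallNotes2016.mdf'),
               (FILENAME = N'S:\Shared\CallNotes2016_log.ldf')
            FOR ATTACH;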

    If the servers didn't share common storage, then perhaps a compressed backup and restore of the small partition database would fit the bill.
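
    Something along these lines (logical file names and paths are made up):

        -- Source server:
        BACKUP DATABASE CallNotes2016
        TO DISK = N'D:\xfer\CallNotes2016.bak'
        WITH COMPRESSION, INIT;

        -- Archive server:
        RESTORE DATABASE CallNotes2016
        FROM DISK = N'D:\xfer\CallNotes2016.bak'
        WITH MOVE 'CallNotes2016'     TO N'E:\Data\CallNotes2016.mdf',
             MOVE 'CallNotes2016_log' TO N'E:\Data\CallNotes2016_log.ldf';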

