SSIS packages run much slower when executed as steps ("Execute Package Task") in a parent package than when executed individually

  • Hi,
     I am creating a data migration scenario for SQL Server 2017.
     I need to load 160 GB from ~250 tables from database 1 to database 2, both on Server A.
     I built ~6 SSIS packages; each package loads one or more schemas (the number of tables per package ranges from 17 to 60).
     Using SSDT 2017, all packages are executed in Debug mode from SSDT on Server A.
     Originally, every package was built and saved using the "Import Data" wizard from SSMS 2017.
     Each package contains a sequential control flow of Preparation SQL Tasks (truncate tables) and Data Flow Tasks (each data flow has 5 source->destination transformations).
     On each Data Flow Task I set AutoAdjustBufferSize = True, and for every table with a row count > 100,000 I set Maximum insert commit size = 100,000.
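     As a minimal sketch, each Preparation SQL Task just clears the destination tables before the data flows reload them (table names here are hypothetical):

     -- Clear destination tables before reload; names are hypothetical.
     TRUNCATE TABLE dbo.Customers;
     TRUNCATE TABLE dbo.Orders;
     -- TRUNCATE TABLE fails if the table is referenced by a foreign key;
     -- use DELETE for those tables instead, e.g.:
     -- DELETE FROM dbo.OrderLines;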

    Test Scenario 1 (each package executed individually)
     SSIS1 execution time: 2 min
     SSIS2 execution time: 3 min
     SSIS3 execution time: 20 min (some data flow sources are tables with 25M rows; the total number of rows inserted by this package is ~120M)

    Test Scenario 2 (each package executed as an "Execute Package Task" from a parent package)

     Execute Package Task 1 (SSIS1) -- execution time 2 minutes
     Execute Package Task 2 (SSIS2) -- execution time 3 minutes
     Execute Package Task 3 (SSIS3) -- execution time 13 hours!!!
    In the SSDT debugger I can see the data flow tasks in SSIS3 becoming extremely slow when the package is executed as an Execute Package Task.

    The server has 24 GB of RAM and 4 CPUs; database 1 and database 2 are located on a 1 TB SSD drive.
    Since this should be a "one time" migration operation, I planned to automate it by running all the packages as a set of "Execute Package Tasks".

    What might cause the slow execution of the data flows when a package is executed as part of an "Execute Package Task"?

  • ebooklub - Thursday, October 25, 2018 10:24 AM


    Frankly, I fail to see why that task ever gets used for anything but totally trivial package executions.   Much better to use an Agent Job and job steps to control package execution flow.
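    A minimal sketch of that approach, assuming the packages are .dtsx files on disk run via dtexec through CmdExec job steps (job, step, and path names here are hypothetical):

    -- Create an Agent job with one step per package; add more
    -- sp_add_jobstep calls (SSIS1, SSIS2, ...) to chain them.
    USE msdb;
    EXEC dbo.sp_add_job @job_name = N'Migration load';
    EXEC dbo.sp_add_jobstep
        @job_name  = N'Migration load',
        @step_name = N'Run SSIS3',
        @subsystem = N'CmdExec',
        @command   = N'dtexec /FILE "C:\SSIS\SSIS3.dtsx"';
    EXEC dbo.sp_add_jobserver @job_name = N'Migration load';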

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • It is a non-domain local server; I don't want to go through the security setup required to import the packages and save passwords so they can run as SQL Server Agent jobs.

    The goal is to load the data over a weekend time frame and to be able to monitor the loading process at any point in time using the "user friendly GUI" of SSDT.

  • ebooklub - Thursday, October 25, 2018 1:26 PM


    If security is a concern, then why have a non-domain server involved in moving data either from or to a domain server? You can always store the passwords in a database, because SSIS lets you do that. But if all you are ever going to do is run the packages manually within Visual Studio, locally on this non-domain server, then the whole process is 100% dependent on a human being. That's a bit of a dangerous dependency if you are the only one who knows how to run it.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Full scenario 🙂
    Production:
    Server X (hosting db A, 400 GB) and Server Y (hosting db B, 400 GB) are located 2,000 miles from our office.
    Project: I need to add the 250 tables (160 GB) from db B to db A, move the result to an Azure managed instance, synchronize it with production with minimum downtime, and make it the new production 🙂

    Currently I am copying compressed backups from Servers X and Y to the local non-domain server Z (this takes 4-5 hrs).
    I restore them on Z, add the new schema to db A, and run the import of the 250 tables; once the import completes, I will take a compressed backup, move it to blob storage in Azure, and restore it there.
    The later synchronization of the data will be done with Red Gate SQL Data Compare.
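    As a minimal sketch, the compressed backup/restore step looks roughly like this (database, file, and path names here are hypothetical):

    -- On server X: take a compressed backup of db A.
    BACKUP DATABASE [dbA]
        TO DISK = N'D:\Backups\dbA.bak'
        WITH COMPRESSION, INIT;

    -- On server Z, after copying the .bak file over:
    RESTORE DATABASE [dbA]
        FROM DISK = N'D:\Backups\dbA.bak'
        WITH MOVE N'dbA_Data' TO N'E:\Data\dbA.mdf',
             MOVE N'dbA_Log'  TO N'E:\Data\dbA_log.ldf',
             REPLACE;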
    Initially I created all the packages on my local PC and ran them from SSDT one by one with DEFAULT settings; the total processing time for all packages was about 12 hrs.
    To minimize the processing time, I installed SSDT on the non-domain server Z and copied the project there, assuming it would run faster, especially after changing the default settings (SSIS buffer size, rows per batch, and commit size).
    So far I have found the following:
    1. A parent package calling the other packages as 6 sequential steps runs much slower than each package executed individually.
    2. If a package has 10 Data Flow Tasks (each data flow loading 5 tables), it runs much slower than executing each data flow individually.

    The size of tempdb is 4 GB; the size of the log file (simple recovery) of the destination db on Server Z is 100 GB.

    Questions
    Why is the total time of running all the individual steps (Data Flow Tasks) much less than running them sequentially inside one package?
    During package execution I see CPU usage at 30% and 9 GB of the 24 GB of memory in use.

    With that much spare capacity hanging around, I'd redesign the packages to have multiple data flow tasks with considerably more than 5 tables in each one, and make sure each one has the maximum buffer available and a considerable number of threads as well ... maybe somewhere in the 30-40 range?

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I used the AutoAdjustBufferSize property based on this article: https://www.mssqltips.com/sqlservertip/4221/improving-data-flow-performance-with-ssis-autoadjustbuffersize-property/
    My current "Engine Threads" setting is 10 (for 5 tables); I will try setting it to a higher value.
    I also noted that a table with 200,000 rows and 10 XML-type columns never finishes loading with the default settings.
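    Those XML columns make the rows very wide, so each buffer and batch carries far more bytes per row than for an ordinary table. As a minimal sketch, the average row width can be checked like this to pick a sensible batch size (table and column names here are hypothetical):

    -- Estimate how many bytes per row the XML columns contribute.
    SELECT COUNT(*)                            AS rows_total,
           AVG(ISNULL(DATALENGTH(XmlCol01), 0)
             + ISNULL(DATALENGTH(XmlCol02), 0)) AS avg_xml_bytes_per_row
    FROM dbo.WideXmlTable;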

  • If this is a one-time process, then why worry about 'automating' it as a parent/child package setup? Put everything in a single package...

    As for performance: have you ensured that every OLE DB Destination has been configured with an appropriate batch and commit size, or did you leave those at the defaults? With the default, all rows must be loaded before they are committed, and that causes large delays because of the amount of data committed at once (as well as bloating the transaction log).
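    You can see this happening by watching log usage in the destination database while a data flow runs; used log space that climbs steadily and never drops suggests one giant open transaction (the default commit size). A minimal sketch:

    -- Run in the destination database during a load.
    SELECT total_log_size_in_bytes / 1048576 AS log_size_mb,
           used_log_space_in_bytes / 1048576 AS log_used_mb,
           used_log_space_in_percent
    FROM sys.dm_db_log_space_usage;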

    You also need to ensure that the destination has the resources to support the multiple processes. If you are running on a server that has 24 CPUs and the destination has fewer resources, then the destination is going to limit the processing even when you have multiple threads. Make sure you do not over-commit against the destination, as that could cause extreme performance issues on the destination.
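    To see what is actually limiting the load while it runs, you can watch the active requests and their waits on the destination server; a minimal sketch:

    -- Show what each active session is waiting on during the load.
    SELECT session_id, status, command, wait_type,
           wait_time, cpu_time, logical_reads
    FROM sys.dm_exec_requests
    WHERE session_id > 50;   -- skip most system sessions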

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Intermediate test results
    250 tables (160 GB), ~800,000,000 rows
    AutoAdjustBufferSize = True set for all Data Flow Tasks
    Batch size: custom, based on the size of the table and its data types
    Commit size: left at the default

    Example: a table with 400,000 rows and 15 columns of XML type
    1. Batch size 10,000 rows: loading time 4 hrs
    2. Batch size 500 rows: loading time 3 minutes

    A main package calling 9 other packages sequentially: loading time 4 hrs
    Calling all 9 packages in parallel: loading time 3 1/2 hrs

    No setting changes (batch size, commit size, buffer size) sped up the data flow transformations for tables containing 40M+ rows.
