Problem with executing SSIS Container, when not executed as standalone

  • Ahoi,

    Short introduction:
    I had a procedure that truncated and inserted a view into a table. The procedure is part of the last SSIS Package of the daily job that imports Data from SAP and updates the cubes.
    Because the system is not capable of running the job in a single step, i made a container that splits the workload in a bunch of small steps. (see container description below)
    --> result is the same as the old procedure, just done in little steps instead of one big
    --> there are previous steps in this SSIS package -> filling tables with all together around 200 million rows

    The problem:
    The container itself works better when:
    --> i manually run only this container in SSIS
    --> i deactivate all other tasks in this package and:
            --> use Managementstudio to run the job, starting with this SSIS Package
    --> when execute the loop task with the troublesome variable in Management studio as a static value, its even faster and takes only 13 mins
    Here its inconsistent:
            --> change the job to start with this SSIS Package and a scheduled start 45min - 2hours for this single step

    But when the container is run as part of the full job, it doesn't run properly.
    I am logging the parts and the time at which each step is done. A step that takes around 20-25mins when run fine, takes around 2 hours when run as part of the full job.
    Which is not acceptable because it is only one of multiple steps and if all executed after another the time is way too long.
    --> when run fine, the full task itself takes around 4-5 hours, as part of the day job this single step already takes around 2 hours
    Here is a logging example showing the issue:

    Description of the Container:
    - the container itself is working and the result have been validated, so this is just a description without code
    - i took the original view and added a where clause using a variable:
    --> the variable specifies a column (location) to be only a certain value, to reduce the total  amount of rows
    --> the number of loops is around 750, creating from 46 up to around 30 million rows
    --> i iterate over the amount of relevant column values, to know the amount of loops i need and execute for each column value once
    --> every time one iteration is done i insert a log into a table with a timestamp and the current value of the column/variable

    My tests:
    --> as earlier said, i tested running the job in SSMS manually (excluding previous steps in same package)
    --> when i run this step in SSMS directly with the variable that takes 2 hours, it is also alot faster
    --> i tested running the job with a new schedule starting from this step (excluding previous steps in same package)
    --> split the SSIS package and the job steps: made a 2nd ssis package and placed the container in there --> which is executed as in additional step in the job after doing the previous steps
    --> i monitored the server usage --> no other jobs or heavy SSAS analysis influencing + cpu aswell as ram are not overly high in either execution

    TL.DR.
    Container does not work properly when executed as part of a job  WITH REQUIRED PREVIOUS STEPS

  • Okay, so what, pray tell, can we possibly help with?   We have no information about the package itself beyond it's "variable performance", and zero level of detail about it or anything else.   Would 42 be a good answer for you?   Not sure why you think there's anything anyone could possibly do here, as we can't see your environment or know what your code or package or SQL is doing.   If I had to venture a wild guess, I'd say you have a problem design issue.   The conditions under which this package runs would, by your description, have to be considerably different in a way that the package is extremely sensitive to.   I'm doubtful that helps you very much, because it's kinda like saying to a doctor, "it hurts when I do this", and the doctor saying "then don't do that".

    You have more than enough posts to know that we need more than vague descriptions and a total lack of detail to  have much of a shot at helping in a complex scenario.   Only more details can help you here, and don't leave anything out...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • sgmunson - Friday, November 23, 2018 8:12 AM

    Okay, so what, pray tell, can we possibly help with?   We have no information about the package itself beyond it's "variable performance", and zero level of detail about it or anything else.   Would 42 be a good answer for you?   Not sure why you think there's anything anyone could possibly do here, as we can't see your environment or know what your code or package or SQL is doing.   If I had to venture a wild guess, I'd say you have a problem design issue.   The conditions under which this package runs would, by your description, have to be considerably different in a way that the package is extremely sensitive to.   I'm doubtful that helps you very much, because it's kinda like saying to a doctor, "it hurts when I do this", and the doctor saying "then don't do that".

    You have more than enough posts to know that we need more than vague descriptions and a total lack of detail to  have much of a shot at helping in a complex scenario.   Only more details can help you here, and don't leave anything out...

    Just wonder if it's the upcoming full moon that is affecting this😀
    😎
    On a slightly more serious note, Steve said exactly what's needed, now the ball is in your court.

  • Sorry, because i was not sure how code actually helps because the running itself actually works, but here it is:

    truncate  Bestand: explains itself --> truncates target table that i want to insert  data into
    get max Loop Lagerorte: amount of loops needed, according to the number of locations
    select count(distinct Lagerort_Key) as Bestandsbildsschritte from dst.fact_bestand

    get lagertort_key:
    combining loop number with variable key used in every loop
    select Lagerort_key from (
    select ROW_NUMBER() over(order by lagerort_key) as Prozessnr, a.Lagerort_key
    from (select distinct Lagerort_Key from dst.fact_bestand) a
    ) as b
    where Prozessnr = ?

    Build Bestandsbewertung: using the variable Lagerort_key --> i added a file because of the number of rows
    exec [DST].[Create_FACT_Bestandbewertung] ?;

    Erfolg/Abbruch: simple logging timestamp and Lagerort_key
    insert into dbo.Fact_Bestandsbewertung_Log
    values (?,'Erfolg',CURRENT_TIMESTAMP)

    As i previously mentioned:
    i used  debugging the check/confirm at which variable hes "stuck" --> takes 2 hours+
    its the  16, and when i run exec [DST].[Create_FACT_Bestandbewertung] ?
    exec [DST].[Create_FACT_Bestandbewertung] 16 --> in SSMS it takes around 13 Minutes
    For comparison, here is todays and the first run days results of running the package from SSMS with
    --> which would be, if consistantly around that, would be an  ACCEPTABLE Time
    Bildergebnis für sql server start job at step

  • here is the result of the next day, when the container is run as part of the full job, compared to when run yesterday with "Start Job at step.."

  • Now im actually confused, when i run at step its 1:30 hours aswell, compared to yesterdays 23mins

    is it some kind of caching issue?

    edit: while executing the procedure in SSMS still runs in 13 mins

  • sgmunson - Friday, November 23, 2018 8:12 AM

    Okay, so what, pray tell, can we possibly help with?   We have no information about the package itself beyond it's "variable performance", and zero level of detail about it or anything else.   Would 42 be a good answer for you?   Not sure why you think there's anything anyone could possibly do here, as we can't see your environment or know what your code or package or SQL is doing.   If I had to venture a wild guess, I'd say you have a problem design issue.   The conditions under which this package runs would, by your description, have to be considerably different in a way that the package is extremely sensitive to.   I'm doubtful that helps you very much, because it's kinda like saying to a doctor, "it hurts when I do this", and the doctor saying "then don't do that".

    You have more than enough posts to know that we need more than vague descriptions and a total lack of detail to  have much of a shot at helping in a complex scenario.   Only more details can help you here, and don't leave anything out...

    Did i miss essential Details?

  • Happy new year
    The last thing i could be thinking of is that because the number of rows created with this container varies from below 100 to up to 15 million, could that be creating a conflict with the internal statistics that are trying to improve the performance?

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply