SSIS ETL Job Daily Load

  • Hello All,

    Let me explain you the entire scenario.

    I am loading the data to my DB from a warehouse by using SSIS ETL jobs.

    This process is taking bit long as I am extracting the data from 5 different tables.

    This Job is scheduled to run daily. In clean up I am truncating the fact data that means daily it is truncating and loading new data which taking long time.

    Please advise how to overcome this situation.

    Thanks..!!

  • Hi a couple of questions:

    Define a bit long?

    How large are the tables?

    Are you using complex queries to retrieve your datasets?

    How many transformations are you performing in your ETL's?

    Are you using fast table loads for your inserts?

    Andy

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • To extract last 9months data it is taking around 1hour.

    And I am not using any complex queries only using few Left outer joins. The Main table has 4 million + records.

    Yes, I am using fast load to insert data.

  • Does seem a bit long...

    Are the destination tables heavily indexed?

    Are there no transformations in the ETL?

    Could you post the SQL?

    Andy

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • This is the SQL: SELECT b.comp_code,

    a.cust_code,

    b.cust_name,

    d.country_code,

    e.country_desc,

    a.item_code,

    c.item_desc,

    a.order_no,

    a.reference_a_no,

    a.reference_b_no,

    a.po_no,

    a.position_no,

    a.delivery_no,

    a.selling_price,

    a.cost_price,

    a.order_qty,

    a.delivered_qty,

    a.invoice_date,

    a.invoice_no,

    a.trans_type,

    a.invoice_amnt

    FROM tableA a, table b, table d, table c, table e

    WHERE (a.cust_code = b.cust_code) AND (a.order_no = d.order_no) AND (a.item_code= c.item_code) AND (d.country_code= e.country_code)

    AND (invoice_Date BETWEEN '2011-07-01' AND TODAY - 1)

    AND (a.delivered_qty > 0) AND (comp_code = '1')

    Yes I am using transformation.

  • Pihu.awasthi (5/2/2012)


    In clean up I am truncating the fact data that means daily it is truncating and loading new data which taking long time.

    Why are you truncating the fact table?

    Why aren't you doing deltas?

    It seems that you are spending a lot of time loading data you already DID.

    CEWII

  • Can you please tell me how to do the deltas??

  • Pihu.awasthi (5/2/2012)


    Can you please tell me how to do the deltas??

    To all intents and purposes its an incremental load, only loading records that have changed or been loaded since your last refresh.

    this stairway explains the basics:

    http://www.sqlservercentral.com/stairway/72494/

    Again

    What transformations are you performing?

    Are the tables heavily indexed?

    Both of these can slow down loading time.

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • That link really helps.

    The tables are not heavily indexed. I am using UnionAll, RowCount, DerivedCoulmn transformations.

    I guess I have to use something like this query to insert new rows:

    Delete src

    From Person.Contact src

    Join StageDeletes stage

    On stage.Email = src.EmailAddress

    Thanks.!

  • Glad it helped 🙂

    Delete src

    From Person.Contact src

    Join StageDeletes stage

    On stage.Email = src.EmailAddress

    Just be careful doing something like this - using the email address to identify new records wasn't the best example. Much better to use a date last modified column, hash values or something along those lines..

    Andy

    ==========================================================================================================================
    A computer lets you make more mistakes faster than any invention in human history - with the possible exceptions of handguns and tequila. Mitch Ratcliffe

  • Ohh Yes..!!

    I am using the invoice_date field.

    DELETE FROM tableA

    FROM tableA a

    INNER JOIN (SELECT DISTINCT invoice_date FROM tableB) b

    ON (a.invoice_date = b.invoice_date)

    But this query deleting all the data then inserting new data..

    Is it right to use this query for deleting fact?

  • Hi Guys,

    I am new is ssis.... i want to data populate from source DB to destination DB (insert / update / delete) how can i achieve this...

    Is it need to use more than one execute SQL task component?

    Please help me to achieve this..

    Advance thanks...

    Pradeep

  • Have you determined what is actually taking the most time in the job? Is it taking forever to purge the old records, but loading the new rows is quick? The opposite? Can you determine if the query is taking a long time to return records (that is, can you run the query in SSMS and get results back quickly) or if it's taking a long time to write to the target? Are these databases on different servers? Network latency issues?

    pradeep: Please start your own thread and don't hijack someone else's.

  • This stairway is going to save my life. Thanks so much for posting!:-)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply