Incremental Data Load

  • Hi

    I currently have an SSIS job which is populating a reporting database from our live data. I have created a SQL command to select the data I need to then populate the reporting table. Currently we truncate the reporting table every night and re-populate it with all the data again. Because the table is so large and getting continuously bigger this job is gradually taking longer and longer to run.

    Ideally we would like to insert new records into the table without truncating the whole thing. I've tried to use a Lookup Transformation to populate only newly created data but this seems to take even longer! I don't know if this is because I am doing it wrong or whether it is the wrong tool for the job.

    Another way I have thought of is to tweak the query so that it only finds records that don't exist in the report table and then adds them:

    (WHERE TableID NOT IN (SELECT TableID FROM ReportTable)

    This is much faster but I'm not sure if it's the best way to go about it. I also envisage problems if we have to update records instead of just inserting them.

    It would be great if someone could tell me if I'm heading down the right way in terms of using SQL commands to insert/update my data and, if not, which route I should take?

    Thanks in advance

    Ross

  • If you're using SQL 2008 or newer, it sounds like a good place for the MERGE statement.

    Checkout: http://www.mssqltips.com/sqlservertip/1704/using-merge-in-sql-server-to-insert-update-and-delete-at-the-same-time/

  • How did you configure the Lookup transformation? (which version of SSIS are you using?)

    NOT IN is actually not a good option, as it is not really scalable. If ReportTable has one million rows, SQL Server needs to load all million TableID's to find out if one exists or not. A better alternative is using NOT EXISTS. Or the MERGE statement, as already suggested above.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Hi

    Does the source table that you populate your data from have a identity column as the primary key ?

    If so you can take the max of that in the report table and insert those records which are greater than that.

    Since I do not know your table structure this is just a possibility that i am thinking about

    "Keep Trying"

  • ChiragNS (12/16/2011)


    Hi

    Does the source table that you populate your data from have a identity column as the primary key ?

    If so you can take the max of that in the report table and insert those records which are greater than that.

    Since I do not know your table structure this is just a possibility that i am thinking about

    That would mean that the identities should be synchronized between source and destination. Risky business...

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • It's SSIS 2008 and I've set the Lookup for full cache, redirect rows to no match output. It does work but takes a long to time to cache as it is over 20 million records.

    MERGE looks exactly like what I need. This table is only inserts but I can go on to use it for any future ones which deal with updates as well.

    The source column does have a primary key so I should be able to try the MAX suggestion.

    Thanks for all the advice. I'll try them out and let you know how i get on.

    Ross

  • mildofly45 (12/16/2011)


    It's SSIS 2008 and I've set the Lookup for full cache, redirect rows to no match output. It does work but takes a long to time to cache as it is over 20 million records.

    MERGE looks exactly like what I need. This table is only inserts but I can go on to use it for any future ones which deal with updates as well.

    The source column does have a primary key so I should be able to try the MAX suggestion.

    Thanks for all the advice. I'll try them out and let you know how i get on.

    Ross

    Ah, 20 million rows, that explains a lot. Did you use a SQL query in the Lookup, or did you select the table from the dropdown list?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Also do you need to compare the entire source dataset each time.

    Is there anything that helps you to identify which records are new or have been updated since the last load? e.g. last updated date

    If are not able to identify new and updated records another option (if you are using Enterprize Edition) to consider using is SQL Server 2008 CDC.

  • Koen Verbeeck (12/16/2011)


    ChiragNS (12/16/2011)


    Hi

    Does the source table that you populate your data from have a identity column as the primary key ?

    If so you can take the max of that in the report table and insert those records which are greater than that.

    Since I do not know your table structure this is just a possibility that i am thinking about

    That would mean that the identities should be synchronized between source and destination. Risky business...

    Not necessarily,

    I'll store the identity column as an attribute of a dimension or fact table and have a small Execute SQL task get the max ID from that attribute and feed it into a SSIS variable, then pass it back to the SELECT statement that loads data from the source system. That works well for incremental loads... in the odd case I skipped one, I also have a Nightly task that reads the source table, does a lookup against the destination DW table and passes all the records that don't exist into dimensional/measures processing to "catch up".

    While I used to use the MERGE statement to do incremental loading, when you get to several million records, I've actually found that combinations of Lookups tends to perform better than the merge does.



    --Mark Tassin
    MCITP - SQL Server DBA
    Proud member of the Anti-RBAR alliance.
    For help with Performance click this link[/url]
    For tips on how to post your problems[/url]

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply