• shannonjk (12/7/2012)


    I am aware of how to query data based on the entry dates and change dates. If it were a simple matter of throwing together some SQL then I wouldn't have needed to post this question ;-). Perhaps I should have stated my original question better though 🙂

    I am concerned with the HOW on this; as in what data flow items do I use to achieve monitoring the absent keys, newer entry dates, and altered change dates? Or if there is a good link that explains this methodology?

    Since you already know how to query your source data, I don't think it's an overly complex task to update your warehouse -- perhaps a little more than throwing together some T-ASQL. 😀 You mention that you have 75+ source tables, so it's not a little thing either. But once you get the idea for deletes, updates and additions you apply it to all of your other tables.

    Break it down into parts:

    For your deletes from the warehouse:

    You don't have to monitor your deleted business keys, just determine what ones are present in your DW each night that are no longer in your source data. These are the records to remove from your Fact tables in the DW. You can do this using a SSIS lookup transformation. If you don''t find the business key, route that down the path to a SQL Command to log and delete from your fact table.

    For deletes, remove records from your fact tables before your dimension tables because of the FK referential integrity on your tables.

    For additions:

    For the same FK reasons, do your additions to your dimension tables first, then your fact tables. You know what records need to be inserted into the DW because you have the source system entry dates since last night (or the last time the ETL finished). You can basically use your current ETL packages, just change the source tranformations to add the WHERE EntryDate > @LastETLRunDate

    For updates:

    Do your changes after your additions so that any new records in the dimension tables will be present if you have to update your fact table to point to those new dimension records.

    You know what records potentially need to be updated in the DW by the change date in your source system. You need to SELECT these records and compare against the fields in the DW. You can do this simply in a conditional split, any records that have changed get routed down the update path to a SQL command to update those records. Take a look at: http://bennyaustin.wordpress.com/2010/05/29/alternatives-to-ssis-scd-wizard-component/. This shows you a couple of options rather than the dismal SCD transformation in SSIS.


    Regarding Jeff's comments that this can be done in T-SQL; he's 100% spot on. SSIS is just another tool in your belt. Use whatever tool you (and your team) are most comfortable with. I do all of these types of jobs with SSIS. A T-SQL wizard like Jeff, can do the same tasks using T-SQL.


    If I've been talking at too high a level of generalities/abstractions, please ask further questions.

    Take a look at Andy Leonard's Stairway series here on SSC where he talkes about adding, deleting and updating records in an incremental process. http://www.sqlservercentral.com/stairway/72494/

    HTH,

    Rob