How to compare rows in source and destination tables dynamically in SQL Server

  • We receive a data feed from our customers and we get roughly the same schema each time, though it can change on the customer end as they are using a 3rd party application. When we receive the data files we import the data into a staging database with a table for each data file (students, attendance, etc). We then want to compare that data to the data that we already have existing in the database for that customer and see what data has changed (either the column has changed or the whole row was possibly deleted) from the previous run. We then want to write the updated values or deleted rows to an audit table so we can then go back to see what data changed from the previous data import. We don't want to update the data itself, we only want to record what's different between the two datasets. We will then delete all the data from the customer database and import the data exactly as is from the new data files without changing it(this directive has been handed down and cannot change). The big problem is that I need to do this dynamically since I don't know exactly what schema I'm going to be getting from our customers since they can make customization to their tables. I need to be able to dynamically determine what tables there are in the destination, and their structure, and then look at the source and compare the values to see what has changed in the data.

    Additional info: There are no ID columns on source, though there are several columns that can be used as a surrogate key that would make up a distinct row.

    I'd like to be able to do this generically for each table without having to hard-code values in, though I might have to do that for the surrogate keys for each table in a separate reference table.

    I can use either SSIS, SPs, triggers, etc., whichever would make more sense. I've looked at all, including tablediff, and none seem to have everything I need or the logic starts to get extremely complex once I get into them.

    Of course any specific examples anyone has of something like this they have already done would be greatly appreciated.

    Let me know if there's any other information that would be helpful.

    Thanks

  • when you load the data into your staging table does the data then match the table in your database you will be loading the data into? if so i may look at a full outer join:

    select *

    from table1 a

    FULL OUTER JOIN stagingtable b

    on a.col1 = b.col1

    and a.col2 = b.col2

    and a.col3 = b.col3 -- continue this till all the uniques columns are done

    WHERE a.col1 is null

    OR a.col2 is null

    OR a.col3 is null

    OR b.col1 is null

    OR b.col2 is null

    OR b.col3 is null

    that will give you the differences (changes will be null in the outer join and depending on where the null is is where the data has changed) and any rows that were deleted (all nulls in table b)

    granted its a sledge hammer and might not be exactly what you are looking for but it will give you a place to start.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • Thanks for the reply.

    The table schema will "usually" match, though there's no guarantee that it will, which is why I was thinking that I need to build the column list dynamically from the destination, and map it to the source. Typically if the schema does change then it's because they've added a column, not because they've removed one. I'll pickup the schema change after the comparison because I drop and recreate the destination schema based on the new table definitions for the new source data.

    I'm thinking that maybe using EXCEPT after building the column list from the destination table for my select comparison between the two tables and then inserting that into a temp table, at least then I'm working with a smaller subset of the data and can use a cursor if need be on the smaller dataset instead of millions of rows as I'm currently doing.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply