Using EXCEPT to Determine Row Differences

  • Comments posted to this topic are about the item Using EXCEPT to Determine Row Differences

  • Something to note about EXCEPT is that it performs an implicit DISTINCT and does not compare duplicate rows. Any suggestions on how you would get around this?
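
A minimal sketch of this pitfall, not from the thread: Python's sqlite3 is used here because SQLite's EXCEPT has the same implicit DISTINCT as T-SQL's, and all table and column names are invented for the demo.

```python
import sqlite3

# EXCEPT deduplicates each side before comparing, so a row that appears
# twice in Source and once in Target produces no reported difference.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source (ID INTEGER, Item TEXT);
    CREATE TABLE Target (ID INTEGER, Item TEXT);
    INSERT INTO Source VALUES (1, 'Widget'), (1, 'Widget');  -- accidental duplicate
    INSERT INTO Target VALUES (1, 'Widget');
""")
diff = conn.execute(
    "SELECT ID, Item FROM Source EXCEPT SELECT ID, Item FROM Target"
).fetchall()
print(diff)  # [] -- the duplicated row is invisible to EXCEPT
```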

  • Where I have been using this is to keep read-only data synchronized between two areas by their primary key, so I do not currently have a scenario like this.

    If I were keeping a full history of everything synchronized to date, that might be similar to what you describe.

    Could you explain further?

  • Hello,

    The join between #Source and #Target in the SELECT ... EXCEPT is not essential.

    You can obtain the same result at a lower cost:

    SELECT [ID]
         , [Item]
         , [Price]
         , [OrderDate]
         , [Units]
         , [ShipmentDate]
    FROM [Source]
    EXCEPT
    SELECT [T].[ID]
         , [T].[Item]
         , [T].[Price]
         , [T].[OrderDate]
         , [T].[Units]
         , [T].[ShipmentDate]
    FROM [Target] [T];

    Am I wrong?

    Arnaud

  • Let's say we have a reconciliation against a 3rd-party vendor where we submit sales to them.

    There is a chance we can submit the same sale twice by accident. We would then only have 1 sale in our system but the vendor would see 2 sales. How do we use EXCEPT to reconcile these kinds of differences?

  • I agree with Old Hand. I tried the comparison both with INNER JOINs and without. I got the same result in both instances.

  • I have created the tables, inserted the values, and run the query as posted in this article, and it produces the same result as:

    SELECT [ID]
         , [Item]
         , [Price]
         , [OrderDate]
         , [Units]
         , [ShipmentDate]
    FROM [#Source]
    EXCEPT
    SELECT [T].[ID]
         , [T].[Item]
         , [T].[Price]
         , [T].[OrderDate]
         , [T].[Units]
         , [T].[ShipmentDate]
    FROM [#Target] [T];

    So I'm still trying to understand the purpose of the join.

  • The join is to restrict the comparison to those cases where the same key values are present in both source and target and you are looking for differences (rather than rows present in one set and not in the other).

    Agreed that with the test data provided, this is not very apparent. If you add a row to the source table with an ID which is not present in the Target table, then you'll see the effect of the join.
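
To make the effect concrete, here is a hypothetical side-by-side comparison, demonstrated through Python's sqlite3 (SQLite's EXCEPT behaves like T-SQL's here; the tables and values are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Source (ID INTEGER, Price REAL);
    CREATE TABLE Target (ID INTEGER, Price REAL);
    INSERT INTO Source VALUES (1, 10.0), (2, 25.0), (3, 5.0);
    INSERT INTO Target VALUES (1, 10.0), (2, 20.0);  -- ID 3 has no match
""")
# Without the join: ID 2 (changed) AND ID 3 (missing from Target) come back.
no_join = conn.execute("""
    SELECT ID, Price FROM Source
    EXCEPT
    SELECT ID, Price FROM Target
""").fetchall()
# With the join: only IDs present in BOTH tables are compared,
# so only ID 2, a genuine difference, is returned.
with_join = conn.execute("""
    SELECT S.ID, S.Price
    FROM Source S INNER JOIN Target T ON T.ID = S.ID
    EXCEPT
    SELECT T.ID, T.Price
    FROM Target T INNER JOIN Source S ON S.ID = T.ID
""").fetchall()
print(sorted(no_join))   # [(2, 25.0), (3, 5.0)]
print(with_join)         # [(2, 25.0)]
```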

  • You can also use UNION to do something similar. In this case, it returns both rows when there's a difference.

    SELECT *
    FROM (SELECT *
               , COUNT(1) OVER (PARTITION BY ID) AS versions
          FROM (SELECT *
                FROM #Source AS ss
                UNION
                SELECT *
                FROM #Target AS ss
               ) x
         ) y
    WHERE versions <> 1;

  • Without the join, the EXCEPT statement would also return rows in the source that do not exist in the target, as well as rows that exist (by ID) in both sets but are different.

    What I needed was to determine differences between rows that exist in both sets.

    Comparing a series of tables with 100-200+ columns column by column is asking for human error, especially when NULLs need to be taken into account.

  • Thank you for your explanation.

    Having rows in the source that do not exist in the target, as you describe, is precisely why the join exists; the test data just does not reflect that.

  • In other words, the article shows how to use EXCEPT to do something really cool: compare records with matching primary keys to see if they are identical. But the example doesn't demonstrate the cool part. OK, I get it.

  • This is a much better approach than comparing column by column.

    ---------------
    Mel. 😎

  • nick.latocha (3/25/2014)


    Let's say we have a reconciliation against a 3rd-party vendor where we submit sales to them.

    There is a chance we can submit the same sale twice by accident. We would then only have 1 sale in our system but the vendor would see 2 sales. How do we use EXCEPT to reconcile these kinds of differences?

    When I was doing something similar (but using column-by-column compares, so not as nice as this method), and the tables I compared had a chance of duplication, I added a column that was a count of rows and did a GROUP BY over everything to achieve it. Then this extra 'count' column could also be compared.

    Could be interesting to hear of an easier method, but that's what I came up with!
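
A sketch of that count-column workaround, again demonstrated through Python's sqlite3 with invented sales tables (the duplicate-submission scenario from earlier in the thread):

```python
import sqlite3

# GROUP BY every column and carry COUNT(*) alongside, so a row submitted
# twice no longer collapses under EXCEPT's implicit DISTINCT.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE OurSales (SaleID INTEGER, Amount REAL);
    CREATE TABLE VendorSales (SaleID INTEGER, Amount REAL);
    INSERT INTO OurSales VALUES (100, 50.0);
    INSERT INTO VendorSales VALUES (100, 50.0), (100, 50.0);  -- sale submitted twice
""")
diff = conn.execute("""
    SELECT SaleID, Amount, COUNT(*) AS n
    FROM VendorSales GROUP BY SaleID, Amount
    EXCEPT
    SELECT SaleID, Amount, COUNT(*) AS n
    FROM OurSales GROUP BY SaleID, Amount
""").fetchall()
print(diff)  # [(100, 50.0, 2)] -- the duplicated sale now surfaces
```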

  • Nice article, Douglas. I thought you were crazy until I tried your code compared to what I had in my head. Pretty cool! I am faced with a nearly identical challenge today. Think I'll give it a try.

    Chris Umbaugh
    Data Warehouse / Business Intelligence Consultant
    twitter @ToledoSQL
