Using EXCEPT to Determine Row Differences

  • In most cases where we have to find delta the source and target tables aren't in same database often on different databases. Will Except perform same way if the source query have a link server ? And how would we do this in SSIS ?

  • What about using TABLEDIFF.exe? BOL: http://msdn.microsoft.com/en-us/library/ms162843.aspx

    The necessary command and parameters can easily be created programmatically with a simple T-SQL script. TABLEDIFF can also generate the statements to bring the destination table into convergence with the source table with the -f parameter.

    For example:

    tablediff.exe -sourceserver . -sourcedatabase Credit -sourceschema dbo -sourcetable Source -destinationserver . -destinationdatabase Credit -destinationschema dbo -destinationtable Target -f C:\TEMP\Target.sql

    In this example:

    - the comparison occurs on the same SQL Server instance between two tables, Source and Target, in the Credit database. Tablediff can compare between the same database or different databases on the same instance or different instances as defined in the respective parameters

    - -f generates the fix file to bring the destination table into convergence with the source table

    Yes there are some limitations:

    - can't be used with non-SQL servers

    - sql_variant datatype is not supported

    - allowable mappings between source and destination datatypes, see BOL link above

    - the source table must have a primary key, -strict parameter requires the same in the destination table

    - the script file to bring the destination table into convergence will not include the following datatypes:

    varchar(max)

    nvarchar(max)

    varbinary(max)

    timestamp

    xml

    text

    ntext

    image

    I used this just last week to fix an accidental change of production data. Luckily the change was on the primary server in a log shipping setup. The t-log backups are only applied on the logged shipped server early in the morning daily. Using TABLEDIFF allowed me to bring the database on the primary server into convergence with the log-shipped standby database.

  • We had this situation comparing two copies of a history table. We had to compare the natural key less than history_id where the natural key was not unique. My solution was to generate a sequence number using the row aggregate function with the sequence order determined by either the history_id or a timestamp. The comparison then included all columns except the history_id and including the sequence number. EXCEPT is great.

  • I found that Varchar(Max) and blob type values had to be specially handled. Our solution was to only compare the first 8000 characters. Also timestamp columns need different handling because the two copies of the table used a different SQL-Server ODBC driver so rounded differently.

  • Is EXCEPT better, or worse, than using:

    Select t1.col1, t1.col2, t1.col3

    from table1 t1

    where t1.col1 not in (select t2.col1 from table2 t2

    where t2.col1 = t1.col1

    and t2.col2 = t1.col2

    and t2.col3 = t1.col3

    )

    Wallace Houston
    Sunnyland Farms, Inc.

    "We must endeavor to persevere."

  • EXCEPT is much faster and finds differences in the attributes not just differences in which primary keys exist.

    One thing I forgot to note. To find the differences between two sets the EXCEPT needs to be performed both ways with results union-ed because either set may contain rows that the other does not.

    That is something like:

    Select *

    From

    (

    select 'InA-NotB' As

    , <all columns to compare>

    from

    (

    Select a.<all columns to compare>, <logic to add sequence number>

    From table1 As a

    Except

    Select b.<all columns to compare>, <logic to add sequence number>

    From table2 As b

    ) As x

    ) As [InA-NotB]

    Union All

    (

    select 'InB-NotA' As

    , <all columns to compare>

    from

    (

    Select b.<all columns to compare>, <logic to add sequence number>

    From table2 As b

    Except

    Select a.<all columns to compare>, <logic to add sequence number>

    From table1 As a

    ) As y

    ) As [InB-NotA]

    ) As [Diff]

    order by <desired order>

  • Hi and thanks for the article.

    I also use EXCEPT to investigate why two queries don't return the same amount of rows.

    with somerows (ListOfColumns)

    (

    /// first query

    Select .......

    )

    , AllRows (ListOfColumns)

    (

    //Second query that returns more results than the first one

    Select ......

    )

    SELECT ListOfColumns FROM AllRows EXCEPT SELECT ListOfColumns FROM SomeRows

    of course ListOfColumns must be the same list 🙂

    Thanks

    jose

  • EXCEPT is awesome for Row Differences between 2 sets, but I would caution that if you have a pk (like maybe in comparing objects between a publisher and subscriber in replication) and need to find rows existing in 1 set but not the other, EXCEPT will not be the best performance option, because it must generate both sets before describing the difference.

    Alternatively, NOT EXISTS will yield a much speedier result across the pk:

    --Rows in table A, not in table B

    SELECT

    A.*

    FROM [dbo].[SomeTable] A

    WHERE NOT EXISTS

    (

    SELECT

    TOP 1 1

    FROM [dbo].[SomeOtherTable] B

    WHERE B.[PK] = A.[PK]

    )

    Josh Lewis

  • Exactly what I needed to complete my project. Thank you very much!

  • enrique.pessoa - Friday, January 9, 2015 6:53 AM

    Regarding the query optimization, only the first join is needed, it ensures you are only taking the Source items that are also found on Target. The second one is not necessary.Actually, on this situation you could even avoid using joins and table aliases:SELECT ID , [Item] , [Price] , [OrderDate] , [Units] , [ShipmentDate] FROM [#Source] WHERE [ID] IN (SELECT [ID] FROM #Target)EXCEPTSELECT [ID] , [Item] , [Price] , [OrderDate] , [Units] , [ShipmentDate] FROM [#Target]

    I second this approach, although I think I'd use a correlated subquery. It just is more self-documenting as to what's intended. The join(s) alone, IMO, somewhat conceals the intent of only finding unmatched data which matches on primary key. Maybe it's just because Enrique's suggestion "reads" more like a sentence to me:

    SELECT .[ID]

         , .[Item]

         , .[Price]

         , .[OrderDate]

         , .[Units]

         , .[ShipmentDate]

      FROM [#Source]

     WHERE EXISTS (SELECT *

                     FROM [#Target] [T]

                    WHERE .[ID] = [T].[ID])

    EXCEPT

    SELECT [T].[ID]

         , [T].[Item]

         , [T].[Price]

         , [T].[OrderDate]

         , [T].[Units]

         , [T].[ShipmentDate]

    FROM [#Target] [T];

    The following would be even more self-documenting, but, it would require additional overhead in running an unnecessary EXISTS test in the second query:

    SELECT

    .[ID]

         , .[Item]

         , .[Price]

         , .[OrderDate]

         , .[Units]

         , .[ShipmentDate]

      FROM [#Source]

     WHERE EXISTS (SELECT *

                     FROM [#Target] [T]

                    WHERE .[ID] = [T].[ID])

    EXCEPT

    SELECT [T].[ID]

         , [T].[Item]

         , [T].[Price]

         , [T].[OrderDate]

         , [T].[Units]

         , [T].[ShipmentDate]

    FROM [#Target] [T];

    WHERE EXISTS (SELECT *

                    FROM [#Source]

                   WHERE [T].[ID] = .[ID]);

    --=Chuck

  • If you accept to have differences in one row you could do something like:
    SELECT T.*, S.*
    FROM #Targer AS T
    INNER JOIN #Source AS S
    WHERE EXISTS (
            SELECT T.*
            EXCEPT
            SELECT S.*
        );

    or equivalently:
    SELECT T.*, S.*
    FROM #Targer AS T
    INNER JOIN #Source AS S
    WHERE NOT EXISTS (
            SELECT T.*
            INTERSECT
            SELECT S.*
        )
    /*I used asterisk to shorten the code but you would likely use column lists instead.*/

  • Thank you for this informative article!
    I have tested this out on one of my own and would like know how to take the results and use them in an update or merge statement in sql server.
    I am trying to update records that have changed or are  new  in a table from a separate temp  table that i have generated from an xml file.
    So the tables are identical in structure but one has changed or new data.
    Any updates i have tried have changed all of the records with a matching "Client_Rec_Id"
    Any help would be appreciated as i have done quite a bit of searching with no luck.

    cheers Pete

    select tmp.Client_Rec_Id, tmp.PackageName, tmp.SectionName, tmp.[Name], tmp.ServiceFlag, tmp.Amount, tmp.ProductCategory, tmp.ProductSubCategory, tmp.ProductName, tmp.ProductDescription, tmp.ProductSupplier, tmp.PrePaidPN, tmp.ContractLastModified FROM tmpContractLineAmounts tmp
    inner join ContractLineAmounts cla
    on tmp.Client_Rec_Id = cla.Client_Rec_Id
    except
    select cla.Client_Rec_Id, cla.PackageName, cla.SectionName, cla.[Name], cla.ServiceFlag, cla.Amount, cla.ProductCategory, cla.ProductSubCategory, cla.ProductName, cla.ProductDescription, cla.ProductSupplier, cla.PrePaidPN, cla.ContractLastModified FROM ContractLineAmounts cla
    inner join tmpContractLineAmounts tmp
    on cla.Client_Rec_Id = tmp.Client_Rec_Id

  • I have a similar process where I have to determine if fields have changed for a record and then do something with the new data.  I use a merge statement and capture the output in a delta table that stores the old and new values and the action (INSERT , UPDATE, DELETE) that was performed during the merge.  My merge looks like this:


    MERGE

    DestinationTable AS tgt

    USING SourceTableOrQuery AS src

         ON tgt.KeyField1 = src.KeyField1

         AND tgt.KeyField2 = src.KeyField2

    WHEN MATCHED AND

         (ISNULL(tgt.StringField1, '') <> ISNULL(src.StringField1, '')

        ORISNULL(tgt.StringField2,'') <> ISNULL(src.StringField2, '')

        ORISNULL(tgt.NumericField1,0) <> ISNULL(src.NumericField1,0)    

        OR ISNULL(tgt.NumericField2,999) <> ISNULL(src.NumericField2,999)

        OR ISNULL(tgt.DateField1,'1900-01-01') <> ISNULL(src.DateField1,'1900-01-01')

    ) THEN

        UPDATE SET

            tgt.StringField1 = src.StringField1,

            tgt.StringField2 = src.StringField2,

            tgt.NumericField1 = src.NumericField1,

            tgt.NumericField2 = src.NumericField2,

            tgt.DateField1 = src.DateField1    
    WHEN NOT MATCHED BY SOURCE THEN

        DELETE     

    WHEN NOT MATCHED THEN 

        INSERT(KeyField1

            ,KeyField2

            ,StringField1

            ,StringField2

            ,NumericField1

            ,NumericField2

            ,DateField1)   

        VALUES

            (src.KeyField1

            ,src.KeyField2

            ,src.StringField1

            ,src.StringField2

            ,src.NumericField1

            ,src.NumericField2

            ,src.DateField1)

    OUTPUTgetdate(),

         $action,      

         deleted.KeyField1, 

         inserted.KeyField1,

         deleted.KeyField2,

         inserted.KeyField2,

         deleted.StringField1, --orig value

         inserted.StringField1, -- new value

         deleted.StringField2,

         inserted.StringField2,

         deleted.NumericField1,

         inserted.NumericField1,

         deleted.NumericField2,

         inserted.NumericField2,

         deleted.DateField1,    

         inserted.DateField1   

    INTO DELTA_Table;

Viewing 13 posts - 61 through 72 (of 72 total)

You must be logged in to reply to this topic. Login to reply