Using EXCEPT to Determine Row Differences

Question

Using EXCEPT to Determine Row Differences

Viewing 13 posts - 61 through 73 (of 73 total)

You must be logged in to reply to this topic. Login to reply

Moiz Khan SSC Journeyman Points: 85 More actions · Answer 1

In most cases where we have to find delta the source and target tables aren't in same database often on different databases. Will Except perform same way if the source query have a link server ? And how would we do this in SSIS ?

darkhelmutis SSCommitted Points: 1561 More actions · Answer 2

What about using TABLEDIFF.exe? BOL: http://msdn.microsoft.com/en-us/library/ms162843.aspx

The necessary command and parameters can easily be created programmatically with a simple T-SQL script. TABLEDIFF can also generate the statements to bring the destination table into convergence with the source table with the -f parameter.

For example:

tablediff.exe -sourceserver . -sourcedatabase Credit -sourceschema dbo -sourcetable Source -destinationserver . -destinationdatabase Credit -destinationschema dbo -destinationtable Target -f C:\TEMP\Target.sql

In this example:

- the comparison occurs on the same SQL Server instance between two tables, Source and Target, in the Credit database. Tablediff can compare between the same database or different databases on the same instance or different instances as defined in the respective parameters

- -f generates the fix file to bring the destination table into convergence with the source table

Yes there are some limitations:

- can't be used with non-SQL servers

- sql_variant datatype is not supported

- allowable mappings between source and destination datatypes, see BOL link above

- the source table must have a primary key, -strict parameter requires the same in the destination table

- the script file to bring the destination table into convergence will not include the following datatypes:

varchar(max)

nvarchar(max)

varbinary(max)

timestamp

xml

text

ntext

image

I used this just last week to fix an accidental change of production data. Luckily the change was on the primary server in a log shipping setup. The t-log backups are only applied on the logged shipped server early in the morning daily. Using TABLEDIFF allowed me to bring the database on the primary server into convergence with the log-shipped standby database.

jbwa SSC Eights! Points: 843 More actions · Answer 3

We had this situation comparing two copies of a history table. We had to compare the natural key less than history_id where the natural key was not unique. My solution was to generate a sequence number using the row aggregate function with the sequence order determined by either the history_id or a timestamp. The comparison then included all columns except the history_id and including the sequence number. EXCEPT is great.

jbwa SSC Eights! Points: 843 More actions · Answer 4

I found that Varchar(Max) and blob type values had to be specially handled. Our solution was to only compare the first 8000 characters. Also timestamp columns need different handling because the two copies of the table used a different SQL-Server ODBC driver so rounded differently.

Caruncles Hall of Fame Points: 3299 More actions · Answer 5

Is EXCEPT better, or worse, than using:

Select t1.col1, t1.col2, t1.col3

from table1 t1

where t1.col1 not in (select t2.col1 from table2 t2

where t2.col1 = t1.col1

and t2.col2 = t1.col2

and t2.col3 = t1.col3

)

Wallace Houston
Sunnyland Farms, Inc.

"We must endeavor to persevere."

jbwa SSC Eights! Points: 843 More actions · Answer 6

EXCEPT is much faster and finds differences in the attributes not just differences in which primary keys exist.

One thing I forgot to note. To find the differences between two sets the EXCEPT needs to be performed both ways with results union-ed because either set may contain rows that the other does not.

That is something like:

Select *

From

(

select 'InA-NotB' As , <all columns to compare>

from

(

Select a.<all columns to compare>, <logic to add sequence number>

From table1 As a

Except

Select b.<all columns to compare>, <logic to add sequence number>

From table2 As b

) As x

) As [InA-NotB]

Union All

(

select 'InB-NotA' As , <all columns to compare>

from

(

Select b.<all columns to compare>, <logic to add sequence number>

From table2 As b

Except

Select a.<all columns to compare>, <logic to add sequence number>

From table1 As a

) As y

) As [InB-NotA]

) As [Diff]

order by <desired order>

jose-276033 SSC Veteran Points: 219 More actions · Answer 7

Hi and thanks for the article.

I also use EXCEPT to investigate why two queries don't return the same amount of rows.

with somerows (ListOfColumns)

(

/// first query

Select .......

)

, AllRows (ListOfColumns)

(

//Second query that returns more results than the first one

Select ......

)

SELECT ListOfColumns FROM AllRows EXCEPT SELECT ListOfColumns FROM SomeRows

of course ListOfColumns must be the same list

Thanks

jose

DeusExDatum Ten Centuries Points: 1144 More actions · Answer 8

EXCEPT is awesome for Row Differences between 2 sets, but I would caution that if you have a pk (like maybe in comparing objects between a publisher and subscriber in replication) and need to find rows existing in 1 set but not the other, EXCEPT will not be the best performance option, because it must generate both sets before describing the difference.

Alternatively, NOT EXISTS will yield a much speedier result across the pk:

--Rows in table A, not in table B

SELECT

A.*

FROM [dbo].[SomeTable] A

WHERE NOT EXISTS

(

SELECT

TOP 1 1

FROM [dbo].[SomeOtherTable] B

WHERE B.[PK] = A.[PK]

)

Josh Lewis

adanb007 Grasshopper Points: 23 More actions · Answer 9

Exactly what I needed to complete my project. Thank you very much!

chuck.forbes SSCarpal Tunnel Points: 4294 More actions · Answer 10

enrique.pessoa - Friday, January 9, 2015 6:53 AM
Regarding the query optimization, only the first join is needed, it ensures you are only taking the Source items that are also found on Target. The second one is not necessary.Actually, on this situation you could even avoid using joins and table aliases:SELECT ID , [Item] , [Price] , [OrderDate] , [Units] , [ShipmentDate] FROM [#Source] WHERE [ID] IN (SELECT [ID] FROM #Target)EXCEPTSELECT [ID] , [Item] , [Price] , [OrderDate] , [Units] , [ShipmentDate] FROM [#Target]

I second this approach, although I think I'd use a correlated subquery. It just is more self-documenting as to what's intended. The join(s) alone, IMO, somewhat conceals the intent of only finding unmatched data which matches on primary key. Maybe it's just because Enrique's suggestion "reads" more like a sentence to me:

SELECT .[ID]

, .[Item]

, .[Price]

, .[OrderDate]

, .[Units]

, .[ShipmentDate]

FROM [#Source]

WHERE EXISTS (SELECT *

FROM [#Target] [T]

WHERE .[ID] = [T].[ID])

EXCEPT

SELECT [T].[ID]

, [T].[Item]

, [T].[Price]

, [T].[OrderDate]

, [T].[Units]

, [T].[ShipmentDate]

FROM [#Target] [T];

The following would be even more self-documenting, but, it would require additional overhead in running an unnecessary EXISTS test in the second query:

SELECT

.[ID]

, .[Item]

, .[Price]

, .[OrderDate]

, .[Units]

, .[ShipmentDate]

FROM [#Source]

WHERE EXISTS (SELECT *

FROM [#Target] [T]

WHERE .[ID] = [T].[ID])

EXCEPT

SELECT [T].[ID]

, [T].[Item]

, [T].[Price]

, [T].[OrderDate]

, [T].[Units]

, [T].[ShipmentDate]

FROM [#Target] [T];

WHERE EXISTS (SELECT *

FROM [#Source]

WHERE [T].[ID] = .[ID]);

--=Chuck

radek.celuch Default port Points: 1430 More actions · Answer 11

If you accept to have differences in one row you could do something like:
SELECT T.*, S.* FROM #Targer AS T INNER JOIN #Source AS S WHERE EXISTS ( SELECT T.* EXCEPT SELECT S.* );
or equivalently:
SELECT T.*, S.* FROM #Targer AS T INNER JOIN #Source AS S WHERE NOT EXISTS ( SELECT T.* INTERSECT SELECT S.* ) /*I used asterisk to shorten the code but you would likely use column lists instead.*/

peter.devine 25998 Grasshopper Points: 22 More actions · Answer 12

Thank you for this informative article!
I have tested this out on one of my own and would like know how to take the results and use them in an update or merge statement in sql server.
I am trying to update records that have changed or are new in a table from a separate temp table that i have generated from an xml file.
So the tables are identical in structure but one has changed or new data.
Any updates i have tried have changed all of the records with a matching "Client_Rec_Id"
Any help would be appreciated as i have done quite a bit of searching with no luck.

cheers Pete

select tmp.Client_Rec_Id, tmp.PackageName, tmp.SectionName, tmp.[Name], tmp.ServiceFlag, tmp.Amount, tmp.ProductCategory, tmp.ProductSubCategory, tmp.ProductName, tmp.ProductDescription, tmp.ProductSupplier, tmp.PrePaidPN, tmp.ContractLastModified FROM tmpContractLineAmounts tmp
inner join ContractLineAmounts cla
on tmp.Client_Rec_Id = cla.Client_Rec_Id
except
select cla.Client_Rec_Id, cla.PackageName, cla.SectionName, cla.[Name], cla.ServiceFlag, cla.Amount, cla.ProductCategory, cla.ProductSubCategory, cla.ProductName, cla.ProductDescription, cla.ProductSupplier, cla.PrePaidPN, cla.ContractLastModified FROM ContractLineAmounts cla
inner join tmpContractLineAmounts tmp
on cla.Client_Rec_Id = tmp.Client_Rec_Id

tspade SSC Veteran Points: 264 More actions · Answer 13

I have a similar process where I have to determine if fields have changed for a record and then do something with the new data. I use a merge statement and capture the output in a delta table that stores the old and new values and the action (INSERT , UPDATE, DELETE) that was performed during the merge. My merge looks like this:

MERGE

DestinationTable AS tgt

USING SourceTableOrQuery AS src

ON tgt.KeyField1 = src.KeyField1

AND tgt.KeyField2 = src.KeyField2

WHEN MATCHED AND

(ISNULL(tgt.StringField1, '') <> ISNULL(src.StringField1, '')

ORISNULL(tgt.StringField2,'') <> ISNULL(src.StringField2, '')

ORISNULL(tgt.NumericField1,0) <> ISNULL(src.NumericField1,0)

OR ISNULL(tgt.NumericField2,999) <> ISNULL(src.NumericField2,999)

OR ISNULL(tgt.DateField1,'1900-01-01') <> ISNULL(src.DateField1,'1900-01-01')

) THEN

UPDATE SET

tgt.StringField1 = src.StringField1,

tgt.StringField2 = src.StringField2,

tgt.NumericField1 = src.NumericField1,

tgt.NumericField2 = src.NumericField2,

tgt.DateField1 = src.DateField1
WHEN NOT MATCHED BY SOURCE THEN

DELETE

WHEN NOT MATCHED THEN

INSERT(KeyField1

,KeyField2

,StringField1

,StringField2

,NumericField1

,NumericField2

,DateField1)

VALUES

(src.KeyField1

,src.KeyField2

,src.StringField1

,src.StringField2

,src.NumericField1

,src.NumericField2

,src.DateField1)

OUTPUTgetdate(),

$action,

deleted.KeyField1,

inserted.KeyField1,

deleted.KeyField2,

inserted.KeyField2,

deleted.StringField1, --orig value

inserted.StringField1, -- new value

deleted.StringField2,

inserted.StringField2,

deleted.NumericField1,

inserted.NumericField1,

deleted.NumericField2,

inserted.NumericField2,

deleted.DateField1,

inserted.DateField1

INTO DELTA_Table;

Using EXCEPT to Determine Row Differences

Cookies on SQLServerCentral