Compare 2 rows using tsql code

Question

Compare 2 rows using tsql code

vsamantha35

SSChampion

Points: 11529
More actions
September 18, 2017 at 3:24 am

#349131

Hi All,
A questions which is there with me a long long time. Comparing 2 rows in a table.
How can I compare 2 rows in a table and pin-point and say so and so specific column value don't match.
In my below example, i want to compare 2 rows and should display the ouput as c2 column value is different and so both rows are not same.
Two compare 2 rows, I think I can use checksum,binary_checksum,hashbytes etc.... but the wont tell me which column has the difference.
Correct me if I am wrong !!
For row comparisons, what I am doing is , copy 2 rows into an excel sheet and using transponse converting row to a column and thats how comparing both the rows.
I would like to see if I can do that using TSQL itself.
create table test
(c1 int,
c2 char(10),
c3 varchar(100),
c4 float
)
insert into test
select 1,'A','Adam',9000
union all
select 1,'B','Adam',9000
select * from test

Thanks in advance.
Sam

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

Thom A SSC Guru Points: 99126 More actions · Answer 1

This is a bit of a guess, based on your small dataset, but this, at least, might get the ball rolling:
SELECT *, CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS Compare FROM test;

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

TheCTEGuy SSC Eights! Points: 972 More actions · Answer 2

Thom A - Monday, September 18, 2017 3:42 AM
This is a bit of a guess, based on your small dataset, but this, at least, might get the ball rolling:
SELECT *, CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS Compare FROM test;

Below is the script, probably the extension of above -

drop table test create table test (c1 int, c2 char(10), c3 varchar(100), c4 float )

insert into test select 1,'A','Adam',9000 union all select 2,'A','Adam',9000 union all select 1,'B','Adam',9000 union all select 1,'A','Jhon',9000 union all select 2,'A','Adam',9001

SELECT *, CASE WHEN LAG(c1,1,C1) OVER (PARTITION BY C2, C3, C4 ORDER BY C1) = c1 THEN 'Same' ELSE 'Differs' END AS CompareC1, CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS CompareC2, CASE WHEN LAG(c4,1,C4) OVER (PARTITION BY C1, C2, C3 ORDER BY C4) = c4 THEN 'Same' ELSE 'Differs' END AS CompareC4, CASE WHEN LAG(c3,1,C3) OVER (PARTITION BY C1, C2, C4 ORDER BY C3) = c3 THEN 'Same' ELSE 'Differs' END AS CompareC3 FROM test order by c1,c2,c3,c4

First solve the problem then write the code !

aaron.reese SSChampion Points: 13498 More actions · Answer 3

If you want a generic solution then you are going to need to use a cursor, sys.colums and probably some executable SQL to build what is effectively the pivoted data (rowID, ColumnID, ColumnValue) and then compare them. Remember that some columns will change by default (timestamp/rowID) on every change so would probably not need to be included in your results set.

Can you provide more context of why you need to identify the exact column(s) that are different and what you are going to do with the data - this may yield an alternative solution.

Thom A SSC Guru Points: 99126 More actions · Answer 4

aaron.reese - Monday, October 2, 2017 5:13 AM
If you want a generic solution then you are going to need to use a cursor, sys.colums and probably some executable SQL to build what is effectively the pivoted data (rowID, ColumnID, ColumnValue) and then compare them. Remember that some columns will change by default (timestamp/rowID) on every change so would probably not need to be included in your results set.
Can you provide more context of why you need to identify the exact column(s) that are different and what you are going to do with the data - this may yield an alternative solution.

A cursor?! No.... No, no no no no... There are far better options than using a cursor to get dynamic column names. Plus, like I showed before, Window functions can easily be used to compare separate rows if they are in a defined order.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

aaron.reese SSChampion Points: 13498 More actions · Answer 5

so if you wanted to pass a table name and two record IDs to a function/procedure that would spit out a list of fields with differences and the before/after values, how would you do it without a cursor at some point?. you have to convert the individual records into an EAV table so that you can join them together.

I suppose you could use a combination of dynamic SQL to UNPIVOT, build a string of fields using STUFF...FOR XML and Execute @SQL dynamic statement but the code would be horrible. At least a cursor would be readable.

Sorry, Thread Hijack!

Thom A SSC Guru Points: 99126 More actions · Answer 6

aaron.reese - Monday, October 2, 2017 6:33 AM
so if you wanted to pass a table name and two record IDs to a function/procedure that would spit out a list of fields with differences and the before/after values, how would you do it without a cursor at some point?. you have to convert the individual records into an EAV table so that you can join them together.
I suppose you could use a combination of dynamic SQL to UNPIVOT, build a string of fields using STUFF...FOR XML and Execute @SQL dynamic statement but the code would be horrible. At least a cursor would be readable.
Sorry, Thread Hijack!

For the example that the OP gave, they are simply looking at the next record, there's no need to do a repetative task along a whole dataset, when you can easily use a window function to inspect the last line. LAG may not be the right choice for every event, it may be that FIRST_VALUE is the right choice, but that's data dependant.

As for Dynamic SQL (D-SQL), you can just as easily make it readable, but my main point is the importance of efficiency. A cursor may be considered by you to be more "readable" but that doesn't mean it performs well. A poorly written but "well worded" cursor is a far worse option that well written but "poorly worded" D-SQL. Plus, D-SQL is often very easy to trouble shoot my simply changing the EXEC statement to a PRINT. You can see what SQL would be ran then, adjust that statement as needed, and then amend the D-SQL creation part.

A "good" query is almost always going to put efficiency first before readability.

Thom~

Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk

aaron.reese SSChampion Points: 13498 More actions · Answer 7

Agree with all of your points but I did qualify my statement by saying that there were other considerations if you need a generic solution. (i.e. compare any two records in any table - assuming a single field PK)

vsamantha35 SSChampion Points: 11529 More actions · Answer 8

TheCTEGuy - Saturday, September 30, 2017 5:54 AM
Thom A - Monday, September 18, 2017 3:42 AM
This is a bit of a guess, based on your small dataset, but this, at least, might get the ball rolling:
SELECT *, CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS Compare FROM test;
Below is the script, probably the extension of above -
drop table test create table test (c1 int, c2 char(10), c3 varchar(100), c4 float )
insert into test select 1,'A','Adam',9000 union all select 2,'A','Adam',9000 union all select 1,'B','Adam',9000 union all select 1,'A','Jhon',9000 union all select 2,'A','Adam',9001
SELECT *, CASE WHEN LAG(c1,1,C1) OVER (PARTITION BY C2, C3, C4 ORDER BY C1) = c1 THEN 'Same' ELSE 'Differs' END AS CompareC1, CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS CompareC2, CASE WHEN LAG(c4,1,C4) OVER (PARTITION BY C1, C2, C3 ORDER BY C4) = c4 THEN 'Same' ELSE 'Differs' END AS CompareC4, CASE WHEN LAG(c3,1,C3) OVER (PARTITION BY C1, C2, C4 ORDER BY C3) = c3 THEN 'Same' ELSE 'Differs' END AS CompareC3 FROM test order by c1,c2,c3,c4

Hi TheCTEGuy,

Thanks for the solution. But it is not working properly for below data. How can I fix it???

drop table test
create table test
(c1 int,
c2 char(10),
c3 varchar(100),
c4 float
)

truncate table test

insert into test
select 1,'A','Adam',9000
union all
select 1,'B','Adam',7000
go

select * from test;
go

SELECT *,
CASE WHEN LAG(c1,1,C1) OVER (PARTITION BY C2, C3, C4 ORDER BY C1) = c1 THEN 'Same' ELSE 'Differs' END AS CompareC1,
CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS CompareC2,
CASE WHEN LAG(c3,1,C3) OVER (PARTITION BY C1, C2, C4 ORDER BY C3) = c3 THEN 'Same' ELSE 'Differs' END AS CompareC3,
CASE WHEN LAG(c4,1,C4) OVER (PARTITION BY C1, C2, C3 ORDER BY C4) = c4 THEN 'Same' ELSE 'Differs' END AS CompareC4
FROM test
order by c1,c2,c3,c4

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 9

What results were you expecting? Since no matter how you slice it, you have only one row in each partition, and you set the default value (the third parameter to the LAG function) to the current row value, then you're always going to get 'Same'.

John

vsamantha35 SSChampion Points: 11529 More actions · Answer 10

John Mitchell-245523 - Wednesday, October 25, 2017 5:20 AM
What results were you expecting? Since no matter how you slice it, you have only one row in each partition, and you set the default value (the third parameter to the LAG function) to the current row value, then you're always going to get 'Same'.
John

Hi John,
Thanks for the response.

Thing is, when I am comparing row-1 to row-2 , then the values for row 2 Compare values should be as follows.
CompareC1 CompareC2 CompareC3 CompareC4
Same Differes Same Differs

Can this made dynamic? and also at any point I am going to compare only 2 rows not more than that.

Thanks,
Sam

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 11

Sam

So you're taking the whole set and comparing the value of each column in each row to the values of each column in the previous row? The trouble is, your set doesn't have any intrinsic order - unless you have an ID or date column or something like that that you haven't shown us? Try this - I've used C4 since the rows as you've shown them happen to be in the same order as the values in that column.

SELECT *, CASE WHEN LAG(C1,1,C1) OVER (ORDER BY C4) = C1 THEN 'Same' ELSE 'Differs' END AS CompareC1, CASE WHEN LAG(C2,1,C2) OVER (ORDER BY C4) = C2 THEN 'Same' ELSE 'Differs' END AS CompareC2, CASE WHEN LAG(C3,1,C3) OVER (ORDER BY C4) = C3 THEN 'Same' ELSE 'Differs' END AS CompareC3, CASE WHEN LAG(C4,1,C4) OVER (ORDER BY C4) = C4 THEN 'Same' ELSE 'Differs' END AS CompareC4 FROM test

John

sgmunson SSC Guru Points: 110639 More actions · Answer 12

You're probably looking for the following:CREATE TABLE #test ( c1 int, c2 char(10), c3 varchar(100), c4 float ); INSERT INTO #test (c1, c2, c3, c4) VALUES (1,'A','Adam',9000), (1,'B','Adam',9000);

WITH RAW_DATA AS (

SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RN FROM #test ) SELECT RN, c1, c2, c3, c4, CASE RN WHEN 1 THEN CASE WHEN LEAD(c1, 1, NULL) OVER (ORDER BY RN) = c1 THEN 'Same' ELSE 'Differs' END WHEN 2 THEN CASE WHEN LAG(c1, 1, NULL) OVER (ORDER BY RN) = c1 THEN 'Same' ELSE 'Differs' END END AS CompareC1, CASE RN WHEN 1 THEN CASE WHEN LEAD(c2, 1, NULL) OVER (ORDER BY RN) = c2 THEN 'Same' ELSE 'Differs' END WHEN 2 THEN CASE WHEN LAG(c2, 1, NULL) OVER (ORDER BY RN) = c2 THEN 'Same' ELSE 'Differs' END END AS CompareC2, CASE RN WHEN 1 THEN CASE WHEN LEAD(c3, 1, NULL) OVER (ORDER BY RN) = c3 THEN 'Same' ELSE 'Differs' END WHEN 2 THEN CASE WHEN LAG(c3, 1, NULL) OVER (ORDER BY RN) = c3 THEN 'Same' ELSE 'Differs' END END AS CompareC3, CASE RN WHEN 1 THEN CASE WHEN LEAD(c4, 1, NULL) OVER (ORDER BY RN) = c4 THEN 'Same' ELSE 'Differs' END WHEN 2 THEN CASE WHEN LAG(c4, 1, NULL) OVER (ORDER BY RN) = c4 THEN 'Same' ELSE 'Differs' END END AS CompareC4 FROM RAW_DATA ORDER BY RN;

DROP TABLE #test;

The problem with what's been previous posted is that the use of the LAG function was invoking a 3rd parameter value that says if you can't get a previous value because such a record doesn't exist, just use the current record's value, but that is relying on a previous record, which won't exist for the first row. See if you understand how my code works. It's specifically designed for exactly 2 rows of data.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

vsamantha35 SSChampion Points: 11529 More actions · Answer 13

John Mitchell-245523 - Wednesday, October 25, 2017 7:09 AM
Sam
So you're taking the whole set and comparing the value of each column in each row to the values of each column in the previous row? The trouble is, your set doesn't have any intrinsic order - unless you have an ID or date column or something like that that you haven't shown us? Try this - I've used C4 since the rows as you've shown them happen to be in the same order as the values in that column.
SELECT *, CASE WHEN LAG(C1,1,C1) OVER (ORDER BY C4) = C1 THEN 'Same' ELSE 'Differs' END AS CompareC1, CASE WHEN LAG(C2,1,C2) OVER (ORDER BY C4) = C2 THEN 'Same' ELSE 'Differs' END AS CompareC2, CASE WHEN LAG(C3,1,C3) OVER (ORDER BY C4) = C3 THEN 'Same' ELSE 'Differs' END AS CompareC3, CASE WHEN LAG(C4,1,C4) OVER (ORDER BY C4) = C4 THEN 'Same' ELSE 'Differs' END AS CompareC4 FROM test
John

Thanks John.
Yes, I am taking the whole set and comparing the value of each column in each row to the values of each column in the previous row.

vsamantha35 SSChampion Points: 11529 More actions · Answer 14

sgmunson - Wednesday, October 25, 2017 10:40 AM
You're probably looking for the following:CREATE TABLE #test ( c1 int, c2 char(10), c3 varchar(100), c4 float ); INSERT INTO #test (c1, c2, c3, c4) VALUES (1,'A','Adam',9000), (1,'B','Adam',9000); WITH RAW_DATA AS ( SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RN FROM #test ) SELECT RN, c1, c2, c3, c4, CASE RN WHEN 1 THEN CASE WHEN LEAD(c1, 1, NULL) OVER (ORDER BY RN) = c1 THEN 'Same' ELSE 'Differs' END WHEN 2 THEN CASE WHEN LAG(c1, 1, NULL) OVER (ORDER BY RN) = c1 THEN 'Same' ELSE 'Differs' END END AS CompareC1, CASE RN WHEN 1 THEN CASE WHEN LEAD(c2, 1, NULL) OVER (ORDER BY RN) = c2 THEN 'Same' ELSE 'Differs' END WHEN 2 THEN CASE WHEN LAG(c2, 1, NULL) OVER (ORDER BY RN) = c2 THEN 'Same' ELSE 'Differs' END END AS CompareC2, CASE RN WHEN 1 THEN CASE WHEN LEAD(c3, 1, NULL) OVER (ORDER BY RN) = c3 THEN 'Same' ELSE 'Differs' END WHEN 2 THEN CASE WHEN LAG(c3, 1, NULL) OVER (ORDER BY RN) = c3 THEN 'Same' ELSE 'Differs' END END AS CompareC3, CASE RN WHEN 1 THEN CASE WHEN LEAD(c4, 1, NULL) OVER (ORDER BY RN) = c4 THEN 'Same' ELSE 'Differs' END WHEN 2 THEN CASE WHEN LAG(c4, 1, NULL) OVER (ORDER BY RN) = c4 THEN 'Same' ELSE 'Differs' END END AS CompareC4 FROM RAW_DATA ORDER BY RN;
DROP TABLE #test;
The problem with what's been previous posted is that the use of the LAG function was invoking a 3rd parameter value that says if you can't get a previous value because such a record doesn't exist, just use the current record's value, but that is relying on a previous record, which won't exist for the first row. See if you understand how my code works. It's specifically designed for exactly 2 rows of data.

Thanks Steve. Thats exactly I was looking for. At this point, I want to compare any 2 rows in a table.

Many thanks.