Compare 2 rows using tsql code

  • Hi All,

    A questions which is there with me a long long time. Comparing 2 rows in a table.
    How can I compare 2 rows in a table and pin-point and say so and so specific column value don't match.
    In my below example, i want to compare 2 rows and should display the ouput as c2 column value is different and so both rows are not same.
    Two compare 2 rows, I think I can use checksum,binary_checksum,hashbytes etc.... but the wont tell me which column has the difference.
    Correct me if I am wrong !!
    For row comparisons, what I am doing is , copy 2 rows into an excel sheet and using transponse converting row to a column and thats how comparing both the rows.
    I would like to see if I can do that using TSQL itself.

    create table test
    (c1 int,
    c2 char(10),
    c3 varchar(100),
    c4 float
    )

    insert into test
    select 1,'A','Adam',9000
    union all
    select 1,'B','Adam',9000

    select * from test


    Thanks in advance.
    Sam

  • This is a bit of a guess, based on your small dataset, but this, at least, might get the ball rolling:
    SELECT *,
           CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS Compare
    FROM test;

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A - Monday, September 18, 2017 3:42 AM

    This is a bit of a guess, based on your small dataset, but this, at least, might get the ball rolling:
    SELECT *,
           CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS Compare
    FROM test;

    Below is the script, probably the extension of above - 

    drop table test
    create table test
    (c1 int,
    c2 char(10),
    c3 varchar(100),
    c4 float
    )

    insert into test
    select 1,'A','Adam',9000
    union all
    select 2,'A','Adam',9000
    union all
    select 1,'B','Adam',9000
    union all
    select 1,'A','Jhon',9000
    union all
    select 2,'A','Adam',9001

    SELECT *,
         CASE WHEN LAG(c1,1,C1) OVER (PARTITION BY C2, C3, C4 ORDER BY C1) = c1 THEN 'Same' ELSE 'Differs' END AS CompareC1,
       CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS CompareC2,
         CASE WHEN LAG(c4,1,C4) OVER (PARTITION BY C1, C2, C3 ORDER BY C4) = c4 THEN 'Same' ELSE 'Differs' END AS CompareC4,
         CASE WHEN LAG(c3,1,C3) OVER (PARTITION BY C1, C2, C4 ORDER BY C3) = c3 THEN 'Same' ELSE 'Differs' END AS CompareC3
    FROM test
    order by c1,c2,c3,c4

    First solve the problem then write the code !

  • If you want a generic solution then you are going to need to use a cursor, sys.colums and probably some executable SQL to build what is effectively the pivoted data (rowID, ColumnID, ColumnValue) and then compare them. Remember that some columns will change by default (timestamp/rowID) on every change so would probably not need to be included in your results set.

    Can you provide more context of why you need to identify the exact column(s) that are different and what you are going to do with the data - this may yield an alternative solution.

  • aaron.reese - Monday, October 2, 2017 5:13 AM

    If you want a generic solution then you are going to need to use a cursor, sys.colums and probably some executable SQL to build what is effectively the pivoted data (rowID, ColumnID, ColumnValue) and then compare them. Remember that some columns will change by default (timestamp/rowID) on every change so would probably not need to be included in your results set.

    Can you provide more context of why you need to identify the exact column(s) that are different and what you are going to do with the data - this may yield an alternative solution.

    A cursor?! No.... No, no no no no... There are far better options than using a cursor to get dynamic column names. Plus, like I showed before, Window functions can easily be used to compare separate rows if they are in a defined order.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • so if you wanted to pass a table name and two record IDs to a function/procedure that would spit out a list of fields with differences and the before/after values, how would you do it without a cursor at some point?. you have to convert the individual records into an EAV table so that you can join them together.

    I suppose you could use a combination of dynamic SQL to UNPIVOT, build a string of fields using STUFF...FOR XML and Execute @SQL dynamic statement but the code would be horrible. At least a cursor would be readable.

    Sorry, Thread Hijack!

  • aaron.reese - Monday, October 2, 2017 6:33 AM

    so if you wanted to pass a table name and two record IDs to a function/procedure that would spit out a list of fields with differences and the before/after values, how would you do it without a cursor at some point?. you have to convert the individual records into an EAV table so that you can join them together.

    I suppose you could use a combination of dynamic SQL to UNPIVOT, build a string of fields using STUFF...FOR XML and Execute @SQL dynamic statement but the code would be horrible. At least a cursor would be readable.

    Sorry, Thread Hijack!

    For the example that the OP gave, they are simply looking at the next record, there's no need to do a repetative task along a whole dataset, when you can easily use a window function to inspect the last line. LAG may not be the right choice for every event, it may be that FIRST_VALUE is the right choice, but that's data dependant.

    As for Dynamic SQL (D-SQL), you can just as easily make it readable, but my main point is the importance of efficiency. A cursor may be considered by you to be more "readable" but that doesn't mean it performs well. A poorly written  but "well worded" cursor is a far worse option that well written but "poorly worded" D-SQL. Plus, D-SQL is often very easy to trouble shoot my simply changing the EXEC statement to a PRINT. You can see what SQL would be ran then, adjust that statement as needed, and then amend the D-SQL creation part.

    A "good" query is almost always going to put efficiency first before readability.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Agree with all of your points but I did qualify my statement by saying that there were other considerations if you need a generic solution. (i.e. compare any two records in any table - assuming a single field PK)

  • TheCTEGuy - Saturday, September 30, 2017 5:54 AM

    Thom A - Monday, September 18, 2017 3:42 AM

    This is a bit of a guess, based on your small dataset, but this, at least, might get the ball rolling:
    SELECT *,
           CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS Compare
    FROM test;

    Below is the script, probably the extension of above - 

    drop table test
    create table test
    (c1 int,
    c2 char(10),
    c3 varchar(100),
    c4 float
    )

    insert into test
    select 1,'A','Adam',9000
    union all
    select 2,'A','Adam',9000
    union all
    select 1,'B','Adam',9000
    union all
    select 1,'A','Jhon',9000
    union all
    select 2,'A','Adam',9001

    SELECT *,
         CASE WHEN LAG(c1,1,C1) OVER (PARTITION BY C2, C3, C4 ORDER BY C1) = c1 THEN 'Same' ELSE 'Differs' END AS CompareC1,
       CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS CompareC2,
         CASE WHEN LAG(c4,1,C4) OVER (PARTITION BY C1, C2, C3 ORDER BY C4) = c4 THEN 'Same' ELSE 'Differs' END AS CompareC4,
         CASE WHEN LAG(c3,1,C3) OVER (PARTITION BY C1, C2, C4 ORDER BY C3) = c3 THEN 'Same' ELSE 'Differs' END AS CompareC3
    FROM test
    order by c1,c2,c3,c4

    Hi TheCTEGuy,

    Thanks for the solution. But it is not working properly for below data. How can I fix it???

    drop table test
    create table test
    (c1 int,
    c2 char(10),
    c3 varchar(100),
    c4 float
    )

    truncate table test

    insert into test
    select 1,'A','Adam',9000
    union all
    select 1,'B','Adam',7000
    go

    select * from test;
    go

    SELECT *,
      CASE WHEN LAG(c1,1,C1) OVER (PARTITION BY C2, C3, C4 ORDER BY C1) = c1 THEN 'Same' ELSE 'Differs' END AS CompareC1,
      CASE WHEN LAG(c2,1,C2) OVER (PARTITION BY C1, C3, C4 ORDER BY C2) = c2 THEN 'Same' ELSE 'Differs' END AS CompareC2,
        CASE WHEN LAG(c3,1,C3) OVER (PARTITION BY C1, C2, C4 ORDER BY C3) = c3 THEN 'Same' ELSE 'Differs' END AS CompareC3,
      CASE WHEN LAG(c4,1,C4) OVER (PARTITION BY C1, C2, C3 ORDER BY C4) = c4 THEN 'Same' ELSE 'Differs' END AS CompareC4
    FROM test
    order by c1,c2,c3,c4

  • What results were you expecting?  Since no matter how you slice it, you have only one row in each partition, and you set the default value (the third parameter to the LAG function) to the current row value, then you're always going to get 'Same'.

    John

  • John Mitchell-245523 - Wednesday, October 25, 2017 5:20 AM

    What results were you expecting?  Since no matter how you slice it, you have only one row in each partition, and you set the default value (the third parameter to the LAG function) to the current row value, then you're always going to get 'Same'.

    John

    Hi John,
    Thanks for the response.

    Thing is, when I am comparing row-1 to row-2 , then the values for row 2 Compare values should be as follows.
    CompareC1 CompareC2 CompareC3 CompareC4
    Same    Differes   Same   Differs

    Can this made dynamic? and also at any point I am going to compare only 2 rows not more than that.

    Thanks,
    Sam

  • Sam

    So you're taking the whole set and comparing the value of each column in each row to the values of each column in the previous row?  The trouble is, your set doesn't have any intrinsic order - unless you have an ID or date column or something like that that you haven't shown us?  Try this - I've used C4 since the rows as you've shown them happen to be in the same order as the values in that column.

    SELECT *,
    CASE WHEN LAG(C1,1,C1) OVER (ORDER BY C4) = C1 THEN 'Same' ELSE 'Differs' END AS CompareC1,
    CASE WHEN LAG(C2,1,C2) OVER (ORDER BY C4) = C2 THEN 'Same' ELSE 'Differs' END AS CompareC2,
    CASE WHEN LAG(C3,1,C3) OVER (ORDER BY C4) = C3 THEN 'Same' ELSE 'Differs' END AS CompareC3,    
    CASE WHEN LAG(C4,1,C4) OVER (ORDER BY C4) = C4 THEN 'Same' ELSE 'Differs' END AS CompareC4
    FROM test

    John

  • You're probably looking for the following:CREATE TABLE #test (
        c1 int,
        c2 char(10),
        c3 varchar(100),
        c4 float
    );
    INSERT INTO #test (c1, c2, c3, c4)
        VALUES    (1,'A','Adam',9000),
                (1,'B','Adam',9000);

    WITH RAW_DATA AS (

        SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RN
        FROM #test
    )
    SELECT RN, c1, c2, c3, c4,
        CASE RN
            WHEN 1 THEN CASE WHEN LEAD(c1, 1, NULL) OVER (ORDER BY RN) = c1 THEN 'Same' ELSE 'Differs' END
            WHEN 2 THEN CASE WHEN LAG(c1, 1, NULL) OVER (ORDER BY RN) = c1 THEN 'Same' ELSE 'Differs' END
        END AS CompareC1,
        CASE RN
            WHEN 1 THEN CASE WHEN LEAD(c2, 1, NULL) OVER (ORDER BY RN) = c2 THEN 'Same' ELSE 'Differs' END
            WHEN 2 THEN CASE WHEN LAG(c2, 1, NULL) OVER (ORDER BY RN) = c2 THEN 'Same' ELSE 'Differs' END
        END AS CompareC2,
        CASE RN
            WHEN 1 THEN CASE WHEN LEAD(c3, 1, NULL) OVER (ORDER BY RN) = c3 THEN 'Same' ELSE 'Differs' END
            WHEN 2 THEN CASE WHEN LAG(c3, 1, NULL) OVER (ORDER BY RN) = c3 THEN 'Same' ELSE 'Differs' END
        END AS CompareC3,
        CASE RN
            WHEN 1 THEN CASE WHEN LEAD(c4, 1, NULL) OVER (ORDER BY RN) = c4 THEN 'Same' ELSE 'Differs' END
            WHEN 2 THEN CASE WHEN LAG(c4, 1, NULL) OVER (ORDER BY RN) = c4 THEN 'Same' ELSE 'Differs' END
        END AS CompareC4
    FROM RAW_DATA
    ORDER BY RN;

    DROP TABLE #test;

    The problem with what's been previous posted is that the use of the LAG function was invoking a 3rd parameter value that says if you can't get a previous value because such a record doesn't exist, just use the current record's value, but that is relying on a previous record, which won't exist for the first row.   See if you understand how my code works.   It's specifically designed for exactly 2 rows of data.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • John Mitchell-245523 - Wednesday, October 25, 2017 7:09 AM

    Sam

    So you're taking the whole set and comparing the value of each column in each row to the values of each column in the previous row?  The trouble is, your set doesn't have any intrinsic order - unless you have an ID or date column or something like that that you haven't shown us?  Try this - I've used C4 since the rows as you've shown them happen to be in the same order as the values in that column.

    SELECT *,
    CASE WHEN LAG(C1,1,C1) OVER (ORDER BY C4) = C1 THEN 'Same' ELSE 'Differs' END AS CompareC1,
    CASE WHEN LAG(C2,1,C2) OVER (ORDER BY C4) = C2 THEN 'Same' ELSE 'Differs' END AS CompareC2,
    CASE WHEN LAG(C3,1,C3) OVER (ORDER BY C4) = C3 THEN 'Same' ELSE 'Differs' END AS CompareC3,    
    CASE WHEN LAG(C4,1,C4) OVER (ORDER BY C4) = C4 THEN 'Same' ELSE 'Differs' END AS CompareC4
    FROM test

    John

    Thanks John.
    Yes, I am taking the whole set and comparing the value of each column in each row to the values of each column in the previous row.

  • sgmunson - Wednesday, October 25, 2017 10:40 AM

    You're probably looking for the following:CREATE TABLE #test (
        c1 int,
        c2 char(10),
        c3 varchar(100),
        c4 float
    );
    INSERT INTO #test (c1, c2, c3, c4)
        VALUES    (1,'A','Adam',9000),
                (1,'B','Adam',9000);

    WITH RAW_DATA AS (

        SELECT *, ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RN
        FROM #test
    )
    SELECT RN, c1, c2, c3, c4,
        CASE RN
            WHEN 1 THEN CASE WHEN LEAD(c1, 1, NULL) OVER (ORDER BY RN) = c1 THEN 'Same' ELSE 'Differs' END
            WHEN 2 THEN CASE WHEN LAG(c1, 1, NULL) OVER (ORDER BY RN) = c1 THEN 'Same' ELSE 'Differs' END
        END AS CompareC1,
        CASE RN
            WHEN 1 THEN CASE WHEN LEAD(c2, 1, NULL) OVER (ORDER BY RN) = c2 THEN 'Same' ELSE 'Differs' END
            WHEN 2 THEN CASE WHEN LAG(c2, 1, NULL) OVER (ORDER BY RN) = c2 THEN 'Same' ELSE 'Differs' END
        END AS CompareC2,
        CASE RN
            WHEN 1 THEN CASE WHEN LEAD(c3, 1, NULL) OVER (ORDER BY RN) = c3 THEN 'Same' ELSE 'Differs' END
            WHEN 2 THEN CASE WHEN LAG(c3, 1, NULL) OVER (ORDER BY RN) = c3 THEN 'Same' ELSE 'Differs' END
        END AS CompareC3,
        CASE RN
            WHEN 1 THEN CASE WHEN LEAD(c4, 1, NULL) OVER (ORDER BY RN) = c4 THEN 'Same' ELSE 'Differs' END
            WHEN 2 THEN CASE WHEN LAG(c4, 1, NULL) OVER (ORDER BY RN) = c4 THEN 'Same' ELSE 'Differs' END
        END AS CompareC4
    FROM RAW_DATA
    ORDER BY RN;

    DROP TABLE #test;

    The problem with what's been previous posted is that the use of the LAG function was invoking a 3rd parameter value that says if you can't get a previous value because such a record doesn't exist, just use the current record's value, but that is relying on a previous record, which won't exist for the first row.   See if you understand how my code works.   It's specifically designed for exactly 2 rows of data.

    Thanks Steve. Thats exactly I was looking for.  At this point, I want to compare any 2 rows in a table.

    Many thanks.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply