Compare columns between 2 tables

Question

Compare columns between 2 tables

SSChampion

Points: 13195

January 24, 2021 at 8:41 pm

Create TABLE #TableA

(
 TaskNameA nvarchar(100),
 GovernA bit,
 ToolTypeA nvarchar(10),
 ResourceNameA nvarchar(100)
)


Create TABLE #TableB

(
 TaskNameB nvarchar(100),
 GovernB bit,
 ToolTypeB nvarchar(10),
 ResourceNameB nvarchar(100)
)


INSERT INTO #TableA (TaskNameA,GovernA,ToolTypeA,ResourceNameA )
SELECT 'Gap Follow Up', 1 , 'Tool' ,'Regional/Local' UNION ALL
SELECT 'Gap Extension', 1 , 'Excel' ,'Advanced Scoping' UNION ALL -- good
SELECT 'Gap Task', 0 , 'Excel' ,'Division / Function' UNION ALL
SELECT 'Validate master data', 1 , 'Excel'  ,NULL -- missing


INSERT INTO #TableB (TaskNameB,GovernB,ToolTypeB,ResourceNameB )
SELECT 'Gap Follow Up', 0 , 'Tool' ,'Advanced Scoping' UNION ALL --- Govern and ResourceName
SELECT 'Gap Extension', 1 , 'Excel' ,'Advanced Scoping' UNION ALL -- good
SELECT 'Gap Task', 0 , NULL  ,'Division / Function' UNION ALL
SELECT 'Investigate data', 1 , 'Excel'  ,NULL  --extra

SELECT * FROM #TableA

SELECT * FROM #TableB

--Desired results

SELECT 'Gap Follow Up' AS TaskNameA, 1 AS GovernA , 'Tool' AS ToolTypeA ,'Regional/Local'  AS ResourceNameA , 'Gap Follow Up'AS TaskNameB, 0 AS GovernB , 'Tool' AS ToolTypeB ,'Advanced Scoping' AS ResourceNameB ,'OK' AS Status_TaskName,'Govern Gap' AS Status_Govern,'OK' AS Status_ToolType,'Resource Gap' AS Status_ResourceName

UNION ALL

SELECT 'Gap Extension' AS TaskNameA, 1 AS GovernA , 'Excel' AS ToolTypeA ,'Advanced Scoping'  AS ResourceNameA , 'Gap Extension'AS TaskNameB, 1 AS GovernB , 'Excel' AS ToolTypeB ,'Advanced Scoping' AS ResourceNameB ,'OK' AS Status_TaskName,'OK' AS Status_Govern,'OK' AS Status_ToolType,'OK' AS Status_ResourceName

UNION ALL

SELECT 'Gap Task' AS TaskNameA, 0 AS GovernA , 'Excel' AS ToolTypeA ,'Division / Function'  AS ResourceNameA , 'Gap Task'AS TaskNameB, 0 AS GovernB , NULL AS ToolTypeB ,'Division / Function' AS ResourceNameB ,'OK' AS Status_TaskName,'OK' AS Status_Govern,'ToolType Gap' AS Status_ToolType,'OK' AS Status_ResourceName


UNION ALL

SELECT 'Validate master data' AS TaskNameA, 1 AS GovernA , 'Excel' AS ToolTypeA ,NULL  AS ResourceNameA , NULL AS TaskNameB, NULL AS GovernB , NULL AS ToolTypeB ,NULL AS ResourceNameB ,'Missing' AS Status_TaskName, NULL AS Status_Govern,NULL AS Status_ToolType,NULL AS Status_ResourceName


UNION ALL

SELECT NULL AS TaskNameA, NULL AS GovernA , NULL AS ToolTypeA ,NULL  AS ResourceNameA , 'Investigate data' AS TaskNameB, 1 AS GovernB , 'Excel' AS ToolTypeB ,NULL AS ResourceNameB ,'Remove' AS Status_TaskName, NULL AS Status_Govern,NULL AS Status_ToolType,NULL AS Status_ResourceName


DROP TABLE #TAbleA
DROP TABLE #TAbleB

Hi,

I have to compare data between 2 tables based on column names TaskNameA and TaskNameB .

Rule is when Task Name in both tables matches, then check if other column values also matches. If they are identical, the Status should be OK . If there is no match then flag them as gap creating a new gap column .

Thanks

-- Desired Results :

--For Task (Gap Follow Up), this task exists in both table so Status_TaskName = OK

--For Task (Gap Follow Up), GovernA and GovernB are not equeal so Status_Govern = Govern Gap

--For Task (Gap Follow Up), ToolTypeA and ToolTypeB are equeal so Status_ToolType = OK

--For Task (Gap Follow Up), ResourceNameA and ResourceNameB are not equeal so Status_ResourceName = Resource Gap

--For Task (Gap Extension), this task exists in both table so Status_TaskName = OK

--For Task (Gap Extension), GovernA and GovernB are equeal so Status_Govern = OK

--For Task (Gap Extension), ToolTypeA and ToolTypeB are equeal so Status_ToolType = OK

--For Task (Gap Extension), ResourceNameA and ResourceNameB are equeal so Status_ResourceName = OK

--For Task (Gap Task), this task exists in both table so Status_TaskName = OK

--For Task (Gap Task), GovernA and GovernB are equeal so Status_Govern = OK

--For Task (Gap Task), ToolTypeA and ToolTypeB are not equeal so Status_ToolType = ToolType Gap

--For Task (Gap Task), ResourceNameA and ResourceNameB are equeal so Status_ResourceName = OK'

--For Task (Validate data), this task does not exist in table B so Status_TaskName = 'Missing'

--If Task is Missing, then all other statuses should be blank or NULL

Create TABLE #TableA

(
 TaskNameA nvarchar(100),
 GovernA bit,
 ToolTypeA nvarchar(10),
 ResourceNameA nvarchar(100)
)


Create TABLE #TableB

(
 TaskNameB nvarchar(100),
 GovernB bit,
 ToolTypeB nvarchar(10),
 ResourceNameB nvarchar(100)
)


INSERT INTO #TableA (TaskNameA,GovernA,ToolTypeA,ResourceNameA )
SELECT 'Gap Follow Up', 1 , 'Tool' ,'Regional/Local' UNION ALL
SELECT 'Gap Extension', 1 , 'Excel' ,'Advanced Scoping' UNION ALL -- good
SELECT 'Gap Task', 0 , 'Excel' ,'Division / Function' UNION ALL
SELECT 'Validate master data', 1 , 'Excel'  ,NULL -- missing


INSERT INTO #TableB (TaskNameB,GovernB,ToolTypeB,ResourceNameB )
SELECT 'Gap Follow Up', 0 , 'Tool' ,'Advanced Scoping' UNION ALL --- Govern and ResourceName
SELECT 'Gap Extension', 1 , 'Excel' ,'Advanced Scoping' UNION ALL -- good
SELECT 'Gap Task', 0 , NULL  ,'Division / Function' UNION ALL
SELECT 'Investigate data', 1 , 'Excel'  ,NULL  --extra

SELECT * FROM #TableA

SELECT * FROM #TableB







SELECT 'Gap Follow Up' AS TaskNameA, 1 AS GovernA , 'Tool' AS ToolTypeA ,'Regional/Local'  AS ResourceNameA , 'Gap Follow Up'AS TaskNameB, 0 AS GovernB , 'Tool' AS ToolTypeB ,'Advanced Scoping' AS ResourceNameB ,'OK' AS Status_TaskName,'Govern Gap' AS Status_Govern,'OK' AS Status_ToolType,'Resource Gap' AS Status_ResourceName

UNION ALL

SELECT 'Gap Extension' AS TaskNameA, 1 AS GovernA , 'Excel' AS ToolTypeA ,'Advanced Scoping'  AS ResourceNameA , 'Gap Extension'AS TaskNameB, 1 AS GovernB , 'Excel' AS ToolTypeB ,'Advanced Scoping' AS ResourceNameB ,'OK' AS Status_TaskName,'OK' AS Status_Govern,'OK' AS Status_ToolType,'OK' AS Status_ResourceName

UNION ALL

SELECT 'Gap Task' AS TaskNameA, 0 AS GovernA , 'Excel' AS ToolTypeA ,'Division / Function'  AS ResourceNameA , 'Gap Task'AS TaskNameB, 0 AS GovernB , NULL AS ToolTypeB ,'Division / Function' AS ResourceNameB ,'OK' AS Status_TaskName,'OK' AS Status_Govern,'ToolType Gap' AS Status_ToolType,'OK' AS Status_ResourceName


UNION ALL

SELECT 'Validate master data' AS TaskNameA, 1 AS GovernA , 'Excel' AS ToolTypeA ,NULL  AS ResourceNameA , NULL AS TaskNameB, NULL AS GovernB , NULL AS ToolTypeB ,NULL AS ResourceNameB ,'Missing' AS Status_TaskName, NULL AS Status_Govern,NULL AS Status_ToolType,NULL AS Status_ResourceName


UNION ALL

SELECT NULL AS TaskNameA, NULL AS GovernA , NULL AS ToolTypeA ,NULL  AS ResourceNameA , 'Investigate data' AS TaskNameB, 1 AS GovernB , 'Excel' AS ToolTypeB ,NULL AS ResourceNameB ,'Remove' AS Status_TaskName, NULL AS Status_Govern,NULL AS Status_ToolType,NULL AS Status_ResourceName


DROP TABLE #TAbleA
DROP TABLE #TAbleB

--For Task (Investigate data), this task does not exist in table A so Status_TaskName = 'Remove'

--If Task doesn't exist in #TableA, then all other statuses should be blank or NULL

This topic was modified 3 years, 3 months ago by PSB.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Steve Collins SSCrazy Points: 2069 More actions · Answer 1

This comes really close. It uses a FULL JOIN which is functionally both a LEFT JOIN and a RIGHT JOIN. It seems the example Status_ToolType column might have some consistency issue.

select a.*, b.*,
       case when a.TaskNameA=b.TaskNameB then 'OK'
            when a.TaskNameA is not null then 'Missing'
            when a.TaskNameA is null then 'Remove' else null end Status_TaskName,
      case when a.GovernA=b.GovernB then 'OK'
           when a.GovernA<>b.GovernB then 'Govern Gap' else null end Status_Govern,
      case when a.ToolTypeA=b.ToolTypeB then 'OK'
           when a.ToolTypeA<>b.ToolTypeB then 'TootlType Gap' else null end Status_ToolType,
      case when a.ResourceNameA=b.ResourceNameB then 'OK'
           when a.ResourceNameA<>b.ResourceNameB then 'Resource Gap' else null end Status_ResourceName
from #TableA a
     full join #TableB b on a.TaskNameA=b.TaskNameB;

Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

PSB SSChampion Points: 13195 More actions · Answer 2

Instead of two rows it creates 4 records for 'Dual Task' , Not sure how to accomplish this

Create TABLE #TableA

(
 TaskNameA nvarchar(100),
 GovernA bit,
 ToolTypeA nvarchar(10),
 ResourceNameA nvarchar(100)
)


Create TABLE #TableB

(
 TaskNameB nvarchar(100),
 GovernB bit,
 ToolTypeB nvarchar(10),
 ResourceNameB nvarchar(100)
)


INSERT INTO #TableA (TaskNameA,GovernA,ToolTypeA,ResourceNameA )
SELECT 'Gap Follow Up', 1 , 'Tool' ,'Regional/Local' UNION ALL
SELECT 'Gap Extension', 1 , 'Excel' ,'Advanced Scoping' UNION ALL -- good
SELECT 'Gap Task', 0 , 'Excel' ,'Division / Function' UNION ALL
SELECT 'Validate master data', 1 , 'Excel'  ,NULL UNION ALL -- missing
SELECT 'Dual Task', 1 , 'Excel'  ,'Test' UNION ALL
SELECT 'Dual Task', 1 , 'Tool'  ,'Test'


INSERT INTO #TableB (TaskNameB,GovernB,ToolTypeB,ResourceNameB )
SELECT 'Gap Follow Up', 0 , 'Tool' ,'Advanced Scoping' UNION ALL --- Govern and ResourceName
SELECT 'Gap Extension', 1 , 'Excel' ,'Advanced Scoping' UNION ALL -- good
SELECT 'Gap Task', 0 , NULL  ,'Division / Function' UNION ALL
SELECT 'Investigate data', 1 , 'Excel'  ,NULL  UNION ALL--extra
SELECT 'Dual Task', 1 , 'Tool'  ,'Test' UNION ALL
SELECT 'Dual Task', 1 , 'Excel'  ,'Test'

SELECT * FROM #TableA

SELECT * FROM #TableB



DROP TABLE #TAbleA
DROP TABLE #TAbleB

Steve Jones - SSC Editor SSC Guru Points: 734449 More actions · Answer 3

What doesn't work? You say "this" , but what is this? The code from Steve seems to match your results.

Mike01 SSChampion Points: 11393 More actions · Answer 4

you changed the data in your second post. You get 4 rows for Dual Task because you are only joining TaskNameA to TaskNameB. There are 2 rows in each table, so each row will join to each other to create 4 rows. If you include ToolTypeA to ToolTypeB in the join, then it will return 2 rows

select a.*, b.*,

case when a.TaskNameA=b.TaskNameB then 'OK'

when a.TaskNameA is not null then 'Missing'

when a.TaskNameA is null then 'Remove' else null end Status_TaskName,

case when a.GovernA=b.GovernB then 'OK'

when a.GovernA<>b.GovernB then 'Govern Gap' else null end Status_Govern,

case when a.ToolTypeA=b.ToolTypeB then 'OK'

when a.ToolTypeA<>b.ToolTypeB then 'TootlType Gap' else null end Status_ToolType,

case when a.ResourceNameA=b.ResourceNameB then 'OK'

when a.ResourceNameA<>b.ResourceNameB then 'Resource Gap' else null end Status_ResourceName

from #TableA a

full join #TableB b on a.TaskNameA=b.TaskNameB

and a.ToolTypeA = b.ToolTypeB;

For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

PSB SSChampion Points: 13195 More actions · Answer 5

PSB

SSChampion

Points: 13195

January 25, 2021 at 10:41 pm

#3837387

Works . Thanks

jcelko212 32090 SSCrazy Eights Points: 9294 More actions · Answer 6

The first thing I want to do is correct your DDL. You don't seem to know the table must have a key. This is not an option; it is a definition. Tables that deal with the same schema will probably have reference clauses between them instead of being independent card files like you've modeled. After 30 years I found out that most mistakes are made in the DDL, and then patched in kludge in the DML by bad programmers. I see you're also an old assembly language programmer who still writes with bit flags in the 21st century.

I see you also believe that things like “# table_x” are clear, meaningful names that define the entities modeled in that table. No, they are not. This is how people would have labeled a magnetic tape file in the 1960s. The name has nothing to do with what is modeled, just where and how it is stored.

The first thing to do is un-split these tables, Get rid of the excessive novels that prevent it from ever having a key in the attic. We might later want to change the data types, since I have the feeling that the choice of 100 character strings, etc was not carefully planned and researched.

CREATE TABLE Tasks

(task_name NVARCHAR(100) NOT NULL

task_source CHAR(1) NOT NULL

CHECK(task_source IN (‘A’, ‘B’)),

PRIMARY KEY (task_name, task_source),

govern BIT NOT NULL, – probably wrong attribute!

tool_type NVARCHAR(10) NOT NULL,

resource_name_ NVARCHAR(100) NOT NULL);

INSERT INTO Tasks

VALUES

('Gap Extension', ’A’, 1, 'Excel', 'Advanced Scoping' ),

('Gap Extension', ’B’, 1, 'Excel', 'Advanced Scoping' ),

('Gap Follow Up', ’A’, 1, 'Tool', 'Regional/Local' ),

('Gap Follow Up', ’B’, 0, 'Tool', 'Advanced Scoping' ),

('Gap Task', ’A’, 0, 'Excel', 'Division / Function' ),

('Gap Task', ’B’, 0, ‘foobar’ , 'Division / Function' ),

('Investigate data', ’B’,1, 'Excel', ‘extra’),

('Validate master data', ’A’,1, 'Excel', ‘missing’);

Why did you still use the old Sybase table UNION constructing syntax? Microsoft has had the ANSI/ISO standard syntax for quite a few years now.

The following select statement is totally nonrelational and violates the whole spirit of a tiered architecture! Why are you putting in headers and formatting your output display in a query? This is the whole point of client/server architecture! We have a presentation layer or tier that does the display work. This is not a matter of SQL, but the way we have been doing client/server architecture for about 30 years now.

/* begin crap.

SELECT 'Gap Follow Up' AS task_name_A, 1 AS GovernA, 'Tool' AS tool_type_, 'Regional/Local' ASresource_name_, 'Gap Follow Up'AS task_name_B, 0 AS GovernB, 'Tool' AS ToolTypeB, 'Advanced Scoping' AS ResourceNameB, 'OK' AS Status_task_name_, 'Govern Gap' AS Status_Govern, 'OK' AS Status_ToolType, 'Resource Gap' AS Status_ResourceName */

>> I have to compare data between 2 tables based on column names task_name_A and task_name_B . <<

Why are there two tables? Let me answer my own question; because you are modeling decks of punch cards or magnetic tape files and trying to merge them! I call this attribute splitting in my books, but Chris Date has also written about how bad this is. Essentially, you have taken something that should be in one table because it is one kind of entity or one kind of relationship and split it across multiple tables.

>> Rule is when Task_Name in both tables matches, then check if other column values also matches. <<

Did you know you can compare the rows of two tables with the INTERSECT operator? Of course, you should not have two tables with the same structure in the same schema.

I think you should start over and try a relational design instead of mimicking 1960s, magnetic tape files in SQL.

Please post DDL and follow ANSI/ISO standards when asking for help.

GaryV Hall of Fame Points: 3307 More actions · Answer 7

Joe, do you never stop to think that maybe what these people are posting are minimal examples to show where they have a question? Not the full data set in their production database - which most likely has meaningful names, keys, etc. It's very unlikely that a system has temporary tables that are created every time the user wants run a query.

Certainly you can continue to make your points about naming conventions, necessity of keys, duplicate data, etc. But you don't need to take a tone which excoriates users for their examples of data and code.

This reply was modified 3 years, 2 months ago by GaryV. Reason: typo

jcelko212 32090 SSCrazy Eights Points: 9294 More actions · Answer 8

Actually, I found you do need to yell at people a little bit especially when they are screwing up. By definition, a "minimal example" would include a key because that's the very definition of a table. When someone makes a fundamental error, like attribute splitting, you think they ought to get away with it? Your statement that you assume that their production database will have meaningful names and all the other things that make RDBMS work is optimistic.. Considering I earned a living for a few decades cleaning up crappy schemas, bad data names, and standards violations, I'm inclined to think you're too optimistic. Once you get in the habit, it's just as easy to write good SQL as it is to write bad SQL. But if no one will stop you from writing the bad code, how do you learn? Anytime you think I'm rough, you should have taken a programming class with Dykstra. 🙂

Please post DDL and follow ANSI/ISO standards when asking for help.