SQLServerCentral is supported by Red Gate Software Ltd.

Compare two tables: Which fields are different?
Posted Wednesday, May 21, 2008 5:43 PM
Valued Member


Group: General Forum Members
Last Login: Wednesday, August 13, 2014 5:30 PM
Points: 55, Visits: 278
I do not want to reinvent the wheel, but I need to write a procedure that can be passed two table names and it will compare the tables and list the fields that are different. It can also show the old and new values. The assumptions are that the table structures are the same and there is an index on the base table, which would indicate what fields to join by. I have two Oracle scripts that handle this, but I need the same functionality in SQL Server 2005. Does anyone know of a script that does such a detailed comparison of two tables and does it generically with any two tables passed into the procedure?
Post #504916
Posted Wednesday, May 21, 2008 8:58 PM
UDP Broadcaster


Group: General Forum Members
Last Login: Friday, April 5, 2013 4:43 PM
Points: 1,473, Visits: 1,314
In SQL 2005, you can check the sys.columns catalog view.
Post #504958
Posted Wednesday, May 21, 2008 9:22 PM
Valued Member


Group: General Forum Members
Last Login: Wednesday, August 13, 2014 5:30 PM
Points: 55, Visits: 278
I believe the columns metadata would be used, but I am not looking for differences in the table structures. I am looking to compare the data in the two tables. But I do need to know the name of the fields where differences occur.
Post #504962
Posted Wednesday, May 21, 2008 10:12 PM
Mr or Mrs. 500


Group: General Forum Members
Last Login: Today @ 3:52 PM
Points: 528, Visits: 1,258
I have two Oracle scripts that handle this, but I need the same functionality in SQL Server 2005.


Can you please upload the Oracle scripts? We can then get an idea of what they are doing and try to map them to SQL Server 2005.
Post #504973
Posted Thursday, May 22, 2008 12:52 AM
SSC Journeyman


Group: General Forum Members
Last Login: Thursday, March 25, 2010 10:52 PM
Points: 89, Visits: 273
Hi,

Take a look at this article; it will help you with passing table names as parameters into a stored procedure.
http://www.sqlservercentral.com/articles/Stored+Procedures/2977/

Rajesh
Post #505007
Posted Thursday, May 22, 2008 8:20 AM
Valued Member


Group: General Forum Members
Last Login: Wednesday, August 13, 2014 5:30 PM
Points: 55, Visits: 278
Rather than posting a very long Oracle procedure, I will just describe what it does. First, let me say my solution is derived from an elegant one in an article by Sanjay Ray (http://www.oracle.com/technology/oramag/code/tips2004/012504.html). I was hoping there was an equally elegant SQL Server solution that I could adapt. If not, then perhaps this will make a very good article for someone when I figure it out. :)

Problem: I have a copy of TableA from yesterday and today. I want to know specifically what changed. The adds and deletes are simple, using the key and the EXCEPT and UNION ALL operators (see http://weblogs.sqlteam.com/jeffs/archive/2007/05/02/60194.aspx). For modifications, I want to know the key for the modified records and then the fields that were changed as well as the old and new values. This needs to be in a generic stored procedure so I can audit any two tables.

Solution: The trick is building a huge dynamic SQL statement. Pass the proc two tablenames, one of which is the "base" table for the comparison. The key structure is determined by looking up the fields in the metadata for indexes for the base table. If there is no index (there should be), the first column is used. The two tables will be joined by the key fields and every other field will be compared.

Using the list of columns from the metadata (INFORMATION_SCHEMA.Columns in SQL Server), the query is built for the non-key fields. The SELECT list is built dynamically from the metadata into a concatenated list for each field. Each field will end up looking something like:

CASE
WHEN A.column_name = B.column_name THEN null
ELSE 'column_name=['+A.column_name+'->'+B.column_name+']'
END

There is a delimiter (~) between each field in the SELECT list. So if the fields are equal the null will result in ~~ and if they are unequal, then the name of the column and the values from each table are put between the delimiters.

The dynamic SQL is actually an INSERT statement into another table. This table is then parsed to find the data between the delimiters and report the field names and the old and new values.
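To make the delimiter scheme concrete, here is a minimal Python sketch of the same build-and-parse idea. The column names and values are made up for illustration; the real procedure does this inside dynamic T-SQL:

```python
# Minimal sketch of the ~-delimiter trick, using made-up rows.
old_row = {"name": "Smith", "city": "Austin", "phone": "555-1234"}
new_row = {"name": "Smith", "city": "Dallas", "phone": "555-9999"}

# Build: each non-key column contributes an empty segment when equal,
# or 'col=[old->new]' when different (what the CASE expression emits).
segments = []
for col in old_row:
    if old_row[col] == new_row[col]:
        segments.append("")
    else:
        segments.append(f"{col}=[{old_row[col]}->{new_row[col]}]")
changes = "~".join(segments)
print(changes)  # ~city=[Austin->Dallas]~phone=[555-1234->555-9999]

# Parse: the non-empty text between delimiters is the change list.
changed = [s for s in changes.split("~") if s]
print(changed)  # ['city=[Austin->Dallas]', 'phone=[555-1234->555-9999]']
```

Equal fields collapse to back-to-back delimiters (~~), exactly as described above, so the parse step only ever sees the changed fields.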

Voila! I hope any of this made sense. As I work through each of the processes, I may return with more detailed questions. Right now I was just hoping someone else had already published a solution for SQL Server.
Post #505204
Posted Thursday, May 22, 2008 1:59 PM
Hall of Fame


Group: General Forum Members
Last Login: Tuesday, August 19, 2014 10:01 AM
Points: 3,844, Visits: 3,841
You may want to search SSC for a solution. I know for a fact that this is not the first time your question has been asked in these forums.

You may want to consider a more formal auditing technique if it is feasible. An auditing procedure like what you've suggested will be a very poor performer on large tables, as it will have to do a column-by-column comparison. At the very least, you may want to look at using a timestamp or datetime column to narrow the search down to only those rows where you know for sure changes have been made.




John Rowan

======================================================
Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
Post #505470
Posted Thursday, May 22, 2008 11:00 PM
SSC-Enthusiastic


Group: General Forum Members
Last Login: Thursday, August 21, 2014 12:53 AM
Points: 127, Visits: 414
Hi

You can try something like this; I guess it will solve your problem:


select distinct a.* from INFORMATION_SCHEMA.COLUMNS a
where a.column_name not in (select column_name from INFORMATION_SCHEMA.COLUMNS b
where b.table_name in ('stp_rule_dtl') ) ---- Table A
and a.table_name in ('stp_rule_esc_dtl') --- Table B
order by a.column_name

Thanks
jaypee.s
Post #505614
Posted Wednesday, June 4, 2008 3:04 AM
Forum Newbie


Group: General Forum Members
Last Login: Tuesday, June 3, 2014 4:20 AM
Points: 3, Visits: 124
How about this...

create procedure CompareTables @table1 varchar(100), @table2 varchar(100)
as
declare @sql varchar(8000)
set @sql = 'select ''' + @table1 + ''' as tblName, * from
(select * from ' + @table1 + '
except
select * from ' + @table2 + ') x
union all
select ''' + @table2 + ''' as tblName, * from
(select * from ' + @table2 + '
except
select * from ' + @table1 +') x'

exec(@sql)
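For anyone wanting to try the symmetric-difference idea without two SQL Server tables handy, here is roughly the same query run through Python's sqlite3 module (the table names and data are invented for the demo):

```python
import sqlite3

# t1/t2 stand in for @table1/@table2 in the dynamic SQL above.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE t1 (id INTEGER, val TEXT);
    CREATE TABLE t2 (id INTEGER, val TEXT);
    INSERT INTO t1 VALUES (1,'a'), (2,'b'), (3,'c');
    INSERT INTO t2 VALUES (1,'a'), (2,'B'), (4,'d');
""")
rows = con.execute("""
    SELECT 't1' AS tblName, * FROM
      (SELECT * FROM t1 EXCEPT SELECT * FROM t2)
    UNION ALL
    SELECT 't2' AS tblName, * FROM
      (SELECT * FROM t2 EXCEPT SELECT * FROM t1)
""").fetchall()
for r in rows:
    print(r)  # rows present in one table but not the other
```

One caveat either way: this flavor tells you which rows differ, not which columns within them, which is the part the original question is really after.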
Post #511211
Posted Wednesday, June 4, 2008 7:33 AM
Valued Member


Group: General Forum Members
Last Login: Wednesday, August 13, 2014 5:30 PM
Points: 55, Visits: 278
Let me try to briefly explain the solution I have. I apologize in advance for the lack of supporting info, but I do not have time to pretty it up and make it generic for public consumption. Also keep in mind that I am new to SQL Server, so cut me some slack on poor coding technique. Any suggestions for improvements, especially for performance, are welcome. More details can be provided if desired.

Ultimately I want a table, called Admin.Data_Changes, to be populated with one record for every field that was changed for the base table. It is designed for keeping the old and new values, but time has not permitted me to implement that part yet. This table is used by my users to audit changes from yesterday. So they have reports that query the Data_Changes table and report what they want.

CREATE TABLE [Admin].[Data_Changes] ( 
[table_name] [varchar] ( 256 )
,[action] [varchar] ( 10 )
,[key_fields] [varchar] ( 1000 )
,[key_values] [varchar] ( 1000 )
,[fieldname] [varchar] ( 256 )
,[old_value] [varchar] ( 255 )
,[new_value] [varchar] ( 255 )
,[dw_load_dttm] [datetime] NULL
)

I have a stored procedure which is passed the name of the base table and the comparison table. There are some basic edit checks to ensure the table structures match, the tables exist, and the base table has a primary key. Then I call another stored procedure which does all the hard work and populates a staging table with one record per modified record. The staging table has the structure ...

CREATE TABLE [Stage].[Data_Changes] (
[Table_Name] [varchar] (256) ,
[Action] [varchar] (10) ,
[Key_Fields] [varchar] (1000) ,
[Key_Values] [varchar] (1000) ,
[Changes] [varchar] (max)
)

After the staging table is populated for the base table, the final table is populated. First any previous records for this table are removed and then the staging data is parsed into one record for each changed field. @Table1 is the base table.

    --  Clear out any previous audit records
DELETE FROM Admin.Data_Changes WHERE Table_Name = @Table1

-- Parse the staging records into one record for each field changed
INSERT INTO Admin.Data_Changes (Table_name, Action, Key_fields, Key_values, Fieldname, DW_load_dttm)
SELECT Table_name, Action, Key_fields, Key_values,
SUBSTRING('~'+ Changes +'~',N+1,CHARINDEX('~','~'+ Changes +'~',N+1)-N-1) AS Fieldname,
getdate()
FROM Admin.Tally a CROSS JOIN Stage.Data_Changes b
WHERE a.N < LEN('~'+ Changes +'~')
AND SUBSTRING('~'+ Changes +'~',N,1) = '~'
AND SUBSTRING('~'+ Changes +'~',N+1,1) <> '~'
AND table_name = @Table1
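The tally-table parse is easier to see in a procedural sketch. Here is a rough Python equivalent of the WHERE conditions above, with N playing the role of the tally number and a hypothetical Changes value:

```python
# Hypothetical Changes value: two changed fields with one unchanged
# (empty) slot between them, as the CASE expressions would produce.
changes = "city~~phone"
s = "~" + changes + "~"              # '~' + Changes + '~', as in the SQL

fieldnames = []
for n in range(len(s) - 1):          # n plays the role of Tally.N
    # SUBSTRING(s,N,1) = '~' AND SUBSTRING(s,N+1,1) <> '~'
    if s[n] == "~" and s[n + 1] != "~":
        end = s.index("~", n + 1)    # CHARINDEX('~', s, N+1)
        fieldnames.append(s[n + 1:end])
print(fieldnames)  # ['city', 'phone']
```

Each delimiter that is immediately followed by text marks the start of a changed-field name, and the next delimiter ends it; empty ~~ slots are skipped entirely.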


Now for the hard working stored procedure. The setup ...


CREATE PROCEDURE [Admin].[Data_Changes_Staging]
@p_base_tablename varchar(256),
@p_comp_tablename varchar(256),
@p_details varchar(256)
AS
set nocount on
BEGIN
DECLARE
@Table1 varchar(256),
@Schema1 varchar(256),
@Table2 varchar(256),
@Schema2 varchar(256),
@Key_Fields varchar(max),
@Key_Values varchar(max),
@Changes varchar(max),
@Where varchar(max),
@From varchar(max),
@SQL varchar(max),
@i integer ;

BEGIN TRY
SELECT @Table1 = PARSENAME(@p_base_tablename, 1),
@Schema1 = PARSENAME(@p_base_tablename, 2),
@Table2 = PARSENAME(@p_comp_tablename, 1),
@Schema2 = PARSENAME(@p_comp_tablename, 2)

DELETE FROM Stage.Data_Changes WHERE Table_Name = @Table1


The final dynamic SQL will be an INSERT statement into the staging table, Stage.Data_Changes. The pieces of that INSERT statement are built dynamically from the columns and constraints system metadata tables. First, piece together the FROM clause.

  SET @From = ' FROM ' + @p_base_tablename + ' A, ' + @p_comp_tablename + ' B '

Then the table Information_Schema.Table_Constraints is used to get the columns used for the primary key. This info is used to build the WHERE clause that will join the two tables. It is also used to populate two of the fields in the table. @Key_Fields is the primary key field names strung together. @Key_Values is the values of those primary key fields, strung together. Note that the values are converted to varchar so they can be strung together.

  SELECT
@Where =
stuff( ( select ' and A.' + k1.column_name + ' = B.' + k1.column_name
from information_schema.key_column_usage k1
where k1.constraint_name = c.constraint_name
order by ordinal_position
for xml path('') )
, 1,4,' WHERE '),
@Key_Fields =
stuff( ( select '~' + k1.column_name
from information_schema.key_column_usage k1
where k1.constraint_name = c.constraint_name
order by ordinal_position
for xml path('') )
, 1,1,'') ,

      @Key_Values = 
stuff( ( select '+ ''~'' + convert(varchar,A.' + k1.column_name + ')'
from information_schema.key_column_usage k1
where k1.constraint_name = c.constraint_name
order by ordinal_position
for xml path('') )
, 1,8,'')
FROM INFORMATION_SCHEMA.Table_Constraints c
WHERE c.table_schema = @Schema1
AND c.table_name = @Table1
AND c.constraint_type = 'PRIMARY KEY'
GROUP BY constraint_name
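As a side note, the FOR XML PATH('') / STUFF() pair is just doing string aggregation. Here is a small Python sketch of what the first two variables end up holding, using a hypothetical two-column primary key:

```python
# Hypothetical key columns, in ordinal_position order (these would come
# from information_schema.key_column_usage in the real procedure).
key_columns = ["order_id", "line_no"]

# ' and A.c = B.c' per column; STUFF(..., 1, 4, ' WHERE ') then swaps
# the leading ' and' for ' WHERE ' (leaving a harmless double space).
joined = "".join(f" and A.{c} = B.{c}" for c in key_columns)
where = " WHERE " + joined[4:]
print(where)       #  WHERE  A.order_id = B.order_id and A.line_no = B.line_no

# @Key_Fields: '~'-separated names, the leading '~' stuffed away.
key_fields = "~".join(key_columns)
print(key_fields)  # order_id~line_no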

The only thing left is figuring out all the changes in the fields. The table Information_Schema.Columns is used to dynamically get all the columns for the tables. For each column, a CASE statement is built. When the two columns are equal, it simply concatenates the delimiter to the string; when they are not equal, it concatenates the name of the column and the delimiter. At this time it does not give the values from each table, but that will be added to this statement, with changes only to the string and not the logic.

Note a couple of things when stringing together @Changes. In order to handle all data types and nulls, the columns are converted to varchar and set to '' if null. Also note that in my example I have excluded some column names that may appear in my tables whose changes I am not worried about. The load dates and times are expected to differ between yesterday and today, so there is no point checking those, and I do not want to know that someone's age changed because today is their birthday; if the Birthdate is changed, I will pick that up. This exclusion is completely optional for the logic to work.

  SELECT @Changes = 
stuff( ( select '+ case when isnull(convert(varchar,A.' + c1.column_name +
'),'''') = isnull(convert(varchar,B.' + c1.column_name +
'),'''') then ''~'' else ''' + c1.column_name + '~'' end '
from information_schema.columns c1
where c1.column_name not in ('LOAD_DTTM','LASTUPDDTTM','DW_LOAD_DTTM','UW_LOAD_DTTM','AGE')
and c1.table_schema = c2.table_schema
and c1.table_name = c2.table_name for xml path('') )
, 1,1,'')
FROM information_schema.columns c2
WHERE table_schema = @Schema1 and table_name = @Table1
GROUP BY c2.table_schema, table_name
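To show what that generation step emits, here is a Python sketch with hypothetical column metadata (the skip-list mirrors the one in the query above):

```python
# Hypothetical columns of the base table; the real list would come
# from information_schema.columns.
columns = ["cust_id", "name", "city", "AGE", "DW_LOAD_DTTM"]
skip = {"LOAD_DTTM", "LASTUPDDTTM", "DW_LOAD_DTTM", "UW_LOAD_DTTM", "AGE"}

# One CASE fragment per non-excluded column, exactly as built above.
case_parts = [
    f"+ case when isnull(convert(varchar,A.{c}),'') = "
    f"isnull(convert(varchar,B.{c}),'') then '~' else '{c}~' end "
    for c in columns if c not in skip
]
# STUFF(..., 1, 1, '') drops the leading '+'.
changes_expr = "".join(case_parts)[1:]
print(changes_expr)
```

The resulting @Changes text is pasted into the SELECT list of the dynamic INSERT, so each row of the staging table gets one ~-delimited string summarizing its changed columns.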

That's it! Piece all this together and clean it up and you can find out what fields changed when comparing any two tables.
Post #511362