• Let me try to briefly explain the solution I have. I apologize in advance for the lack of supporting info, but I do not have time to pretty it up and make it generic for public consumption. Also keep in mind that I am new to SQL Server, so cut me some slack on poor coding technique. Any suggestions for improvements, especially for performance, are welcome. More details can be provided if desired.

    Ultimately I want a table, called Admin.Data_Changes, to be populated with one record for every field that was changed in the base table. It is designed to keep the old and new values, but time has not permitted me to implement that part yet. My users use this table to audit changes from yesterday, so they have reports that query the Data_Changes table and report on what they want.

    CREATE TABLE [Admin].[Data_Changes] (
        [table_name] [varchar](256)
        ,[action] [varchar](10)
        ,[key_fields] [varchar](1000)
        ,[key_values] [varchar](1000)
        ,[fieldname] [varchar](256)
        ,[old_value] [varchar](255)
        ,[new_value] [varchar](255)
        ,[dw_load_dttm] [datetime] NULL
    )

    I have a stored procedure which is passed the names of the base table and the comparison table. There are some basic edit checks to ensure the table structures match, that both tables exist, and that the base table has a primary key. Then I call another stored procedure which does all the hard work and populates a staging table with one record per modified row. The staging table has this structure ...

    CREATE TABLE [Stage].[Data_Changes] (
        [Table_Name] [varchar](256) ,
        [Action] [varchar](10) ,
        [Key_Fields] [varchar](1000) ,
        [Key_Values] [varchar](1000) ,
        [Changes] [varchar](max)  -- scripts out as (-1), but varchar(max) is the valid DDL
    )
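    To make the Changes column concrete: each compared column contributes either just the delimiter '~' (unchanged) or its name plus the delimiter (changed). So for a hypothetical row with columns Name, City and Zip where only Name and City changed, the staging row would hold something like:

    -- columns: Name (changed), City (changed), Zip (unchanged)
    Changes = 'Name~City~~'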

    After the staging table is populated for the base table, the final table is populated. First any previous records for this table are removed and then the staging data is parsed into one record for each changed field. @Table1 is the base table.

    -- Clear out any previous audit records
    DELETE FROM Admin.Data_Changes WHERE Table_Name = @Table1

    -- Parse the staging records into one record for each field changed
    INSERT INTO Admin.Data_Changes (Table_name, Action, Key_fields, Key_values, Fieldname, DW_load_dttm)
    SELECT Table_name, Action, Key_fields, Key_values,
           SUBSTRING('~'+ Changes +'~', N+1, CHARINDEX('~','~'+ Changes +'~', N+1) - N - 1) AS Fieldname,
           getdate()
      FROM Admin.Tally a CROSS JOIN Stage.Data_Changes b
     WHERE a.N < LEN('~'+ Changes +'~')
       AND SUBSTRING('~'+ Changes +'~', N, 1) = '~'
       AND SUBSTRING('~'+ Changes +'~', N+1, 1) <> '~'
       AND table_name = @Table1
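    Admin.Tally, referenced above, is a standard numbers ("tally") table of sequential integers. I have not shown its definition, but if you need one, the classic setup looks like this (8000 comfortably covers the varchar lengths used here):

    -- One row per integer 1..8000, built by cross-joining a couple of big system views
    SELECT TOP 8000 IDENTITY(int,1,1) AS N
      INTO Admin.Tally
      FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2

    ALTER TABLE Admin.Tally
      ADD CONSTRAINT PK_Tally PRIMARY KEY CLUSTERED (N)

    The join then keeps only positions N where a '~' is immediately followed by a non-'~' character, i.e. the start of each changed column name, and the SUBSTRING/CHARINDEX pair slices that name out.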

    Now for the hard-working stored procedure. The setup ...

    CREATE PROCEDURE [Admin].[Data_Changes_Staging]
        @p_base_tablename varchar(256),
        @p_comp_tablename varchar(256),
        @p_details varchar(256)
    AS
    SET NOCOUNT ON
    BEGIN
    DECLARE
        @Table1 varchar(256),
        @Schema1 varchar(256),
        @Table2 varchar(256),
        @Schema2 varchar(256),
        @Key_Fields varchar(max),
        @Key_Values varchar(max),
        @Changes varchar(max),
        @Where varchar(max),
        @From varchar(max),
        @SQL varchar(max),
        @i integer ;

    BEGIN TRY

    SELECT @Table1  = PARSENAME(@p_base_tablename, 1),
           @Schema1 = PARSENAME(@p_base_tablename, 2),
           @Table2  = PARSENAME(@p_comp_tablename, 1),
           @Schema2 = PARSENAME(@p_comp_tablename, 2)

    DELETE FROM Stage.Data_Changes WHERE Table_Name = @Table1

    The final dynamic SQL will be an INSERT statement into the staging table, Stage.Data_Changes. The pieces of that INSERT statement are built dynamically from the columns and constraints system metadata tables. First, piece together the FROM clause.

    SET @From = ' FROM ' + @p_base_tablename + ' A, ' + @p_comp_tablename + ' B '

    Then the table Information_Schema.Table_Constraints is used to get the columns used for the primary key. This info is used to build the WHERE clause that will join the two tables. It is also used to populate two of the fields in the table. @Key_Fields is the primary key field names strung together. @Key_Values is the values of those primary key fields, strung together. Note that the values are converted to varchar so they can be strung together.

    SELECT
        @Where =
            stuff( ( select ' and A.' + k1.column_name + ' = B.' + k1.column_name
                       from information_schema.key_column_usage k1
                      where k1.constraint_name = c.constraint_name
                        and k1.constraint_schema = c.constraint_schema  -- a same-named PK constraint can exist in another schema
                      order by ordinal_position
                        for xml path('') )
                 , 1, 4, ' WHERE '),
        @Key_Fields =
            stuff( ( select '~' + k1.column_name
                       from information_schema.key_column_usage k1
                      where k1.constraint_name = c.constraint_name
                        and k1.constraint_schema = c.constraint_schema
                      order by ordinal_position
                        for xml path('') )
                 , 1, 1, ''),
        @Key_Values =
            -- note: a bare varchar in CONVERT defaults to 30 characters; widen it if key values can be longer
            stuff( ( select '+ ''~'' + convert(varchar,A.' + k1.column_name + ')'
                       from information_schema.key_column_usage k1
                      where k1.constraint_name = c.constraint_name
                        and k1.constraint_schema = c.constraint_schema
                      order by ordinal_position
                        for xml path('') )
                 , 1, 8, '')
    FROM INFORMATION_SCHEMA.Table_Constraints c
    WHERE c.table_schema = @Schema1
      AND c.table_name = @Table1
      AND c.constraint_type = 'PRIMARY KEY'
    GROUP BY c.constraint_name, c.constraint_schema
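    As an illustration, for a hypothetical base table with a two-column primary key of Emp_ID and Eff_Date, the three variables come out roughly as:

    @Where      :   WHERE  A.Emp_ID = B.Emp_ID and A.Eff_Date = B.Eff_Date
    @Key_Fields :  Emp_ID~Eff_Date
    @Key_Values :  convert(varchar,A.Emp_ID)+ '~' + convert(varchar,A.Eff_Date)

    Note that @Key_Values is itself a SQL expression, to be evaluated later inside the dynamic INSERT, while @Key_Fields is a finished literal.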

    The only thing left is figuring out which fields changed. The table Information_Schema.Columns is used to dynamically get all the columns for the table. For each column, a CASE expression is built: when the two columns are equal, it simply concatenates the delimiter onto the string; when they are not equal, it concatenates the name of the column plus the delimiter. At this time it does not capture the values from each table, but that will be added to this statement, with changes only to the string and not the logic. Note a couple of things about how @Changes is strung together. To handle all data types and NULLs, the columns are converted to varchar and set to '' if NULL. Also note that in my example I have excluded some column names that may appear in my tables whose changes I am not worried about. The load dates and times are expected to differ between yesterday and today, so there is no point checking those, and I do not want to know that someone's age changed because today is their birthday; if the Birthdate itself is changed, I will pick that up. This exclusion list is completely optional for this logic to work.

    SELECT @Changes =
        stuff( ( select '+ case when isnull(convert(varchar,A.' + c1.column_name +
                        '),'''') = isnull(convert(varchar,B.' + c1.column_name +
                        '),'''') then ''~'' else ''' + c1.column_name + '~'' end '
                   from information_schema.columns c1
                  where c1.column_name not in ('LOAD_DTTM','LASTUPDDTTM','DW_LOAD_DTTM','UW_LOAD_DTTM','AGE')
                    and c1.table_schema = c2.table_schema
                    and c1.table_name = c2.table_name
                    for xml path('') )
             , 1, 1, '')
    FROM information_schema.columns c2
    WHERE table_schema = @Schema1 and table_name = @Table1
    GROUP BY c2.table_schema, table_name

    That's it! Piece all this together and clean it up and you can find out what fields changed when comparing any two tables.
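    The last step, which I have not shown, is concatenating the pieces into @SQL and executing it. Purely as a sketch of the shape (the 'Changed' action literal and the filter that keeps only modified rows are my guesses here, not the author's exact code):

    SET @SQL = 'INSERT INTO Stage.Data_Changes '
             + '(Table_Name, Action, Key_Fields, Key_Values, Changes) '
             + 'SELECT ''' + @Table1 + ''', ''Changed'', '
             + '''' + @Key_Fields + ''', '
             + @Key_Values + ', '
             + @Changes
             + @From
             + @Where
             -- keep only rows with at least one change, i.e. the Changes
             -- string contains something besides '~' delimiters
             + ' AND ' + @Changes + ' LIKE ''%[^~]%'''
    EXEC (@SQL)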