Using "Multiple Assignment Variables" for a Generic Compare Data Script

I recently read about "multiple assignment variables" in Paul Nielsen's SQL Server 2008 Bible. According to what I read, this is an essentially undocumented feature of SQL Server which has been available since SQL 2000. Indeed, I checked my tattered copy of the SQL Server 2000 Bible, and Paul describes the technique there under the topic "Recursive SELECT Variables". You can read more about it in SQL Server 2008 on Paul's Blog.

The technique lets one "roll up" a column of values in a table into a variable, without using a cursor. Suppose, for example, you wanted to create a comma-delimited list out of the values in a column of a table -- this technique will let you do that, while avoiding the need for a cursor to loop over those values.

Now, in a previous post of mine discussing comparing tables using EXCEPT <reference>, I noted that it is best practice to explicitly name your columns in any query using EXCEPT. I recently needed to do just this, compare many dimension tables between Staging and Test databases, while testing my work migrating a dimension update process from DTS to SSIS. Since I subscribe to Larry Wall's programmer virtues of laziness, impatience, and hubris, I was looking for a way to do this without writing an explicit, separate script for all those dimension tables.

To make one generic script to compare any given table while being explicit with the columns, I knew I was going to have to get the column names out of the INFORMATION_SCHEMA views, and then compose some dynamic SQL. Pretty straightforward, but the prospect of using a cursor to create the column-delimited list of column names -- so I could be explicit in my EXCEPT query -- was giving me pause. That's when I turned to the (SQL Server 2008) Bible and found the solution to my problem.

Here's how it works; first I will describe the MAV syntax, and then show how I used it in my generic compare script. The MAV syntax is

SELECT @var = @var + col FROM (your query)

In my case I wanted to create a string variable containing a comma-delimited list of column names:

select @col_list = @col_list + col + ', '
 from (select '[' + column_name + ']' as col
 from information_schema.columns 
 where table_name = @tbl_name) d 
 order by col

This leaves a trailing comma, which is not going to go over very well in my final SQL, so let's trim it off:

select @col_list = left(@col_list, len(@col_list)-1)

Doing this on my Widget table, for example gives us this:

[WIDGET_ID], [WIDGET_NAME], [WIDGET_DESCRIPTION]

And that's the main thing we need to dynamically form the explicit EXCEPT query to compare my two dimension table in my test and Staging databases. Here's the rest of the script, as I use it in my work. In order to compare any given table, just change the value passed to @tbl_name. In practice I do this via a parameter substitution in an SSIS task which loops over my dimension tables.

--
-- generic compare two tables
--
declare @db1 varchar(256), @db2 varchar(256)
declare @svr1 varchar(256), @svr2 varchar(256)
declare @tbl_name varchar(256)
declare @col_list varchar(max)
declare @left_table varchar(256), @right_table varchar(256)
--
-- change this as needed
--
select @tbl_name = 'WIDGET'

 select @db1 = 'STAGING_DB', @db2 = 'TEST_DB'
select @svr1 = 'STAGING_SVR', @svr2 = 'TEST_SVR'
select @left_table = '[' + @svr1 + '].[' + @db1 + '].[dbo].[' + @tbl_name + ']'
select @right_table = '[' + @svr2 + '].[' + @db2 + '].[dbo].[' + @tbl_name + ']'
print 'Comparing ' + @left_table + ' to ' + @right_table
--
-- use MAV to form column list
--
select @col_list = ''
select @col_list = @col_list + col + ', '
 from (select '[' + column_name + ']' as col
 from information_schema.columns 
 where table_name = @tbl_name) d 
 order by col
select @col_list = left(@col_list, len(@col_list)-1)
declare @sql varchar(max), @top_sql varchar(max), @bot_sql varchar(max)

--
-- now form EXCEPT query to compare tables
--
select @top_sql = 'SELECT ' + @col_list + ' FROM ' + @left_table
 + ' EXCEPT ' 
 + 'SELECT ' + @col_list + ' FROM ' + @right_table
select @bot_sql = 'SELECT ' + @col_list + ' FROM ' + @right_table
 + ' EXCEPT ' 
 + 'SELECT ' + @col_list + ' FROM ' + @left_table
select @sql = 
 'SELECT ''' + @left_table + ''' as src, * FROM (' + @top_sql + ') top_results '
 + ' UNION '
 + 'SELECT ''' + @right_table + ''' as src, * FROM (' + @bot_sql + ') bot_results '

--
-- get primary key, if any, for ordering, using MAV again
--
declare @orderby_cols varchar(max)
select @orderby_cols = ''
select @orderby_cols = @orderby_cols + col + ', '
 from (select '[' + column_name + ']' as col, c.ordinal_position 
 from information_schema.table_constraints tc 
 inner join information_schema.key_column_usage c
 on tc.constraint_name = c.constraint_name 
 where tc.constraint_type = 'PRIMARY KEY' and tc.table_name = @tbl_name ) d
 order by ordinal_position
select @orderby_cols = @orderby_cols + ' src '
select @sql = @sql + ' ORDER BY ' + @orderby_cols
--
-- run the query
--
print @sql
exec (@sql)

Note that this script is not meant to be an all-purpose table comparison script, it is just meant to tell me if the contents of two tables that I know to have the same structure, are different. This assumption is explicit in the code at the bottom of the script which uses the MAV technique again to get the columns used in the primary key of the table, the assumption being that the primary key of the two tables is the same. I order by the primary key, and then 'src', in order to put rows with differences next to each other in the result set.

I don't know how I missed learning about "multiple assignment variables" for so long, and have found it very useful in my work. I hope that you too will realize some efficiencies in your work by exposure to this technique.

Using "Multiple Assignment Variables" for a Generic Compare Data Script

Rate

Share

Categories

Share

Rate

Using "Multiple Assignment Variables" for a Generic Compare Data Script

Rate

Share

Categories

Share

Rate

Related content

A Normalization Primer

Reaching the Outer Limits

Working with email addresses in SQL Server!

Querying Responses

Storing Checkbox Responses as Integers

Cookies on SQLServerCentral