Using "Multiple Assignment Variables" for a Generic Compare Data Script

  • Comments posted to this topic are about the item Using "Multiple Assignment Variables" for a Generic Compare Data Script

  • Not only is it important to name columns (as opposed to using '*') - it is important to name them and return them in the same order!

    Nothing in the 'MAV' query guarantees the order of column names returned. Use FOR XML PATH with an explicit ORDER BY clause instead (it's faster and entirely documented):

    SELECT STUFF(
    (
        SELECT N',' + sc.name
        FROM sys.columns sc
        WHERE sc.[object_id] = OBJECT_ID(N'dbo.Widget', N'U')
        ORDER BY sc.column_id ASC
        FOR XML PATH(''), TYPE
    ).value(N'.[1]', 'NVARCHAR(MAX)'), 1, 1, '');
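
    A sketch of how the generated list might feed the comparison itself (dbo.WidgetA and dbo.WidgetB are assumed names for two identically-structured tables):

    DECLARE @col_list NVARCHAR(MAX), @sql NVARCHAR(MAX);

    -- build the ordered, quoted column list
    SELECT @col_list = STUFF(
    (
        SELECT N',' + QUOTENAME(sc.name)
        FROM sys.columns sc
        WHERE sc.[object_id] = OBJECT_ID(N'dbo.WidgetA', N'U')
        ORDER BY sc.column_id ASC
        FOR XML PATH(''), TYPE
    ).value(N'.[1]', 'NVARCHAR(MAX)'), 1, 1, '');

    -- rows in WidgetA that have no exact match in WidgetB
    SET @sql = N'SELECT ' + @col_list + N' FROM dbo.WidgetA'
             + N' EXCEPT SELECT ' + @col_list + N' FROM dbo.WidgetB;';

    EXEC sys.sp_executesql @sql;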

  • A quick note on that trailing comma. Instead of

    select @col_list = @col_list + col + ', '

    which leaves a trailing comma that must be removed, you might try

    set @col_list = null -- optional: variables default to NULL when declared

    select @col_list = coalesce(@col_list + ',', '') + col
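
    A runnable version of that pattern, for completeness (dbo.Widget is an assumed table name; note that, as the first reply points out, this style of multi-row assignment still carries no order guarantee):

    declare @col_list varchar(4000) -- NULL until the first assignment

    select @col_list = coalesce(@col_list + ',', '') + name
    from sys.columns
    where [object_id] = object_id('dbo.Widget')

    select @col_list -- no trailing (or leading) comma to strip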

  • I can see where this could come in handy during ETL processes.

    Regarding creating a delimited list, see the example on MSDN: http://msdn2.microsoft.com/en-us/library/ms131056.aspx. Using the CLR, it creates a comma-delimited list. Since it's an aggregate function, you can use it just like any other aggregate, which makes it really handy.
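
    If you build the aggregate from that sample, usage might look something like this (dbo.CommaList is a hypothetical name for the deployed UDA):

    -- dbo.CommaList: hypothetical CLR user-defined aggregate built from the MSDN sample
    SELECT dbo.CommaList(sc.name)
    FROM sys.columns AS sc
    WHERE sc.[object_id] = OBJECT_ID('dbo.Widget');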

  • Hi Stephen,

    I was wondering if there's a reason you use INFORMATION_SCHEMA rather than the newer sys.columns? Backwards compatibility? I believe that from 2005 onwards INFORMATION_SCHEMA is just a wrapper over the new object catalog views (sys.columns, sys.objects, etc.)...

  • Thanks for the article; I use this "feature" quite a bit. One of my scripts is similar to yours: it creates insert statements using column names, in order to reduce the size of our production database for testing purposes. We copy only the records we want from the production db into an empty db; the process is much faster than deleting rows and resizing the db.

    I realize that your article is just for explanation purposes, but when I compare the contents of two tables that have the same structure, I use the CheckSum_Agg() and CheckSum() functions. You can start by comparing the entire tables with just one checksum value. If the values don't match, you can show the rows that differ by comparing each row's checksum with the checksum of the corresponding row in the other table (using primary keys to match rows).

    Syntax is:

    SELECT CheckSum_Agg(CheckSum(*)) FROM TableName
    SELECT CheckSum(*) FROM TableName WHERE PrimaryKey = 1
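
    A sketch of that row-level step (TableA, TableB, and PrimaryKey are assumed names):

    -- rows whose checksums differ between two identically-structured tables
    SELECT a.PrimaryKey
    FROM (SELECT PrimaryKey, CheckSum(*) AS cs FROM TableA) AS a
    JOIN (SELECT PrimaryKey, CheckSum(*) AS cs FROM TableB) AS b
      ON b.PrimaryKey = a.PrimaryKey
    WHERE a.cs <> b.cs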

    Paul

  • Nice article. A minor nit-pick: you need to initialize the @col variable to an empty string before running the MAV query. Otherwise, your result will be NULL.

    I've also used this for quick and dirty CSV list generation. For long lists, though, say over 500 elements, a FOR XML query, like Paul White's above, is much faster.
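
    The pitfall in miniature (dbo.Widget is an assumed table name):

    declare @col_list varchar(4000) -- NULL by default

    select @col_list = @col_list + name + ','
    from sys.columns
    where [object_id] = object_id('dbo.Widget')

    select @col_list -- NULL, because NULL + anything = NULL

    set @col_list = '' -- initialize first

    select @col_list = @col_list + name + ','
    from sys.columns
    where [object_id] = object_id('dbo.Widget')

    select @col_list -- populated list, trailing comma and all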

  • Now, in a previous post of mine discussing comparing tables using EXCEPT <reference>, I noted that it is best practice to explicitly name your columns in any query using EXCEPT.

    I saw this in your previous article, but didn't respond to it there...I think you're wrong, and that this is one of the rare instances where it's best NOT to explicitly name your columns. If the column order is different between the two tables, then there may be other differences as well. Of course, there can be other differences without the column order being different, or the column order could be the only difference; but since it's simpler to use * and the failure provides you with useful information (that the structures of the two tables have branched at some point), that's what I'd use. Of course you may need to do the comparison in any case, in which case today's script can come in handy.

    Speaking of today's script, using

    @col_list = IsNull(@col_list +',' , '') + col

    prevents having to delete a trailing comma.
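
    For reference, the * form of the comparison reads like this (dbo.TableA and dbo.TableB are assumed names):

    -- rows in A with no exact match in B; swap the operands for the other direction
    SELECT * FROM dbo.TableA
    EXCEPT
    SELECT * FROM dbo.TableB
    -- if the column counts or types have branched, this fails outright,
    -- which is itself the useful information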

  • If the purpose of this script is to detect whether two tables with identical structure have different data, it might be nice to check the row counts before jumping into a potentially expensive comparison.
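
    Something like this, perhaps (dbo.TableA and dbo.TableB are assumed names):

    IF (SELECT COUNT_BIG(*) FROM dbo.TableA) <> (SELECT COUNT_BIG(*) FROM dbo.TableB)
        PRINT 'Row counts differ; the data cannot match.'
    ELSE
        PRINT 'Row counts match; a full comparison is still required.'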

    Just a thought.

    Carl Anderson

    Data Architect, Northwestern University

  • @viacoboni: a construction using COALESCE is normally used, but in any case I can see no reason to prefer any variation of the method over the FOR XML solution (documented, supported, and faster...)

    @Bradley Deem: True - and I am a great fan of appropriate CLR usage; however, FOR XML still performs faster than even a CLR UDA - so a call-out to the hosted CLR seems unnecessary...unless you have special requirements of course!

    @yaadman: Absolutely - and I was going to include a quote from BOL to make the point that system views are preferred over INFORMATION_SCHEMA...but I didn't want to pile on too much 😉 I agree that it is rarely better to use anything other than the system views like sys.columns in this case.

    @pbarbin: Yes - there are many better approaches to this sort of problem - and carl.anderson mentions another worthwhile optimization in a later post (checking row counts). HashBytes, CHECKSUM, the TableDiff utility...it is a long list. (A quick HashBytes sketch is appended at the end of this post.)

    @steve-2 McNamee: Are you saying that the technique used in the article is faster than FOR XML PATH for fewer than 500 elements? Not nit-picking; I just want to be clear on it. For me, the greater problem is that using variables in this way lacks an ORDER guarantee. Incorrect results, even when produced fractionally more quickly, are rarely to be desired 😛

    Good discussion :w00t:
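
    Since HashBytes came up, a per-row hashing sketch (dbo.Widget and its WidgetID key are assumed names; mind the 8000-byte input limit on HASHBYTES):

    -- hash each row's XML representation; join on the key and compare hashes
    SELECT w.WidgetID,
           HASHBYTES('MD5', (SELECT w.* FOR XML RAW)) AS row_hash
    FROM dbo.Widget AS w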

  • Paul asks, "@steve-2 McNamee: Are you saying that the technique used in the article is faster than FOR XML PATH for fewer than 500 elements? Not nit-picking; I just want to be clear on it. For me, the greater problem is that using variables in this way lacks an ORDER guarantee. Incorrect results, even when produced fractionally more quickly, are rarely to be desired."

    No I'm not saying that; forgive my offhand remark. To clarify: about 6 months ago (before reading the great threads here about shredding), I was helping test one of my colleague's CSV-shredding CLR functions. I used a variant of this topic's technique to generate test data. When I lazily included the CSV-generating code within the timing of the CLR, the results caused me to doubt the effectiveness of the CLR. I finally figured out that this technique, while handy and cool, can take a looong time to generate large CSV strings. The number 500 sticks in my head, but I don't have any evidence to back that up. For sure, when I got up to about 50,000 elements, the technique would not even return results before I got tired of waiting. That's when I found the XML technique to generate CSV strings, which did not seem to suffer from the same performance limitation.

    From an email I sent at the time:

    Last week, I thought I had found a performance limit in the CLR parsing function when parsing strings larger than about 30k elements. I was wrong about that. The offending code was the TSQL code I used to create the csv list that I then parsed using the function. With a @CSVList list of 515847 members, a select count(*) from the CLR function takes about a second.

  • Cool stuff Steve - thanks for taking the time to write such a comprehensive answer - makes sense 🙂

  • The MAV query is a very useful technique. I am sure it is something I am doing wrong (or perhaps there is a database setting I am missing), but I only get one column in my output when I should get 117.

    declare @col_list varchar(4000), @tbl_name sysname
    select @tbl_name = 'Master', @col_list = ''

    --list all the columns first
    select '[' + column_name + ']' as col
    from information_schema.columns
    where table_name = @tbl_name

    --MAV query
    select @col_list = @col_list + col + ', '
    from (select '[' + column_name + ']' as col
          from information_schema.columns
          where table_name = @tbl_name) d
    order by col

    select @col_list
    print @col_list

    _________________________

    (117 row(s) affected)

    (1 row(s) affected)

    [WorkPaidByAnother],
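
    For what it's worth, one likely culprit is the ORDER BY: the result of a multi-row variable assignment combined with ORDER BY is undefined (the point made earlier in this thread), and here it appears to have assigned only the final row. A sketch of the same list via FOR XML PATH, which does honour the ORDER BY:

    declare @tbl_name sysname
    select @tbl_name = 'Master'

    select stuff(
    (
        select ', [' + column_name + ']'
        from information_schema.columns
        where table_name = @tbl_name
        order by column_name
        for xml path(''), type
    ).value('.[1]', 'varchar(4000)'), 1, 2, '')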

  • @PaulWhite said: "Absolutely - and I was going to include a quote from BOL to make the point that system views are preferred over INFORMATION_SCHEMA...but I didn't want to pile on too much. I agree that it is rarely better to use anything other than the system views like sys.columns in this case."

    Thanks for all the good information you post, but I have to chime in on this one. Personally, I use system views almost exclusively, but there is a good reason to use INFORMATION_SCHEMA views and that is for portability of code. They are the SQL-92 standard. And when have you ever been afraid to pile on too much? 😛
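
    The trade-off in miniature (dbo.Widget is an assumed table name):

    -- ISO standard view: portable in principle across database products
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'Widget'

    -- catalog view: SQL Server-specific, but richer metadata
    SELECT name, column_id
    FROM sys.columns
    WHERE [object_id] = OBJECT_ID('dbo.Widget')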

  • pbarbin (1/27/2010)

    Thanks for all the good information you post, but I have to chime in on this one. Personally, I use system views almost exclusively, but there is a good reason to use INFORMATION_SCHEMA views and that is for portability of code. They are the SQL-92 standard. And when have you ever been afraid to pile on too much? 😛

    Ah...code portability. Pretty rare requirement in my experience, and there will likely be bigger challenges in porting to another database product than use of system views...but ok, it's a consideration I guess.

    When have I ever been afraid of piling on too much? Just the once it seems :laugh:
