RE: Convert CSV values in three columns to rows

September 19, 2010 at 12:57 pm

Oleg Netchaev (9/18/2010)
Jeff,
So far, I can see that I use the same setup as far as the tally creation is concerned. The collation is, of course, the same as well. You are correct in not using xml3; it is designed to fail due to the CROSS APPLY inside of it. This is what prompted me to modify my answer on Ask after I saw Kevan's. My original answer included the same dreaded CROSS APPLY, but once I removed it, it became faster than Kevan's tally split for sizeable strings. I still deleted the answer from Ask altogether because I respect your opinion (though I cannot agree with it yet). I will try to play with settings and also check whether it is possible to tweak the XML a bit. One thing I saw for sure, and I hope that you will agree: with huge strings, XML execution time does not increase much, but the tally split's sure does.
I think that the most relevant test we can have will be with a string size somewhere in the middle, say 4k per record, meaning about 400 items to split. More than 800 items makes the string greater than 8k in size.
I would like to point out that while you do mention an apples-to-apples comparison, the XML function you wrote does use VARCHAR(MAX). I will leave it as is, though, because XML-ifying the input bloats it somewhat, so let VARCHAR(MAX) stay in the XML function.
Oleg
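
The sizes in the quoted post are consistent with an average of about 10 characters per item, delimiter included (4k for roughly 400 items). The Python sketch below only checks that arithmetic under that assumption; `CHARS_PER_ITEM`, `VARCHAR_LIMIT` and `approx_length` are hypothetical names, not part of anyone's actual test harness.

```python
# Rough sizing for a comma-delimited string.
# Assumption: about 10 characters per item including its delimiter,
# as implied by "4k per record meaning about 400 items".
CHARS_PER_ITEM = 10      # assumed average, not measured
VARCHAR_LIMIT = 8000     # maximum length of a non-MAX VARCHAR

def approx_length(items: int, chars_per_item: int = CHARS_PER_ITEM) -> int:
    """Approximate string length: every item carries a delimiter except the last."""
    return items * chars_per_item - 1

for n in (100, 400, 800, 801):
    length = approx_length(n)
    verdict = "fits in VARCHAR(8000)" if length <= VARCHAR_LIMIT else "needs VARCHAR(MAX)"
    print(n, length, verdict)
```

Under that assumption, 800 items land just under the 8000-character limit and the 801st pushes the string over it.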

Hi Oleg,

I appreciate the respect and I do wish you'd put your good post back up on Ask.

I definitely agree that the Tally table stumbles pretty badly on things above 8k and I'm setting up to test that.
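
For readers following along, the core idea of a Tally-table splitter can be sketched outside SQL Server. This is a hypothetical Python illustration, not the code under test: `tally_split` is a made-up name and it handles only a single-character delimiter. A sequence of numbers 1..len+1 stands in for the Tally table, each position that follows a delimiter starts an item, and the item runs to the next delimiter. It shows the logic only and says nothing about why performance drops above 8k.

```python
def tally_split(s: str, delim: str = ",") -> list[str]:
    """Split s the way a Tally-table splitter does (sketch, 1-char delimiter)."""
    padded_front = delim + s        # like @Delim + @String: marks where items start
    padded_back = s + delim         # like @String + @Delim: sentinel for the last item
    tally = range(1, len(s) + 2)    # 1-based positions, standing in for the Tally table
    items = []
    for n in tally:
        # SUBSTRING(@Delim + @String, N, 1) = @Delim  -> an item starts at N
        if padded_front[n - 1] == delim:
            # CHARINDEX(@Delim, @String + @Delim, N), converted to a 0-based index
            end = padded_back.index(delim, n - 1)
            items.append(s[n - 1:end])  # SUBSTRING(@String, N, end - N)
    return items

print(tally_split("a,bb,c"))  # ['a', 'bb', 'c']
```

Because every start position is found by a simple character comparison against a numbers sequence, the work is set-based in SQL; the Python loop merely walks the same positions one at a time.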

On the apples-to-apples thing, I left VARCHAR(MAX) in the XML code because, in order to get anything close to 8k of data, you have to use VARCHAR(MAX) to account for the 7-character tag information added at each delimiter. Just to be complete, though, I'll try it with just a VARCHAR(8000) at the same 100 elements I've been testing with so far.
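
The bloat from XML-ifying can be seen in a small Python sketch, assuming the common pattern of replacing each delimiter with the 7-character `</x><x>` and wrapping the whole string in `<x>...</x>`. The names `xml_split` and `xml_bloat` are hypothetical, and the extra `<r>` root element is there only because Python's `ElementTree` parser, unlike SQL Server's XML type, rejects a fragment with several top-level elements.

```python
import xml.etree.ElementTree as ET

def xml_split(s: str, delim: str = ",") -> list[str]:
    """Split s by converting it to XML (sketch).

    Assumes s contains no characters that need XML escaping ('&', '<', ...).
    """
    xml_text = "<r><x>" + s.replace(delim, "</x><x>") + "</x></r>"
    root = ET.fromstring(xml_text)
    # An empty <x></x> has text None; report it as an empty string.
    return [node.text or "" for node in root.findall("x")]

def xml_bloat(s: str, delim: str = ",") -> int:
    """Characters added by the <x>...</x> conversion (without the <r> root)."""
    xml_text = "<x>" + s.replace(delim, "</x><x>") + "</x>"
    return len(xml_text) - len(s)

sample = ",".join(["123456789"] * 800)        # 800 nine-character items
print(len(sample), len(sample) + xml_bloat(sample))  # 7999 12800
```

Each delimiter adds 6 characters net (one comma becomes seven characters) and the wrapper adds 7 more, so a string that just fits in VARCHAR(8000) grows well past that limit once converted.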

I've got some folks participating in the 100-element test right now. If it works out OK, I'll try to figure out a way to do more automated and extensive testing across a wider range. I didn't expect the Statistics reporting to work out as well as it currently is, and I didn't want to bog people's machines down too much.

You WILL be pleased to know that it's YOUR method in XML-1 in that testing and, even though it's an mlTVF (multi-line table-valued function), it blows the other XML methods away for performance. Under 8k, the Tally table still beats up on it pretty good. Like I said, I'm setting up for >8k testing... it just takes some time to do it all so people don't have to mess with things when they test it.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.
