Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

A Case For Concatenation (Building Delimited Strings ) In T-SQL Expand / Collapse
Author
Message
Posted Tuesday, February 24, 2009 10:42 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Sunday, August 17, 2014 3:10 PM
Points: 2,787, Visits: 6,098
I have heard some criticism of doing concatenation (building delimited strings ) in T-SQL. Two reasons given for this criticism are (1) to protect the resources of the SQL server by farming the load of string manipulation out to the UI, and (2) to "do things where they make sense". Basically, other languages are better at string manipulation and other tasks, so why bother doing it in SQL? Just return the rows and let the UI handle it.

This makes obvious sense when building strings. SQL passes a FirstName|LastName|OverdueAmt as separate columns to a UI that can then build the string it wants, whether that string is

John Smith
Smith, John
Dear John
or
Dear Mr. Smith we currently show your account as having an overdue balance of $250.

In the last instance especially, it is foolishly wasteful to add all the constant information at the server then pass it over the network to the application.

However, a common requirement is to build a delimited string from a single column value repeated for multiple rows. This is something easily accomplished in T-SQL with very few lines of code, so I consider it to be a draw with respect to string manipulation capability. That leaves the issue of workload on the server.

The code at the bottom generates a table of unique first names, last names and department numbers. The schema for the tables containing the first names and last names are provided. I populated them by simply typing in 50 or so names off the top of my head, in some place duplicating last names such as Smith or Jones. But you could just as easily populate them with random strings. Department number is a random number between 1 and 20.

The resulting table (#temp2) will be populated with approximately 22,000 unique name/department combinations. The rest of the code compares SQL concatenation of this table with simply returning the values. On my laptop, the concatenation actually runs faster than the simple display of the data. Furthermore, the result set of the concatenation is approximately 1200 rows, instead of over 22,0000. This is, in effect, a compression of the data, because the Firstname/Lastname columns aren't being repeated an additional 20 times. When output was directed to text files, the concatenated version was around 110k in size and the simple display was over a meg.

That is nine times the data which has to be buffered and passed to the UI code. If temporary or permanent holding files are the target of this output, there would also be additional logging involved. To me it seems that concatenation at the SQL server may ultimately be reducing the load on the server. What am I missing here?


------------------------------
-- concatenation test
------------------------------
set nocount on;
--- pseudo random number generation for numbers between 1 and @X
SELECT TOP 100000
IDENTITY(INT,1,1) AS RowID,
ABS(CHECKSUM(NEWID())) % 50 + 1 AS FKey,
ABS(CHECKSUM(NEWID())) % 50 + 1 AS LKey,
ABS(CHECKSUM(NEWID())) % 20 + 1 AS Dept
INTO #temp
FROM Tally;

select distinct firstName,lastname,dept
into #temp2
from #temp
join Lastnames on lastNameID = Lkey
join FirstNames on firstNameID = Fkey

select count(*) as [#temp2 rows] from #temp2

declare @timer datetime


set @timer = getdate()
print 'Display Only'
set statistics time on;

select * from #temp2
ORDER BY LastName,firstName,dept

set statistics time off;
print datediff(ms,@timer,getdate())





set @timer = getdate()
print 'Concatentation'
set statistics time on;

select firstName,LastName
,stuff(( SELECT ',' + cast(dept as varchar(2))
FROM #temp2 t2
WHERE t2.LastName = t1.LastName and t2.firstName = t1.firstName -- must match GROUP BY below
ORDER BY dept
FOR XML PATH('')
),1,1,'') as [Departments]
from #temp2 t1
GROUP BY LastName,firstName -- without GROUP BY multiple rows are returned
ORDER BY LastName,firstName

set statistics time off;
print datediff(ms,@timer,getdate())







drop table #temp
drop table #temp2


__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? -- Stephen Stills
Post #663702
Posted Tuesday, February 24, 2009 11:13 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 13,872, Visits: 9,596
As with many things, "it depends". In the situation you outlined, manipulating the values in SQL makes sense.

What you might try, if you can, is duplicating the functionality in a CLR proc and see if it's faster at it than the T-SQL version.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #663712
Posted Tuesday, February 24, 2009 12:30 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Sunday, August 17, 2014 3:10 PM
Points: 2,787, Visits: 6,098
I'm afraid I can't, but I will listen respectfullly to anyone who can, if they'd care to report their results. However, I see that as a different issue from doing it at the database level instead of passing it to a UI.

I can't argue with "It depends." The difference in concatenating and simple passing of the raw data is a function of the average number of rows that need to be concatenated, the average size of the data within the column being concatenated, and the combined size of the columns that would be duplicated. With only two or three average rows being concatenated, the percentage of "compression" would obviously be less. The same would hold true if the values being concatenated were longer strings.

Please understand, I'm not arguing that concatenation should always be done at the server, I'm just saying there seem to be instances where it would be the better solution, and more instances where it really doesn't matter.

Thanks for the feedback, G.


__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? -- Stephen Stills
Post #663768
Posted Tuesday, February 24, 2009 1:54 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 11:11 AM
Points: 10,282, Visits: 13,264
I'm a fan of letting the UI or business layer handle formatting the data and let SQL Server return the data. I know I can do a lot of formatting in T-SQL and I have at times, but that's not what it is best at so I try not to do it there. I'm not a hard-liner on either side, I'm more concerned with making it work and so anyone can read it and debug it. In the example you present I would consider the SQL you have doing the concatenation advanced SQL and most people would not understand it, while doing the concatenation in VB or C# would be a simple loop that any novice programmer could read and debug.

Here are a couple of blog posts. The first one on the do it the presentation layer side and the second on the do it in SQL sometimes side.

http://sqlblog.com/blogs/jonathan_kehayias/archive/2009/02/19/sql-tip-keep-your-presentation-logic-in-your-presentation-layer.aspx

http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/02/20/should-i-always-keep-my-presentation-logic-in-my-presentation-layer.aspx




Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Post #663815
Posted Tuesday, February 24, 2009 2:48 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Sunday, August 17, 2014 3:10 PM
Points: 2,787, Visits: 6,098
Jack, I can see that it would be considered advanced. I never heard about using FOR XML this way until the last couple of months. But having seen it once, the technique is elegantly simple. It is much more intuitive than using a tally table to parse strings.

I deliberately sidestepped the issue of other peoples' skills as a consideration in "doing it where it makes sense". Skill levels vary from organization to organization, and for some people, it would take the hours of time to get it right in the UI. You write code on both sides, so you can do it wherever it is easiest for you. So again the answer becomes "it depends".

Again, this is just me trying to understand what other considerations should influence the choice. When people I respect say not to do something, it's wise to listen; but it's also important to understand the "why" behind the general rule. The first post contains a number of examples where indisputably the UI should handle string manipulation. But in the case of building delimited strings out of column values for multiple rows, SQL still looks like the better choice.

Thanks for the input.


__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? -- Stephen Stills
Post #663844
Posted Tuesday, February 24, 2009 3:05 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 11:11 AM
Points: 10,282, Visits: 13,264
Bob,

Always willing to share my opinion, even when it probably stinks.

See, I think that it is necessary to consider others abilities whenever coding something. I want the person that comes behind me, assuming I will eventually move on, to be able to understand my code. I've been the guy coming behind too many times. Certainly if you accurately document your code then it isn't as much of an issue, but I still think it needs to be considered when choosing a solution.




Jack Corbett

Applications Developer

Don't let the good be the enemy of the best. -- Paul Fleming

Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
How to Post Performance Problems
Crosstabs and Pivots or How to turn rows into columns Part 1
Crosstabs and Pivots or How to turn rows into columns Part 2
Post #663854
Posted Tuesday, February 24, 2009 3:10 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Sunday, August 17, 2014 3:10 PM
Points: 2,787, Visits: 6,098
Ah... but you are denying them the opportunity to broaden their horizons and expand their minds by saying "What the...? How did he DO that???" :D



__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? -- Stephen Stills
Post #663864
Posted Tuesday, February 24, 2009 3:25 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Wednesday, October 8, 2014 4:15 AM
Points: 20,581, Visits: 9,619
Bob Hovious (2/24/2009)
Ah... but you are denying them the opportunity to broaden their horizons and expand their minds by saying "What the...? How did he DO that???" :D



Another it depends... what if it takes 5 hours for him to figure it out... while it would take you 2 minutes to document it or code it "better".

That's a big value for your employer too...
Post #663874
Posted Tuesday, February 24, 2009 4:12 PM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Sunday, August 17, 2014 3:10 PM
Points: 2,787, Visits: 6,098
(Sigh... nobody has a sense of humor anymore.)

Guys, I've learned to document code out of self-preservation... because I may be the one who has to revisit the code months or years later later. When someone tells me "You ought to be able to look at the code and see what it's doing." my response it always "I can see what it's doing, what I need to know is what it is TRYING to do."

My comment wasn't aimed at the comment about documentation, but at the suggestion that we should code to the lowest common denominator. When better solutions come around, we should pass them along, not avoid them because the next guy might not be familiar with them.

Thanks again for engaging here. I'm very happy this discussion isn't corrupting the Endless Thread.


__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? -- Stephen Stills
Post #663901
Posted Tuesday, February 24, 2009 11:06 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 1:53 PM
Points: 35,366, Visits: 31,905
The problem with even some small scale concatenation is... people abuse it and don't know what it actually does to the server. Sure, "It Depends"... but few take the time to figure out "What Depends"... ;)

Test data...
DROP TABLE JBMTest
GO
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 1,000,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
-- Jeff Moden
SELECT TOP 1000000
RowNum = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2

--===== A table is not properly formed unless a Primary Key has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (RowNum)

Test code...
 SET STATISTICS TIME ON
SELECT t1.SomeInt,
STUFF((SELECT ',' + t2.SomeLetters2 FROM dbo.JBMTest t2 WHERE t1.SomeInt = t2.SomeInt FOR XML PATH('')),1,1,'') AS Nodes
FROM dbo.JBMTest t1
GROUP BY t1.SomeInt

SELECT SomeInt,SomeLetters2
FROM dbo.JBMTest
SET STATISTICS TIME OFF

Test results...
(50000 row(s) affected)

SQL Server Execution Times:
CPU time = 19281 ms, elapsed time = 29566 ms.

(1000000 row(s) affected)

SQL Server Execution Times:
CPU time = 703 ms, elapsed time = 25840 ms.

The concatenation used 27.42 times more cpu and, if you consider the display as the I/O, the concatenation was longer in duration, as well.

Even though the concatenation sent about half the number of bytes (because the concatenated operands were so short), which do you suppose is tougher on the server?


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #664022
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse