Now, we are going to get slightly esoteric here. I'd noticed that the performance of string concatenation didn't increase linearly with the size of the string. For a short string it is blindingly fast, but as you scale up, the performance gets disappointing. When you're handling big strings, it is time to change the algorithm. I felt it was time for a rough test, so the poor old development machine was wheeled out to run a test on a table containing every word of Moby Dick.
So we'll read in the entire text of Moby Dick, split it into words, and then see how long it takes to join them all together again. It takes under a minute to slice it all up (169,683 words, if you're curious).
Considering the ease of the operation of splicing Moby Dick back together again, I was expecting it to be quicker. It wasn't. I was surprised that the regression wasn't linear; it was polynomial (the equation is shown in the graph). What this means is that the operation becomes relatively more expensive as the size of the string increases, because each concatenation copies the entire string built so far.
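To put a back-of-envelope number on it (this little cost model is my addition, not part of the original test, and the average word length of six characters is an assumption): appending the k-th word copies all of the roughly k words' worth of characters accumulated so far, so the total work is the sum 1 + 2 + ... + (n - 1), which is quadratic in the word count.

```sql
/* Rough cost model for naive concatenation (an illustration, not the
   original test code).  Appending word k copies the whole string built
   so far (about k * @AvgLen characters), so the total characters copied
   is roughly @AvgLen * n * (n - 1) / 2 -- quadratic, not linear. */
DECLARE @n FLOAT, @AvgLen FLOAT
SELECT @n = 169683,  /* the word count from the post */
       @AvgLen = 6   /* average word length: my assumption */
SELECT [Approx chars copied] = @AvgLen * @n * (@n - 1) / 2
/* roughly 8.6e10 character copies for the whole of Moby Dick */
```

Which is why the graph bends upwards rather than rising in a straight line.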
Hmm. Time to put it to the test and create a graph. We try joining more and more of Moby Dick together to see whether the operation increases linearly with the number of words we join. We start with just a thousand or so words and increase until we have the whole book, and draw a graph of the time taken (in milliseconds) against the number of words concatenated together.
So we construct a simple test to try it out. You've seen some of this code before! I did this with SQL Server 2005, on a very modest development system that I always use for writing time-critical code. It may be different for SQL Server 2008 with lots of memory. I've attached the text file of Moby Dick in case you'd like to experiment.
DECLARE @LotsOfText VARCHAR(MAX)
SELECT @LotsOfText = BulkColumn
FROM OPENROWSET(BULK 'C:\workbench\moby-dick.txt', SINGLE_BLOB) AS x
/* read each word into a table (we make it a temporary table as we don't want to keep it for ever!) */
CREATE TABLE ##word
  (Word_ID INT IDENTITY(1, 1) PRIMARY KEY,
   item VARCHAR(80))
INSERT INTO ##word (item)
  /* ...split @LotsOfText into words here; the splitting routine isn't included in this extract */
---48 seconds (214243 words)
/* a log table to record when each test run started */
CREATE TABLE ##log
  (Log_ID INT IDENTITY(1, 1),
   [event] VARCHAR(80),
   param INT,
   InsertionDate DATETIME DEFAULT GETDATE())
DECLARE @command VARCHAR(255), @ActualCommand VARCHAR(255)
SELECT @command = '
DECLARE @String VARCHAR(MAX)
INSERT INTO ##log ([event], param) SELECT ''concatenate || rows'', ||
SET ROWCOUNT ||
SELECT @String = COALESCE(@String, '''') + item FROM ##word'
/* run the concatenation for 1,000, 2,000, 3,000 ... words, logging each start */
DECLARE @ii INT
SELECT @ii = 1000
WHILE @ii < 200000
  BEGIN
    SELECT @ActualCommand = REPLACE(@command, '||', CONVERT(VARCHAR(8), @ii))
    EXECUTE (@ActualCommand)
    SELECT @ii = @ii + 1000
  END
SET ROWCOUNT 0
/* the time for each run is the gap between successive log entries */
SELECT [rows] = param,
       [Time (Ms)] = DATEDIFF(ms, InsertionDate,
                       (SELECT f.InsertionDate
                          FROM ##log f
                         WHERE f.Log_ID = g.Log_ID + 1))
FROM ##log g
WHERE Log_ID < (SELECT MAX(Log_ID) FROM ##log)
Stop press! Already, Barry is onto something very useful. I haven't worked out a way of putting a graph into a comment, so I'll add this to the original post. I've now re-run the tests, doing a comparison between a simple concatenation and Barry's rollover 2^k technique. Barry's trick wins hands-down, and shows significant gains even in modest concatenation operations.
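Barry's actual code isn't reproduced in this comment, but the gist of the rollover idea can be sketched like this. This is a simplified two-tier version: the variable names and the 8,000-character threshold are my own assumptions, and Barry's real version cascades through buffers of doubling size (hence "2^k"). The point is that each word is appended to a small, cheap buffer, and the expensive copy of the big string happens only occasionally.

```sql
/* A sketch of the rollover idea (not Barry's actual code).  Words are
   appended to a small buffer; the big string is copied only when the
   buffer fills, so the large, expensive copies happen far less often. */
DECLARE @Big VARCHAR(MAX), @Buffer VARCHAR(MAX), @ii INT, @MaxWord INT
SELECT @Big = '', @Buffer = '', @ii = 1
SELECT @MaxWord = MAX(Word_ID) FROM ##word
WHILE @ii <= @MaxWord
  BEGIN
    SELECT @Buffer = @Buffer + item FROM ##word WHERE Word_ID = @ii
    IF LEN(@Buffer) > 8000 /* roll the full buffer into the big string */
      SELECT @Big = @Big + @Buffer, @Buffer = ''
    SELECT @ii = @ii + 1
  END
SELECT @Big = @Big + @Buffer /* and don't forget the remainder */
```

With more tiers, each buffer rolls over into one twice its size, which is what keeps the gains growing as the string gets longer.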
Cor. This is fun. I just sneaked off and ran Peter's version, and his is the clear winner. Even the titanic act of joining together the entire text of Moby Dick took only half a second (406 ms, actually). Here is the combined chart, which shows the huge improvement possible with a single SQL statement!
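Peter's code isn't shown in this comment thread, but the classic way to do the whole concatenation in one SQL Server 2005 statement is the FOR XML PATH('') trick; whether or not that is exactly what Peter used, it gives the same single-statement behaviour (the space separator here is my assumption):

```sql
/* Single-statement concatenation via FOR XML PATH (a common SQL 2005
   technique; Peter's actual code may differ).  The TYPE / .value() pair
   un-entitizes characters such as & and < that XML would otherwise escape. */
DECLARE @String VARCHAR(MAX)
SELECT @String =
       (SELECT item + ' ' AS [text()]
          FROM ##word
         ORDER BY Word_ID
           FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)')
```

Because the engine builds the result internally rather than repeatedly reassigning a growing variable, the cost scales far better with the length of the string.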