Blog Post

Be Careful with String Concatenations in SQL Server with Big Strings.

,

 Now, we are going to get slightly esoteric here. I'd noticed that the performance of string concatenation tasks didn't increase in a linear fashion with increasing size of a string. For a short string, it was blindingly fast, but when you scale up, the performance gets disappointing. When you're handling big strings, it is time to change the algorithm. I felt it was time to do a rough test. The poor old development machine is wheeled out to run a test on a table with every word of Moby Dick

So we'll read in the entire text of Moby Dick, splice it into words and then see how long it takes to join them all together. It takes under a minute to slice it all up (169,683 words if you're curious).

Considering the ease of the operation of splicing Moby Dick back together again, I was expecting it to be quicker. It wasn't I was surprised that the regression wasn't linear; it was polynomial (the equation is shown in the graph. What this means is that the operation is relatively more expensive as the size of the string increases.

Graph of time taken against the number of string concatenations

Hmm. Time to put it to the test and create a graph. We try joining more and more of Moby Dick together to see if the operation increases linearly with the number of words we join. We start with just a hundred or so words and increase until we have the whole book, and draw a graph of the time taken (in milliseconds) against the number of words concatenated together.

So we construct a simple test to try it out. You've seen some of this code before! I did this with SQL Server 2005, on a very modest developmentsystem that I always us for writing time-critical code. It may me different for SQL Server 2008 with lots of memory.I've attached the text file of Moby Dick in case you'd like to experiment.


DECLARE @LotsOfText VARCHAR(MAX)
SELECT   @LotsOfText = BulkColumn
FROM     OPENROWSET(BULK 'C:\workbench\moby-dick.txt', SINGLE_BLOB) AS x  

/* read each word into a table (we make it a temporary table as we don't want to keep it for ever!*/
CREATE TABLE ##word
  
(
    
Word_ID INT IDENTITY(1, 1)
                
PRIMARY KEY,
    
item VARCHAR(80)
   )

INSERT   INTO ##word
        
(ITEM)
        
SELECT   item
        
FROM     wordchop(@LotsOfText)
---48 seconds (214243 words)

CREATE TABLE ##log
  
(
    
Log_ID INT IDENTITY(1, 1),
    
Event VARCHAR(30),
    
param INT,
    
InsertionDate DATETIME DEFAULT GETDATE()
   )
DECLARE @command VARCHAR(255),
  
@ActualCommand VARCHAR(255)
SELECT   @Command = '
DECLARE @STRING VARCHAR(MAX)
insert into ##log (event,param) Select ''concatenate || rows'',||
SET ROWCOUNT ||
sELECT @sTRING =COALESCE(@String,'''')+item from ##word'
DECLARE @ii INT

SELECT   @ii = 1000
WHILE @ii < 200000
  
BEGIN
      SELECT  
@ActualCommand = REPLACE(@command, '||', CONVERT(VARCHAR(8), @ii))
      
EXECUTE (@ActualCommand)
      
SELECT   @ii = @ii + 1000
  
END

SELECT   event,
      
[rows] = param,
      
[Time (Ms)] = DATEDIFF(ms, [InsertionDate],
                           (
SELECT [InsertionDate]
                              
FROM ##log f
                              
WHERE f . Log_ID = g . Log_ID + 1
                          
)
                       )
FROM     ##log g
WHERE    Log_ID < ( SELECT MAX (Log_ID) FROM ##log )


Stop press! Already, Barry is onto something very useful. I haven't worked out a way of putting a graph into a comment so I'll add this to the original post. I've no re-run the tests by doing a comparison between a simple concatenation and Barry's Rollover 2^k technique. Barry's trick wins hands-down, and is showing significant gains even in modest concatenation operations.

Second Test Run

 


Cor. This is fun. Just sneaked off and ran Peter's version, and his is the clear winner. Even the titanic act of joining together the entire text of Moby Dick took only half a second (406 Ms actually. Here is the combined chart, which shows the huge improvement possible in a single SQL Statement!

 The Third Test

 

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating