SQLServerCentral Article

Using XML to Enhance the Performance of String Manipulations

,

String manipulation, especially with large strings, has always been a performance issue on Microsoft Windows operating systems. I was recently trouble-shooting a performance issue, and while doing so I learned some very useful methods where using XML can significantly improve performance during large string manipulations. In all of my examples below, I’m using the Sales.SalesOrderHeader table (which has a little over 31,000 records) from the AdventureWorks database, which you can download and install from CodePlex.com.

1. Generation of an XML string.

I was creating a rather lengthy XML string, and waiting a frustratingly long amount of time to build the string. I was building the XML string by appending the XML tags with the data to the string row by row, like so:

 -- build XML string manually
 use AdventureWorks
GO
 declare @MyXMLString varchar(max)
 declare @StartTime datetime, @EndTime datetime
 set @MyXMLString = ''
 set @StartTime = CURRENT_TIMESTAMP
 select @MyXMLString = @MyXMLString +
                       '<Row><AccountNumber >'+ AccountNumber +
                       '</AccountNumber ></Row>'
   from Sales.SalesOrderHeader
 set @MyXMLString = '<Rows>' + @MyXMLString + '</Rows>'
 set @EndTime = CURRENT_TIMESTAMP
 print @MyXMLString
 select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
 -- prove string works
 declare @MyXML XML
 set @MyXML = @MyXMLString
 select i.item.value('AccountNumber[1]','nvarchar(15)')
   from @MyXML.nodes('/Rows/Row')AS i(item)
GO

The creation of this string takes 1,966,306 ms… or just over 32.75 minutes.

By using XML in SQL Server, you can dramatically improve on this time:

 -- build XML string using SQL FOR XML
 use AdventureWorks
GO
 declare @StartTime datetime, @EndTime datetime
 declare @MyXMLString XML
 set @StartTime = CURRENT_TIMESTAMP
 select @MyXMLString =
       (select AccountNumber
         from Sales.SalesOrderHeader
        FOR XML RAW('Row'),ROOT('Rows'),ELEMENTS )
 set @EndTime = CURRENT_TIMESTAMP
 select @MyXMLString, @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
 -- prove string works
 select i.item.value('AccountNumber[1]','nvarchar(15)')
   from @MyXMLString.nodes('/Rows/Row')AS i(item)
GO

The ROOT clause specifies to add the specified text as the top-level element (root) of the xml string. The RAW clause specifies to use the specified text as the element for each record. The ELEMENTS tag specifies to put each column as a separate xml sub-element. The output of this select statement is the exact same XML string as the previous example, and this takes only 260ms. Hmmm… over 32 minutes, or under 1/3 of a second? I know which one I’d prefer using!

2. What about creating a comma-delimited list?

This way adds a comma plus the column row by row to a string, and finally removes the beginning comma.

-- build CSV string using string manipulation
use AdventureWorks
GO
declare @CSV varchar(max), @StartTime datetime, @EndTime datetime
set @StartTime = CURRENT_TIMESTAMP
set @CSV = ''
select @CSV = @CSV + ',' + AccountNumber from Sales.SalesOrderHeader
set @CSV = substring(@CSV, 2, len(@CSV))
set @EndTime = CURRENT_TIMESTAMP
print @CSV
select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
GO
This takes 216,393 ms… or just over 3.5 minutes.
Using XML:
-- build CSV string using SQL FOR XML
use AdventureWorks
GO
declare @CSV varchar(max), @StartTime datetime, @EndTime datetime
set @StartTime = CURRENT_TIMESTAMP
select @CSV = (select ',' + AccountNumber from Sales.SalesOrderHeader FOR XML PATH('') )
set @CSV = substring(@CSV, 2, len(@CSV))
set @EndTime = CURRENT_TIMESTAMP
print @CSV
select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
GO

By not using the ROOT clause, the top-level element (root) tag is not added. By specifying an empty string for the PATH clause, no wrapper element for the row is generated. By not using the ELEMENTS clause, sub-elements are not generated. What gets returned is just what is in the select clause – in this case the column prefixed with a comma. All that’s left is to remove the leading comma. This way takes 40 ms. Another remarkable difference!

3. And how about parsing a comma-delimited list? This usually involves a function that will strip a comma-delimited list into separate parts, putting the parts into a table that can then be used. These are adequate for small strings, but how do they compare against a large string? An example of such a function is (obtained off of the internet):

USE AdventureWorks
GO
if object_id('dbo.fnMVParam')is not null
      DROP FUNCTION [dbo].[fnMVParam]
GO
CREATE FUNCTION [dbo].[fnMVParam](@repparam nvarchar(max), @delim char(1)= ',')
  returns @values table (paramnvarchar(max))
as
begin
    declare @chrind int
    declare @piece nvarchar(max)
    select @chrind = 1
    while @chrind > 0
       begin
          select @chrind = charindex(@delim, @repparam)
          if @chrind > 0
             select @piece =ltrim(left(@repparam, @chrind - 1))
          else
             select @piece =ltrim(@repparam)
          if @piece is not null
            insert @values(param)values(@piece)
          select @repparam  = right(@repparam, len(@repparam)- @chrind)
          iflen(@repparam)= 0 break
       end
    return
end
GO
When using this function:
use AdventureWorks
GO
-- create comma-delimited string with above XML method
declare @CSV varchar(max), @StartTime datetime, @EndTime datetime
select @CSV = (select',' + AccountNumber from Sales.SalesOrderHeader FOR XML PATH('') )
set @CSV = substring(@CSV, 2, len(@CSV)-1)
set @StartTime = CURRENT_TIMESTAMP
select * from dbo.fnMVParam(@CSV,',')
set @EndTime = CURRENT_TIMESTAMP
select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
if object_id('dbo.fnMVParam')is not null
      DROP FUNCTION [dbo].[fnMVParam]
GO

This method takes 180,580 ms… or just over 3 minutes. Using XML:

use AdventureWorks
GO
-- create comma-delimted string with above XML method
declare @CSV varchar(max), @StartTime datetime, @EndTime datetime
select @CSV = (select',' + AccountNumber from Sales.SalesOrderHeader FOR XML PATH('') )
set @CSV = substring(@CSV, 2, len(@CSV)-1)
-- convert the CSV string into a valid XML string
set @StartTime = CURRENT_TIMESTAMP
declare @MyXMLData XML
-- replace special XML characters that cause issues in SQL
set @CSV = replace(replace(@CSV,'&', '&amp;'),'<', '&lt;')
set @MyXMLData = '<Rows><Row><AccountNumber>'+
      replace(@CSV,',','</AccountNumber></Row><Row><AccountNumber>')+
      '</AccountNumber></Row></Rows>'
select x.item.value('AccountNumber[1]','nvarchar(15)')
  from @MyXMLData.nodes('/Rows/Row')AS x(item)
set @EndTime = CURRENT_TIMESTAMP
select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
GO

What’s happening here is we have to replace any XML “special” characters with the code used to represent them. In my testing, only these two (ampersand and less-than) need to be replaced; the other special characters (single-quote, double-quote, and greater-than) do not cause any issues with SQL. We then need to replace all of the comma delimiters with the ending column/row element tags, and then start the next row/column element tags. Finally, the root, row, and element starting tags need to be added to the beginning and end of the string for the first and last values in the string. This method takes 1540 ms… or just over 1.5 seconds. Here we have yet another major improvement.

As you can see, performing string manipulations on a large string can be very time consuming. Furthermore, I sure that there are ways that are more efficient than what I’ve used above. However, many of the string manipulations can now be performed using XML in a fraction of the time.

In conclusion, learning how to work with XML on strings in SQL can be very beneficial to the performance of your systems. For more information about using XML, check out the MSDN SQL Server 2005 BOL article for the FOR XML clause of the SELECT statement.

Rate

4.74 (39)

You rated this post out of 5. Change rating

Share

Share

Rate

4.74 (39)

You rated this post out of 5. Change rating