String manipulation, especially with large strings, has always been a performance issue on Microsoft Windows operating systems. I was recently trouble-shooting a performance issue, and while doing so I learned some very useful methods where using XML can significantly improve performance during large string manipulations. In all of my examples below, I’m using the Sales.SalesOrderHeader table (which has a little over 31,000 records) from the AdventureWorks database, which you can download and install from CodePlex.com.
1. Generation of an XML string.
I was creating a rather lengthy XML string, and waiting a frustratingly long amount of time to build the string. I was building the XML string by appending the XML tags with the data to the string row by row, like so:
-- build XML string manually
use AdventureWorks
GO
declare @MyXMLString varchar(max)
declare @StartTime datetime, @EndTime datetime
set @MyXMLString = ''
set @StartTime = CURRENT_TIMESTAMP
select @MyXMLString = @MyXMLString +
'<Row><AccountNumber >'+ AccountNumber +
'</AccountNumber ></Row>'
from Sales.SalesOrderHeader
set @MyXMLString = '<Rows>' + @MyXMLString + '</Rows>'
set @EndTime = CURRENT_TIMESTAMP
print @MyXMLString
select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
-- prove string works
declare @MyXML XML
set @MyXML = @MyXMLString
select i.item.value('AccountNumber[1]','nvarchar(15)')
from @MyXML.nodes('/Rows/Row')AS i(item)
GO
The creation of this string takes 1,966,306 ms… or just over 32.75 minutes.
By using XML in SQL Server, you can dramatically improve on this time:
-- build XML string using SQL FOR XML
use AdventureWorks
GO
declare @StartTime datetime, @EndTime datetime
declare @MyXMLString XML
set @StartTime = CURRENT_TIMESTAMP
select @MyXMLString =
(select AccountNumber
from Sales.SalesOrderHeader
FOR XML RAW('Row'),ROOT('Rows'),ELEMENTS )
set @EndTime = CURRENT_TIMESTAMP
select @MyXMLString, @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
-- prove string works
select i.item.value('AccountNumber[1]','nvarchar(15)')
from @MyXMLString.nodes('/Rows/Row')AS i(item)
GO
The ROOT clause specifies to add the specified text as the top-level element (root) of the xml string. The RAW clause specifies to use the specified text as the element for each record. The ELEMENTS tag specifies to put each column as a separate xml sub-element. The output of this select statement is the exact same XML string as the previous example, and this takes only 260ms. Hmmm… over 32 minutes, or under 1/3 of a second? I know which one I’d prefer using!
2. What about creating a comma-delimited list?
This way adds a comma plus the column row by row to a string, and finally removes the beginning comma.
-- build CSV string using string manipulation
use AdventureWorks
GO
declare @CSV varchar(max), @StartTime datetime, @EndTime datetime
set @StartTime = CURRENT_TIMESTAMP
set @CSV = ''
select @CSV = @CSV + ',' + AccountNumber from Sales.SalesOrderHeader
set @CSV = substring(@CSV, 2, len(@CSV))
set @EndTime = CURRENT_TIMESTAMP
print @CSV
select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
GO
This takes 216,393 ms… or just over 3.5 minutes.
Using XML:
-- build CSV string using SQL FOR XML
use AdventureWorks
GO
declare @CSV varchar(max), @StartTime datetime, @EndTime datetime
set @StartTime = CURRENT_TIMESTAMP
select @CSV = (select ',' + AccountNumber from Sales.SalesOrderHeader FOR XML PATH('') )
set @CSV = substring(@CSV, 2, len(@CSV))
set @EndTime = CURRENT_TIMESTAMP
print @CSV
select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
GO
By not using the ROOT clause, the top-level element (root) tag is not added. By specifying an empty string for the PATH clause, no wrapper element for the row is generated. By not using the ELEMENTS clause, sub-elements are not generated. What gets returned is just what is in the select clause – in this case the column prefixed with a comma. All that’s left is to remove the leading comma. This way takes 40 ms. Another remarkable difference!
3. And how about parsing a comma-delimited list? This usually involves a function that will strip a comma-delimited list into separate parts, putting the parts into a table that can then be used. These are adequate for small strings, but how do they compare against a large string? An example of such a function is (obtained off of the internet):
USE AdventureWorks
GO
if object_id('dbo.fnMVParam')is not null
DROP FUNCTION [dbo].[fnMVParam]
GO
CREATE FUNCTION [dbo].[fnMVParam](@repparam nvarchar(max), @delim char(1)= ',')
returns @values table (paramnvarchar(max))
as
begin
declare @chrind int
declare @piece nvarchar(max)
select @chrind = 1
while @chrind > 0
begin
select @chrind = charindex(@delim, @repparam)
if @chrind > 0
select @piece =ltrim(left(@repparam, @chrind - 1))
else
select @piece =ltrim(@repparam)
if @piece is not null
insert @values(param)values(@piece)
select @repparam = right(@repparam, len(@repparam)- @chrind)
iflen(@repparam)= 0 break
end
return
end
GO
When using this function:
use AdventureWorks
GO
-- create comma-delimited string with above XML method
declare @CSV varchar(max), @StartTime datetime, @EndTime datetime
select @CSV = (select',' + AccountNumber from Sales.SalesOrderHeader FOR XML PATH('') )
set @CSV = substring(@CSV, 2, len(@CSV)-1)
set @StartTime = CURRENT_TIMESTAMP
select * from dbo.fnMVParam(@CSV,',')
set @EndTime = CURRENT_TIMESTAMP
select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
if object_id('dbo.fnMVParam')is not null
DROP FUNCTION [dbo].[fnMVParam]
GO
This method takes 180,580 ms… or just over 3 minutes. Using XML:
use AdventureWorks
GO
-- create comma-delimted string with above XML method
declare @CSV varchar(max), @StartTime datetime, @EndTime datetime
select @CSV = (select',' + AccountNumber from Sales.SalesOrderHeader FOR XML PATH('') )
set @CSV = substring(@CSV, 2, len(@CSV)-1)
-- convert the CSV string into a valid XML string
set @StartTime = CURRENT_TIMESTAMP
declare @MyXMLData XML
-- replace special XML characters that cause issues in SQL
set @CSV = replace(replace(@CSV,'&', '&'),'<', '<')
set @MyXMLData = '<Rows><Row><AccountNumber>'+
replace(@CSV,',','</AccountNumber></Row><Row><AccountNumber>')+
'</AccountNumber></Row></Rows>'
select x.item.value('AccountNumber[1]','nvarchar(15)')
from @MyXMLData.nodes('/Rows/Row')AS x(item)
set @EndTime = CURRENT_TIMESTAMP
select @StartTime, @EndTime, DateDiff(ms, @StartTime, @EndTime)
GO
What’s happening here is we have to replace any XML “special” characters with the code used to represent them. In my testing, only these two (ampersand and less-than) need to be replaced; the other special characters (single-quote, double-quote, and greater-than) do not cause any issues with SQL. We then need to replace all of the comma delimiters with the ending column/row element tags, and then start the next row/column element tags. Finally, the root, row, and element starting tags need to be added to the beginning and end of the string for the first and last values in the string. This method takes 1540 ms… or just over 1.5 seconds. Here we have yet another major improvement.
As you can see, performing string manipulations on a large string can be very time consuming. Furthermore, I sure that there are ways that are more efficient than what I’ve used above. However, many of the string manipulations can now be performed using XML in a fraction of the time.
In conclusion, learning how to work with XML on strings in SQL can be very beneficial to the performance of your systems. For more information about using XML, check out the MSDN SQL Server 2005 BOL article for the FOR XML clause of the SELECT statement.