Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • Jeff Moden (3/15/2012)


    peter-757102 (3/15/2012)


    Jeff, did you receive any news regarding the performance characteristics of the code (or variations) on SQL Server 2012?

    And on the extra bright side, in a few months I will be working in a .NET driven development environment, then it is time for me to do some experiments with use of that in SQL Server :).

    Haven't tried it there yet but I don't see why any of this would change.

    So far as the .Net driven development environment goes, take a look at the CLR splitter that Paul White wrote for the comparison testing in this article.

    It certainly could be quite different if one of the following internals has been modified in SQL Server 2012 versus 2008:

    * string handling

    * memory management

    * query optimizer changes (CTE/operator stuff)

    Also, the performance difference between varchar(max) and the regular varchar types might be gone, as there is no good technical reason for a large difference there. As for .Net coding, I will certainly look at Paul White's splitter, to begin with as a good source to learn from. I haven't done much .Net, nor recently.

    By the way, did I miss your long string splitter version somehow, or was that work never finished?

  • Actually, there is a good reason for the MAX datatypes to have a performance problem with such splitters. They're simply handled differently, even when they're "in row", and because of that, they don't cotton to being joined to.

    I've not gotten around to writing an article for the "long splitter" so, no, you didn't miss anything.

    And I suspect that no matter what happens to the internals, the cteTally method in the article will continue to be faster than using the other types of splitters simply because all the methods would be affected by the things you suggest could change. Of course, I could be wrong, as well. 😉
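
    If the strings actually fit in 8K, one common workaround (a sketch of my own, not something from the article; it assumes a hypothetical table with a VARCHAR(MAX) column named Csv) is to cast the MAX data down to VARCHAR(8000) before it reaches the join:

    ```sql
    -- Sketch: feed only the rows whose CSV fits in 8000 bytes to the fast
    -- VARCHAR(8000) splitter from the article (dbo.DelimitedSplit8K);
    -- truly long rows would need a separate "long string" splitter.
    SELECT d.RowNum, s.ItemNumber, s.Item
      FROM dbo.SomeMaxTable d
     CROSS APPLY dbo.DelimitedSplit8K(CAST(d.Csv AS VARCHAR(8000)), ',') s
     WHERE DATALENGTH(d.Csv) <= 8000; --guards against silent truncation by the CAST
    ```

    The filter keeps the CAST honest; rows longer than 8000 bytes fall through to whatever long-string method is in use.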

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I am not debating whether it will be the faster method or not; its simple, well-thought-out construction will make sure it is. But it could potentially scale much better to larger strings if certain internals have been improved.

  • That would certainly be nice.

    --Jeff Moden



  • I am running your splitter tests on a SQL Server running in a VM. It's taking forever!

    I let it go for 17 hours earlier this week, and finally tried canceling the query. But it wouldn't stop. So I killed the process. It was in KILLED/ROLLBACK state for 2 more days and furthermore blocking itself (!). I had to restart the server.

    I tried again, starting the splitter tests 5 hours ago. What do you know--it's blocking itself again:

    spid 56 blocked by spid 56, waittime 17840231, waittype LCK_M_X, waitresource KEY: 2:562949956108288, cpu 20858, physical_io 1585, open tran 2, status suspended, cmd SELECT INTO, DBCC INPUTBUFFER reveals last statement is the one starting

    SELECT Info = 'Please make sure you''re in the grid mode when running these tests.'

    Any thoughts?

    Given the crazy waittime, and low cpu and io, it seems it got blocked really early and then sat there forever. I bet if I kill it nothing will happen, and if I wait till tomorrow nothing will happen. Sigh.
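
    For reference, wait details like the ones above can be pulled from the standard DMVs; a query along these lines (sys.dm_exec_requests and its columns are standard in SQL 2005+) shows which sessions are blocked and on what:

    ```sql
    -- List currently blocked requests with their wait details.
    SELECT r.session_id,
           r.blocking_session_id,
           r.wait_type,
           r.wait_time,      --in milliseconds
           r.wait_resource,
           r.cpu_time,
           r.command,
           r.status
      FROM sys.dm_exec_requests r
     WHERE r.blocking_session_id <> 0; --only requests that are blocked
    ```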

  • Apologies... I have no clue what the problem could be. We have a VM environment at work and it screams.

    --Jeff Moden



  • kinda like those Oldsmobile ads -- "this isn't your father's testing environment."

    maybe best stated by Obi-Wan Kenobi, "These aren't the test results you're looking for."

  • These types of problems with VMs are typically not code related. A lot of folks set up VMs so they can move resources to and from SQL Server instances and, when such moves occur, some really bad things can happen. I don't know if that's Erik's case, but I thought it was worth mentioning.

    --Jeff Moden



  • Second try: failed. 3 days and 20-some-odd hours later, exact same results as before, only with wait time 334,605,998. 🙂

    I think I will not be running this again on my VM SQL instance. That's okay, there are plenty of other places I can try it...

    The entire SQL Server could not be used for any other purpose for 2 days. I have something logging to a table every minute and it could not operate.

  • ErikEckhardt (3/19/2012)


    Second try: failed. 3 days and 20-some-odd hours later, exact same results as before, only with wait time 334,605,998. 🙂

    I think I will not be running this again on my VM SQL instance. That's okay, there are plenty of other places I can try it...

    The entire SQL Server could not be used for any other purpose for 2 days. I have something logging to a table every minute and it could not operate.

    That's really odd. No problems here with the script. Were you having high CPU or I/O?

    /* Anything is possible but is it worth it? */

  • Great article Jeff. I used the splitter today to break apart filenames - amazing performance for our test runs. Can't wait to let it loose across the live data on Monday - 120,000 filenames. Thanks!

    :exclamation: "Be brave. Take risks. Nothing can substitute experience." :exclamation:

  • Very cool. Thanks for the success story, Robin. Glad it helped.

    --Jeff Moden



  • Hi,

    I have used and referred to this same splitter many times, but recently needed to implement it in a real-time environment, so I had to do some more research on it. I started with some minor improvements and asked for suggestions on the sub-site. Many thanks to Oleg, who provided another pointer in a very fruitful discussion, along with an interesting 1133-element, 10000-row performance test (I am not going into the details and performance comparisons made). I tweaked the physical structure of the tally table a little bit according to the requirement by adding a pre-calculated column N1, which holds the value N+1, but there was only a minor gain as it just reduces some IO. The most important change was converting the tally table's column data type from BIGINT to INT. I have come up with 2 versions so far.

    Now, most importantly, the performance test. I followed the article's performance test and saw a major improvement; it beats all the versions. The margin was more significant as the number of elements increased or the size of the string grew. With this version of the splitter, for 1133 elements and 10000 rows, the test completed in approximately 19 seconds, which seems very promising as these are still iTVFs.

    USE [tempdb]
    GO
    SET NOCOUNT ON;
    GO
    IF OBJECT_ID('dbo.CsvTest') IS NOT NULL
       DROP TABLE CsvTest
    GO
    DECLARE @NumberOfElements INT,
            @NumberOfRows     INT
    /*======== PARAMETER VALUES ==============================*/
    SELECT @NumberOfElements = 1133
         , @NumberOfRows     = 10000
    /*========================================================*/
    SELECT TOP (@NumberOfRows) --Controls the number of rows in the test table
           ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),0) AS RowNum,
           (
            SELECT CAST(STUFF( --=== STUFF gets rid of the leading comma
                   ( --=== This builds the CSV row with a leading comma
                    SELECT TOP (@NumberOfElements) --Controls the number of CSV elements in each row
                           ','+CAST(ABS(CHECKSUM(NEWID()))%100000 AS VARCHAR(10))
                      FROM sys.All_Columns t3 --Classic cross join pseudo-cursor
                     CROSS JOIN sys.All_Columns t4 --can produce row sets up to 121 million.
                     WHERE t1.Object_ID <> t3.Object_ID --Without this line, all rows would be the same
                       FOR XML PATH('')
                   )
                   ,1,1,'') AS VARCHAR(8000))
           ) AS Csv
      INTO CsvTest
      FROM sys.All_Columns t1 --Classic cross join pseudo-cursor
     CROSS JOIN sys.All_Columns t2 --can produce row sets up to 16 million rows
    GO
    ALTER TABLE CsvTest
      ADD PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100;
    GO
    IF OBJECT_ID('dbo.Tally','U') IS NOT NULL
       DROP TABLE dbo.Tally;
    --===== Create and populate the Tally table on the fly.
    SELECT TOP 11000
           IDENTITY(INT,1,1) AS N, --PREVIOUSLY WAS BIGINT
           -- This ISNULL function makes the column NOT NULL
           ISNULL(ISNULL(CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT),0) + 1, 0) AS N1
      INTO dbo.Tally
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2
    ;
    GO
    --===== Add a CLUSTERED Primary Key to maximize performance
    ALTER TABLE dbo.Tally
      ADD CONSTRAINT PK_Tally_N
          PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100
    ;
    --===== Update Stats
    UPDATE STATISTICS dbo.Tally WITH FULLSCAN
    ;
    GO
    --=====================================================================================================================
    --      Conditionally drop and recreate the Split8KTally_With_Column_N1 splitter function.
    --=====================================================================================================================
    IF OBJECT_ID(N'dbo.Split8KTally_With_Column_N1', N'IF') IS NOT NULL
       DROP FUNCTION dbo.Split8KTally_With_Column_N1;
    GO
    --===== Tally Table splitter (Split8KTally_With_Column_N1, iTVF) ======================================================
    /*===== THIS FUNCTION USES COLUMN "N1" OF THE TALLY TABLE, WHICH HOLDS PRE-CALCULATED VALUES FOR N+1 =================*/
    CREATE FUNCTION dbo.Split8KTally_With_Column_N1
            (@pString VARCHAR(8000), @pDelimiter VARCHAR(1))
    RETURNS TABLE AS
     RETURN
    /*===== FIND THE FIRST DELIMITED ITEM EXPLICITLY, AS WE SKIPPED THE FIRST ITEM TO AVOID AN "OR" IN THE WHERE CLAUSE ==*/
     SELECT ItemNumber = CAST(0 AS BIGINT) --STARTING WITH ZERO FOR SORT AND TO AVOID ROW_NUMBER + 1 LATER
          , ItemValue  = SUBSTRING(@pString, 1,
                                   ISNULL(NULLIF(CHARINDEX(@pDelimiter,
                                                 @pString COLLATE Latin1_General_BIN), 0)
                                        - 1, 8000))
      UNION ALL -- CONCATENATE THE OUTPUTS
    /*===== FIND THE REST OF THE DELIMITED ITEMS USING THE TALLY TABLE ===================================================*/
     SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY [N])
          , ItemValue  = SUBSTRING(@pString, [N1],
                                   ISNULL(NULLIF(CHARINDEX(@pDelimiter,
                                                 @pString COLLATE Latin1_General_BIN, [N1]), 0)
                                        - [N1], 8000))
       FROM [dbo].[Tally] WITH (NOLOCK) --NOLOCK HINT IS NOT NECESSARY
      WHERE [N] BETWEEN 1 AND LEN(@pString)
        AND SUBSTRING(@pString, [N], 1) = @pDelimiter
    GO
    --=====================================================================================================================
    --      Conditionally drop and recreate the Split8KTally_With_N_PLUS_1 splitter function.
    --=====================================================================================================================
    IF OBJECT_ID('dbo.Split8KTally_With_N_PLUS_1','IF') IS NOT NULL
       DROP FUNCTION [dbo].Split8KTally_With_N_PLUS_1
    GO
    --===== Tally Table splitter (Split8KTally_With_N_PLUS_1, iTVF) =======================================================
    /*===== THIS FUNCTION USES THE RUNTIME CALCULATION "N + 1" INSTEAD OF THE PRE-CALCULATED COLUMN "N1" =================*/
    CREATE FUNCTION dbo.Split8KTally_With_N_PLUS_1
            (@pString VARCHAR(8000), @pDelimiter VARCHAR(1))
    RETURNS TABLE AS
     RETURN
    /*===== FIND THE FIRST DELIMITED ITEM EXPLICITLY, AS WE SKIPPED THE FIRST ITEM TO AVOID AN "OR" IN THE WHERE CLAUSE ==*/
     SELECT ItemNumber = CAST(0 AS BIGINT) --STARTING WITH ZERO FOR SORT AND TO AVOID ROW_NUMBER + 1 LATER
          , ItemValue  = SUBSTRING(@pString, 1,
                                   ISNULL(NULLIF(CHARINDEX(@pDelimiter,
                                                 @pString COLLATE Latin1_General_BIN), 0)
                                        - 1, 8000))
      UNION ALL -- CONCATENATE THE OUTPUTS
    /*===== FIND THE REST OF THE DELIMITED ITEMS USING THE TALLY TABLE ===================================================*/
     SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY [N])
          , ItemValue  = SUBSTRING(@pString, (N+1),
                                   ISNULL(NULLIF(CHARINDEX(@pDelimiter,
                                                 @pString COLLATE Latin1_General_BIN, (N+1)), 0)
                                        - (N+1), 8000))
       FROM [dbo].[Tally] WITH (NOLOCK) --NOLOCK HINT IS NOT NECESSARY
      WHERE [N] BETWEEN 1 AND LEN(@pString)
        AND SUBSTRING(@pString, [N], 1) = @pDelimiter
    GO
    --Tally Test
    PRINT '/*====== dbo.Split8KTally_With_N_PLUS_1 =================*/'
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    SET STATISTICS TIME ON
    DECLARE @ItemNumber BIGINT
          , @Item       VARCHAR(8000);
    SELECT @ItemNumber = V.ItemNumber
         , @Item       = V.ItemValue
      FROM dbo.CsvTest D
     CROSS APPLY dbo.Split8KTally_With_N_PLUS_1(D.Csv, ',') V
    SET STATISTICS TIME OFF
    PRINT '/*========================================================*/'
    PRINT CHAR(10) + CHAR(13)
    GO
    PRINT '/*====== dbo.Split8KTally_With_Column_N1 =================*/'
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    SET STATISTICS TIME ON
    DECLARE @ItemNumber BIGINT
          , @Item       VARCHAR(8000);
    SELECT @ItemNumber = V.ItemNumber
         , @Item       = V.ItemValue
      FROM dbo.CsvTest D
     CROSS APPLY dbo.Split8KTally_With_Column_N1(D.Csv, ',') V
    SET STATISTICS TIME OFF
    PRINT '/*========================================================*/'
    PRINT CHAR(10) + CHAR(13)
    GO

    Last but not least, thanks to all who contribute voluntarily to help the community; however, I cannot thank one of my favorite mentors, Jeff Moden, enough.

    P.S. The naming conventions are not according to standards, as the purpose is to convey the core of each function 🙂 Moreover, we all know everyone's mileage may differ.

  • EDIT:

    Testing for VARCHAR(MAX) version was being done on SQL 2005 only.

    /EDIT

    Since, in the future, we may need the VARCHAR(MAX) version as well, I did some testing on the other solutions as tally table solution does not behave well with wider strings. Two of them performed really well

    1.Inline Recursive CTE solution (Split_RCTE, Tweaked a little bit for better performance)

    2.TVF XML solution by Oleg (Split_XML_Solution_By_Oleg)

    But surprisingly, RCTE was winner on some occasions with good margin as compared to losing. If I do the same 10000 rows, 1133 elements with 125-150 characters width, RCTE beat the XML solution marginally. But if the numbers of elements are decreased to 500, the RCTE was almost twice as fast as XML. But I would not count out any of the solutions, as they could fit according the requirement and environment.

    As for the RCTE solution, I must say it is more resource intensive. The index spool and SORT operators in the execution plan clearly indicate that it will hit memory, tempdb and the processors hard. This also suggests that with more CPUs, more memory and a well-planned tempdb, this solution could perform even better relative to the others. Having said that, I may still tilt towards the XML solution.

    To me, the reason the RCTE performs so much better than the tally table solution (both use CHARINDEX and SUBSTRING) is that the tally table solution compares every character with the delimiter and, due to the out-of-row phenomenon, this does not scale well, whereas the RCTE solution does it only the required number of times. Moreover, up to VARCHAR(8000), the SUBSTRING(@pString, N, 1) = @pDelimiter test is handled as a predicate in addition to the seek predicate; but for VARCHAR(MAX), this is split into two steps, i.e. a seek predicate followed by a filter predicate, which decreases performance quite a bit.

    BTW, this is what my tally table VARCHAR(MAX) version looks like:

    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    GO
    --Create the Split Tally function, VARCHAR(MAX) version
    CREATE FUNCTION dbo.Split8KTallyM
            (@pString VARCHAR(MAX), @pDelimiter VARCHAR(1))
    RETURNS @Results TABLE (ItemNumber BIGINT, ItemValue VARCHAR(MAX))
    AS
     BEGIN
        SET @pString = @pDelimiter --To avoid the OR condition in the WHERE clause for the first item
                     + @pString
                     + @pDelimiter --To avoid the ISNULL/NULLIF
     INSERT INTO @Results ([ItemNumber], [ItemValue])
     SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY N)
          , Item       = SUBSTRING(@pString, N1,
                                   CAST(CHARINDEX(@pDelimiter,
                                        @pString COLLATE Latin1_General_BIN, N1) AS INT)
                                 - N1)
       FROM [dbo].[Tally] WITH (NOLOCK) --NOLOCK HINT IS NOT NECESSARY
      WHERE N BETWEEN 1 AND CONVERT(INT, LEN(@pString)) -- CAST TO THE COLUMN DATATYPE
                          - 1                           -- TO EXCLUDE THE LAST DELIMITER
        AND SUBSTRING(@pString, N, 1) = @pDelimiter
     RETURN;
     END
    GO

    Now the performance test for the RCTE and XML solutions (I did not try the tally table solution, as it is quite slow):

    USE [tempdb]
    GO
    SET NOCOUNT ON;
    GO
    IF OBJECT_ID(N'dbo.iFunction', N'V') IS NOT NULL
       DROP VIEW iFunction
    GO
    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    GO
    CREATE VIEW dbo.iFunction AS
    /**********************************************************************************************************************
     Purpose:
     This view is callable from UDFs, which allows us to indirectly get a NEWID() within a function where we can't do such
     a thing directly in the function. This view also solves the same problem for GETDATE().
     Usage:
     SELECT MyNewID FROM dbo.iFunction; --Returns a GUID
     SELECT MyDate  FROM dbo.iFunction; --Returns a Date
     Revision History:
     Rev 00 - 06 Jun 2004 - Jeff Moden - Initial creation
     Rev 01 - 06 Mar 2011 - Jeff Moden - Formalize code. No logic changes.
    **********************************************************************************************************************/
     SELECT MyNewID = NEWID(),
            MyDate  = GETDATE();
    GO
    IF OBJECT_ID(N'dbo.CsvTest', N'U') IS NOT NULL
       DROP TABLE CsvTest
    GO
    DECLARE @MaxElementWidth  INT,
            @MinElementWidth  INT,
            @NumberOfElements INT,
            @NumberOfRows     INT
    /*======== PARAMETER VALUES ==============================*/
    SELECT @MaxElementWidth  = 150
         , @MinElementWidth  = 125
         , @NumberOfElements = 500
         , @NumberOfRows     = 10000
    /*========================================================*/
    SELECT TOP (@NumberOfRows) --Controls the number of rows in the test table
           ISNULL(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),0) AS RowNum,
           CSV =
           (--==== This creates each CSV
            SELECT CAST(
                   STUFF( --=== STUFF gets rid of the leading comma
                   ( --=== This builds the CSV row with a leading comma
                    SELECT TOP (@NumberOfElements) --Controls the number of CSV elements in each row
                           ','
                         + LEFT(--==== Builds a random-length value within the element width constraints
                                LEFT(REPLICATE('1234567890',CEILING(@MaxElementWidth/10.0)), @MaxElementWidth),
                                ABS(CHECKSUM((SELECT MyNewID FROM dbo.iFunction)))
                                % (@MaxElementWidth - @MinElementWidth + 1) + @MinElementWidth
                               )
                      FROM sys.All_Columns ac3 --Classic cross join pseudo-cursor
                     CROSS JOIN sys.All_Columns ac4 --can produce row sets up to 16 million.
                     WHERE ac3.Object_ID <> ac1.Object_ID --Without this line, all rows would be the same.
                       FOR XML PATH('')
                   )
                   ,1,1,'')
                   AS VARCHAR(MAX))
           )
      INTO CsvTest
      FROM sys.All_Columns ac1 --Classic cross join pseudo-cursor
     CROSS JOIN sys.All_Columns ac2 --can produce row sets up to 16 million rows
    GO
    PRINT '/*====== dbo.CSVTest Population completed ================*/'
    PRINT CHAR(10) + CHAR(13)
    GO
    ALTER TABLE CsvTest
      ADD PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100;
    GO
    IF OBJECT_ID(N'dbo.Split_RCTE', N'IF') IS NOT NULL
       DROP FUNCTION dbo.Split_RCTE
    GO
    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    GO
    --Create the Split_RCTE function, VARCHAR(MAX) version
    CREATE FUNCTION dbo.Split_RCTE
            (@pString VARCHAR(MAX), @pDelimiter VARCHAR(1))
    RETURNS TABLE WITH SCHEMABINDING AS
     RETURN
       WITH cteSplit AS
            (
             SELECT StartPosition = 0
                  , EndPosition   = CONVERT(INT, CHARINDEX(@pDelimiter,
                                    @pString COLLATE Latin1_General_BIN))
              UNION ALL
             SELECT StartPosition = EndPosition + 1
                  , EndPosition   = CONVERT(INT, CHARINDEX(@pDelimiter,
                                    @pString COLLATE Latin1_General_BIN,
                                    EndPosition + 1))
               FROM cteSplit
              WHERE EndPosition > 0
            )
     SELECT [ItemNumber] = ROW_NUMBER() OVER (ORDER BY StartPosition)
          , ItemValue    = SUBSTRING(@pString, StartPosition,
                           CASE EndPosition
                                WHEN 0 THEN CONVERT(INT, LEN(@pString)) + 1
                                ELSE EndPosition - StartPosition
                           END)
       FROM cteSplit
    GO
    IF OBJECT_ID(N'dbo.Split_XML_Solution_By_Oleg', N'TF') IS NOT NULL
       DROP FUNCTION dbo.Split_XML_Solution_By_Oleg
    GO
    --Create the Split_XML function, VARCHAR(MAX) version
    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    GO
    CREATE FUNCTION dbo.Split_XML_Solution_By_Oleg
            (@Parameter VARCHAR(MAX), @Delimiter VARCHAR(1))
    RETURNS @Result TABLE (ItemNumber INT, ItemValue VARCHAR(MAX))
    AS
     BEGIN
        DECLARE @XML XML;
        SET @Parameter = (SELECT @Parameter FOR XML PATH(''));
        SELECT @XML = '<r>' + REPLACE(@Parameter, @Delimiter, '</r><r>') + '</r>';
     INSERT INTO @Result (ItemNumber, ItemValue)
     SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ItemNumber
          , Item.value('text()[1]', 'VARCHAR(MAX)')    AS ItemValue
       FROM @XML.nodes('//r') R (Item);
     RETURN;
     END;
    GO
    PRINT '/*====== dbo.Split_RCTE ==================================*/'
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    SET STATISTICS TIME ON
    DECLARE @ItemNumber BIGINT
          , @Item       VARCHAR(MAX);
    SELECT @ItemNumber = V.ItemNumber
         , @Item       = V.ItemValue
      FROM dbo.CsvTest D
     CROSS APPLY dbo.Split_RCTE(D.Csv, ',') V
    OPTION (MAXRECURSION 0)
    SET STATISTICS TIME OFF
    PRINT '/*========================================================*/'
    PRINT CHAR(10) + CHAR(13)
    GO
    PRINT '/*====== dbo.Split_XML ===================================*/'
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    SET STATISTICS TIME ON
    DECLARE @ItemNumber BIGINT
          , @Item       VARCHAR(MAX);
    SELECT @ItemNumber = V.ItemNumber
         , @Item       = V.ItemValue
      FROM dbo.CsvTest D
     CROSS APPLY dbo.Split_XML_Solution_By_Oleg(D.Csv, ',') V
    SET STATISTICS TIME OFF
    PRINT '/*========================================================*/'
    PRINT CHAR(10) + CHAR(13)
    GO

    As always, everyone's mileage may differ.

  • Too bad (for me) this new round comes at a bad time, as I will be too busy this week to participate in this round of challenges.

    As for bad performance with some implementations, please also try to benchmark with OPTION (MAXDOP 1), as there is a bug in SQL Server 2008 and later that can (unpredictably) cause extreme slowdowns when parallel plans are used. We would not want to see results tainted by this, of course.
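
    For example, each timing batch can be pinned to a serial plan with a statement-level hint (a sketch using the function and table names from the earlier posts in this thread):

    ```sql
    -- Force a serial plan for one statement only; no server-wide setting changes.
    DECLARE @ItemNumber BIGINT
          , @Item       VARCHAR(8000);
    SELECT @ItemNumber = V.ItemNumber
         , @Item       = V.ItemValue
      FROM dbo.CsvTest D
     CROSS APPLY dbo.Split8KTally_With_N_PLUS_1(D.Csv, ',') V
    OPTION (MAXDOP 1); --rules out parallelism-related slowdowns in the timings
    ```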

Viewing 15 posts - 286 through 300 (of 990 total)
