• Hi,

    I have used and referred the same splitter many times but need this to be implemented in a real time environment recently. So, I had to do some more research on it. I started with some minor improvements and ask for suggestions on the sub-site. Many thanks to Oleg, who provided another pointer with a very fruitful discussion and with an interesting 1133 elements, 10000 rows performance test (I am not going into the details and performance comparisons made). I tweaked the physical structure of the tally table a little bit according to the requirement by adding a pre-calculated column N1 which would have the value N+1 but there was only a minor gain as it reduces some IO. But the most important change was conversion of tally table’s column’s data-type i.e. from BIGINT to INT. I have come up with 2 versions so far.

    Now, most importantly the performance test. I followed the article’s performance test and I saw major improvement and it beats all the versions. The margin was more significant when the number of elements increases or when the size of the string was bigger. When I used this version of the splitter, for 1133 elements & 10000 rows, the test completed in approximately 19 seconds, which seems very promising as these are still iTVFs.

    USE [tempdb]

    GO

    SET NOCOUNT ON;

    GO

    IF OBJECT_ID('dbo.CsvTest') IS NOT NULL

    DROP TABLE CsvTest

    GO

    DECLARE @NumberOfElements INT,

    @NumberOfRows INT

    /*======== PARAMETER VALUES ==============================*/

    SELECT @NumberOfElements= 1133

    , @NumberOfRows= 10000

    /*========================================================*/

    SELECT TOP (@NumberOfRows) --Controls the number of rows in the test table

    ISNULL(ROW_NUMBER() OVER (ORDER BY(SELECT NULL)),0) AS RowNum,

    (

    SELECT CAST(STUFF( --=== STUFF get's rid of the leading comma

    ( --=== This builds CSV row with a leading comma

    SELECT TOP (@NumberOfElements) --Controls the number of CSV elements in each row

    ','+CAST(ABS(CHECKSUM(NEWID()))%100000 AS VARCHAR(10))

    FROM sys.All_Columns t3 --Classic cross join pseudo-cursor

    CROSS JOIN sys.All_Columns t4 --can produce row sets up 121 million.

    WHERE t1.Object_ID <> t3.Object_ID --Without this line, all rows would be the same

    FOR XML PATH('')

    )

    ,1,1,'') AS VARCHAR(8000))

    ) AS Csv

    INTO CsvTest

    FROM sys.All_Columns t1 --Classic cross join pseudo-cursor

    CROSS JOIN sys.All_Columns t2 --can produce row sets up 16 million rows

    GO

    ALTER TABLE CsvTest

    ADD PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100;

    GO

    IF OBJECT_ID('dbo.Tally','U') IS NOT NULL

    DROP TABLE dbo.Tally;

    --===== Create and populate the Tally table on the fly.

    SELECT TOP 11000

    IDENTITY(INT,1,1) AS N, --PREVIOUSLY WAS BIGINT

    -- This ISNULL function makes the column NOT NULL

    ISNULL(ISNULL(CAST(ROW_NUMBER() OVER ( ORDER BY (SELECT NULL)) AS INT),0)+ 1, 0) N1

    INTO dbo.Tally

    FROM sys.all_columns ac1

    CROSS JOIN sys.all_columns ac2

    ;

    GO

    --===== Add a CLUSTERED Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N

    PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    ;

    --===== Update Stats

    UPDATE STATISTICS dbo.Tally WITH FULLSCAN

    ;

    GO

    --=====================================================================================================================

    -- Conditionally drop and recreate the Split8KTally_With_Column_N1 splitter function.

    --=====================================================================================================================

    IF OBJECT_ID(N'dbo.Split8KTally_With_Column_N1', N'IF') IS NOT NULL

    DROP FUNCTION dbo.Split8KTally_With_Column_N1;

    GO

    --===== Tally Table (Split8KTally_With_Column_N1 ITVF) ===============================================================================

    /*===== THIS FUNCTION USES COLUMN "N1" OF TALLY TABLE HAVING PRE-CALCULATED VALUES FOR N+1 ===========*/

    CREATE FUNCTION dbo.Split8KTally_With_Column_N1 (

    @pString VARCHAR(8000), @pDelimiter VARCHAR(1)

    )

    RETURNS TABLE

    AS

    RETURN

    /*======= FIND THE FIRST DELIMITED ITEM EXPLICITLY, AS WE SKIPPED THE FIRST ITEM TO AVOID "OR" CONDITION IN THE WHERE CLAUSE=====*/

    SELECT ItemNumber = CAST(0 AS BIGINT) --STARTING WITH ZERO FOR SORT AND TO AVOID ROW_NUMBER + 1 LATER

    , ItemValue = SUBSTRING(@pString, 1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,

    @pString COLLATE Latin1_General_BIN), 0)

    - 1, 8000 ))

    UNION ALL -- CONCATENATE THE OUTPUTS

    /*======= FIND REST OF THE DELIMITED ITEMS USING TALLY TABLE ======================================*/

    SELECT ItemNumber = ROW_NUMBER() OVER ( ORDER BY [N] )

    , ItemValue = SUBSTRING(@pString, [N1],

    ISNULL(NULLIF(CHARINDEX(@pDelimiter ,

    @pString COLLATE Latin1_General_BIN, [N1]), 0)

    - [N1], 8000) )

    FROM [dbo].[Tally] WITH ( NOLOCK ) --NOLOCK HINT IS NOT NECESSARY

    WHERE [N] BETWEEN 1 AND LEN(@pString)

    AND SUBSTRING(@pString, [N], 1) = @pDelimiter

    GO

    --=====================================================================================================================

    -- Conditionally drop and recreate the Split8KTally_With_N_PLUS_1 splitter function.

    --=====================================================================================================================

    IF OBJECT_ID('dbo.Split8KTally_With_N_PLUS_1','IF') IS NOT NULL

    DROP FUNCTION [dbo].Split8KTally_With_N_PLUS_1

    GO

    --===== Tally Table (Split8KTally_With_N_PLUS_1 ITVF) ===============================================================================

    /*===== THIS FUNCTION USES RUNTIME CALCULATION "N + 1" INSTEAD OF USING THE PRE-CALCULATED COLUMN "N1" OF THE TALLY TABLE===========*/

    CREATE FUNCTION dbo.Split8KTally_With_N_PLUS_1 (

    @pString VARCHAR(8000), @pDelimiter VARCHAR(1)

    )

    RETURNS TABLE

    AS

    RETURN

    /*======= FIND THE FIRST DELIMITED ITEM EXPLICITLY, AS WE SKIPPED THE FIRST ITEM TO AVOID "OR" CONDITION IN THE WHERE CLAUSE=====*/

    SELECT ItemNumber = CAST(0 AS BIGINT) --STARTING WITH ZERO FOR SORT AND TO AVOID ROW_NUMBER + 1 LATER

    , ItemValue = SUBSTRING(@pString, 1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,

    @pString COLLATE Latin1_General_BIN), 0)

    - 1, 8000 ))

    UNION ALL -- CONCATENATE THE OUTPUTS

    /*======= FIND REST OF THE DELIMITED ITEMS USING TALLY TABLE ======================================*/

    SELECT ItemNumber = ROW_NUMBER() OVER ( ORDER BY [N] )

    , ItemValue = SUBSTRING(@pString, (N+1),

    ISNULL(NULLIF(CHARINDEX(@pDelimiter ,

    @pString COLLATE Latin1_General_BIN, (N+1)), 0)

    - (N+1), 8000) )

    FROM [dbo].[Tally] WITH ( NOLOCK )--NOLOCK HINT IS NOT NECESSARY

    WHERE [N] BETWEEN 1 AND LEN(@pString)

    AND SUBSTRING(@pString, [N], 1) = @pDelimiter

    GO

    --Tally Test

    PRINT '/*====== dbo.Split8KTally_With_N_PLUS_1 =================*/'

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    SET STATISTICS TIME ON

    DECLARE @ItemNumber BIGINT

    , @Item VARCHAR(8000) ;

    SELECT @ItemNumber = V.ItemNumber

    , @Item = V.ItemValue

    FROM dbo.CsvTest D

    CROSS APPLY dbo.Split8KTally_With_N_PLUS_1(D.Csv, ',') V

    SET STATISTICS TIME OFF

    PRINT '/*========================================================*/'

    PRINT CHAR(10) + CHAR(13)

    GO

    PRINT '/*====== dbo.Split8KTally_With_Column_N1 =================*/'

    DBCC FREEPROCCACHE WITH NO_INFOMSGS

    SET STATISTICS TIME ON

    DECLARE @ItemNumber BIGINT

    , @Item VARCHAR(8000) ;

    SELECT @ItemNumber = V.ItemNumber

    , @Item = V.ItemValue

    FROM dbo.CsvTest D

    CROSS APPLY dbo.Split8KTally_With_Column_N1(D.Csv, ',') V

    SET STATISTICS TIME OFF

    PRINT '/*========================================================*/'

    PRINT CHAR(10) + CHAR(13)

    GO

    Last but not least, thanks to all who are contributing voluntarily to help the community, however, cannot thanks enough to one of my favorite mentors Jeff Moden.

    P.S. The naming conventions are not according to the standards as the purpose is to convey the core of the function 🙂 Moreover, we all know everyone’s mileage may differ.