Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • Tossing in my 2 Cents, I feel the NULL/Nothing/Unknown/Empty discussion is digressing towards the ancient NULL debate.

    😎

Given the un-pivotal nature of the function, which produces a pseudo-table with a row for each delimiter-separated element passed to it:

StringSet{ E1, E2, En-1, En }
('A', 'B', 'Y', 'Z')

+------------+------+
| ItemNumber | Item |
+------------+------+
| 1          | A    |
| 2          | B    |
| n-1        | Y    |
| n          | Z    |
+------------+------+

    one would expect the same when a NULL value is passed as a parameter.

EmptySet{ NULL }

+------------+------+
| ItemNumber | Item |
+------------+------+
| 1          | NULL |
+------------+------+

Two thumbs up, that's what you get. What I see as a drawback is that when elements are missing, it does not conform to that pattern, returning '' (an empty string)

StringSet{ E1, E2, En-1, En }
('A', NULL, 'Y', 'Z')

+------------+------+
| ItemNumber | Item |
+------------+------+
| 1          | A    |
| 2          | ''   |
| n-1        | Y    |
| n          | Z    |
+------------+------+

    rather than NULL

StringSet{ E1, E2, En-1, En }
('A', NULL, 'Y', 'Z')

+------------+------+
| ItemNumber | Item |
+------------+------+
| 1          | A    |
| 2          | NULL |
| n-1        | Y    |
| n          | Z    |
+------------+------+

  • Eirikur Eiriksson (3/30/2015)


    Tossing in my 2 Cents, I feel the NULL/Nothing/Unknown/Empty discussion is digressing towards the ancient NULL debate.

    😎

Given the un-pivotal nature of the function, which produces a pseudo-table with a row for each delimiter-separated element passed to it:

StringSet{ E1, E2, En-1, En }
('A', 'B', 'Y', 'Z')

+------------+------+
| ItemNumber | Item |
+------------+------+
| 1          | A    |
| 2          | B    |
| n-1        | Y    |
| n          | Z    |
+------------+------+

    one would expect the same when a NULL value is passed as a parameter.

EmptySet{ NULL }

+------------+------+
| ItemNumber | Item |
+------------+------+
| 1          | NULL |
+------------+------+

Two thumbs up, that's what you get. What I see as a drawback is that when elements are missing, it does not conform to that pattern, returning '' (an empty string)

StringSet{ E1, E2, En-1, En }
('A', NULL, 'Y', 'Z')

+------------+------+
| ItemNumber | Item |
+------------+------+
| 1          | A    |
| 2          | ''   |
| n-1        | Y    |
| n          | Z    |
+------------+------+

    rather than NULL

StringSet{ E1, E2, En-1, En }
('A', NULL, 'Y', 'Z')

+------------+------+
| ItemNumber | Item |
+------------+------+
| 1          | A    |
| 2          | NULL |
| n-1        | Y    |
| n          | Z    |
+------------+------+

The reason for that, and of course also the reason why it is perfectly OK, is that there is no way your transformation of E2 to NULL can happen: in our input, E2 is the list of characters between two delimiters, and those characters (since the input is of type VARCHAR and is not NULL) are non-NULL values of type CHAR, so the string containing them is itself non-NULL. The same applies to all the Es except E1 and En, but it's easy to see that leaving off the left delimiter for E1 and the right delimiter for En still delivers non-NULL values. And when the input is the empty string, both the left and the right delimiter are missing from E1, so E1 is the empty string.
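The argument can be sketched in a few lines; the following is a Python analogy (an assumption for illustration, not the T-SQL function itself): splitting a non-NULL string can only ever yield non-NULL (possibly empty) elements, and the empty string splits to a single empty element.

```python
# Python analogy of the argument above (not the T-SQL splitter itself):
# every element of a split is the run of characters between two delimiters,
# so splitting a non-NULL string can only ever yield non-NULL strings.
def naive_split(s, delim=','):
    if s is None:
        raise ValueError("a non-NULL input is required")
    return s.split(delim)  # yields only (possibly empty) strings, never None

# A missing element comes back as '' rather than None, and the empty
# string (no delimiters at all) splits to a single empty element: [''].
```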

    Don't try to use this sort of thing to handle sets of sets instead of lists of strings or you'll probably have to abandon the well-foundedness axiom (a nest of sets which is well ordered by inclusion has finite depth, or however you prefer to state it) :hehe:

    Tom

  • Eirikur Eiriksson (3/30/2015)


    Tossing in my 2 Cents, I feel the NULL/Nothing/Unknown/Empty discussion is digressing towards the ancient NULL debate.

    I don't know about "ancient". It seems to be alive and well. Or, "keepin' it real, yo", if you will (and even if you won't) 😉

    one would expect the same when a NULL value is passed as a parameter.

EmptySet{ NULL }

+------------+------+
| ItemNumber | Item |
+------------+------+
| 1          | NULL |
+------------+------+

No. Well, yes, some do seem to expect that. But they shouldn't ;-). The issue is that NULL, in this case, is not an element in the set. It is the absence of elements in the set. In your example here, the set is not empty; it contains one element, NULL (remember: "NULL is not nothing" 🙂 ). An empty set is just that: empty (i.e. { }, not { NULL }). And while it is true that strings can be NULL, that state cannot be represented in string form, because there is no way to distinguish it from a valid empty string. This is unlike numbers, dates, GUIDs, etc., whose NULL state can be represented in string form as an empty string, since an empty string has no meaning or representation in those other types. This is why XML has the "xsi:nil" notation, so that it can indicate a NULL element as opposed to one that is merely empty.
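The serialization point can be made concrete with a small sketch; the following is a Python analogy (an assumption for illustration, not T-SQL): once a NULL element is written into a plain delimited string, it becomes indistinguishable from an empty string on the way back.

```python
# Sketch of the round-trip problem described above: in a plain delimited
# string, a NULL element and an empty-string element serialize identically,
# so the NULL state is lost on deserialization.
def serialize(items, delim=','):
    # None (NULL) has no distinct textual form here; it collapses to ''.
    return delim.join('' if i is None else i for i in items)

def deserialize(s, delim=','):
    return s.split(delim)

original = ['A', None, 'Y', 'Z']
restored = deserialize(serialize(original))
# restored == ['A', '', 'Y', 'Z']: the NULL came back as an empty string,
# which is exactly why a richer format (XML with xsi:nil, binary) is needed
# to preserve the NULL state.
```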

Two thumbs up, that's what you get. What I see as a drawback is that when elements are missing, it does not conform to that pattern, returning '' (an empty string)

StringSet{ E1, E2, En-1, En }
('A', NULL, 'Y', 'Z')

+------------+------+
| ItemNumber | Item |
+------------+------+
| 1          | A    |
| 2          | ''   |
| n-1        | Y    |
| n          | Z    |
+------------+------+

    rather than NULL

StringSet{ E1, E2, En-1, En }
('A', NULL, 'Y', 'Z')

+------------+------+
| ItemNumber | Item |
+------------+------+
| 1          | A    |
| 2          | NULL |
| n-1        | Y    |
| n          | Z    |
+------------+------+

Rather than pass off this scenario as a mere inconvenience, accept it for what it truly is: a complete disproof of your original conclusion that a NULL input should return a NULL row. This case right here proves that NULLs just don't work here. And again, this goes back to NULL strings not being serializable in string form. In both cases here you show a NULL element in a string set, which is valid, but ignore the part about there being no way to serialize that into string form (i.e. a delimited list). If it were an INT set, or a DATETIME set, then the NULL could at least be represented as an empty string and properly deserialized back into a NULL. But a simple delimited string (as opposed to another string format such as XML) cannot handle this. A binary format would also be able to represent a NULL string, but that is not what we are dealing with in terms of this splitter function (or those like it).

    So, again, a NULL input is not simply an element without a delimiter: it represents the absence of any elements. Hence the output is (or at least should be) an empty table.

    Take care,

    Solomon..

SQL#: https://SQLsharp.com/ (SQLCLR library of over 340 functions and procedures)
Sql Quantum Lift: https://SqlQuantumLift.com/ (company)
Sql Quantum Leap: https://SqlQuantumLeap.com/ (blog)
Info sites: Collations     •     Module Signing     •     SQLCLR

  • TomThomson (3/30/2015)


    peter-757102 (3/30/2015)


    The forum ate my post due to a parsing error...grrr...not going to type it all again...just this:

    Set types and scalar types follow different logic and sets are not null-able, nor do they need to be!

    No they aren't nullable in SQL. Yes, they do need to be (but that's far from its biggest problem). Try actually working in a multi-valued logic framework instead of just paying it lip-service round the edges.

    The idea of "null in", "null out" really does not compute when sets are involved.

    The idea that if you don't know what goes in you can't say what comes out works perfectly well for all functions whose results depend on the values of their arguments - including set-valued functions with set-valued arguments.

    A set works with its own logic such as: union, intersect, except and joins.

    Logic tailored to fit its "structure" and "purpose" best.

    It is easier to accept it for what it is and get "null in", "null out" behavior using an empty set!

    Just use "left outer join" or "outer apply" to get the record with a null column in there when you want to.

    I haven't a clue why you imagine that outer join or outer apply has any relevance at all to the case where the value of the set itself is unknown. Of course that's a case that can't occur in SQL because SQL can't handle three-valued logic for anything other than comparisons of atomic values which have system-defined types and hence can't conceive of anything else being allowed not to have a known value.

I have no idea where you are coming from, but it seems we totally cannot understand each other.

    Try actually working in a multi-valued logic framework instead of just paying it lip-service round the edges.

Why do I need to? And I am certainly not paying lip-service.

Languages are full of trade-offs, as they need to fit the problem domain well and still be as easy to implement and understand as possible.

Having null-able sets would open up a whole can of worms, and not only for SQL itself but also for every system built on top of it!

    As such it would be a horrible design choice to make unless your problem domain frequently needs it to function right.

    The idea that if you don't know what goes in you can't say what comes out works perfectly well for all functions whose results depend on the values of their arguments - including set-valued functions with set-valued arguments.

This would require set types to exist (and thus have a known structure) while being null-able.

Otherwise you would not be able to validate a query against a null set.

    This brings me back to the point I made in my comment about null-able sets.

    Regardless of this, faking null-sets by conditionally re-purposing the only row in a set is just complicating matters.

    I haven't a clue why you imagine that outer join or outer apply has any relevance at all to the case where the value of the set itself is unknown.

    You misunderstand my point.

What I wrote is that returning an empty set works just as well and is less complicated than returning one row with a null value.

In cases where you want to interpret the empty set as having one row with a null value, a left outer join or outer apply would give you just that.

I discovered something interesting today using the LAG and LEAD functions. You can use them in TVFs to create cached versions of CTEs and intermediary results; the trick is to use LEAD(column, 0).

The reason this works is that the lag/lead operators store the result in a highly optimized in-memory worktable. Amusingly, this lets you create a pretty quick XML splitter (it's hurt because you have to use a LOB for any reasonably sized string due to the costly replaces), but this avoids all those tricks that Paul documented in this article, and what's better is you can actually use it in a TVF!

    http://sqlblog.com/blogs/paul_white/archive/2012/09/05/compute-scalars-expressions-and-execution-plan-performance.aspx

    This version of the splitter performs well against DelimitedSplit8k. It still is not a winner, but it does bring a good fight to the game.

    Now, I'm curious if we can optimize delimitedsplit8k using this trick to cache inline tables somehow.

if (object_id('xmlsplit') is not null) drop function xmlsplit;
GO
create function xmlsplit(@string varchar(max), @delimiter char(1))
returns table
with schemabinding as
return
select ItemNumber = ROW_NUMBER() OVER (ORDER BY (select null)),
       item       = x.i.value('./text()[1]', 'varchar(8000)')
FROM (
    select lead(a, 0) over (order by (select 1))
    from (VALUES
        (convert(xml, '<r>' + replace(@string, @delimiter, '</r><r>') + '</r>', 0))
    ) r(a)
) a(_)
cross apply _.nodes('./r') x(i)

SplitterName       total_d     avg_d
DelimitedSplit8kb  29.584286   0.493071433333333
SqlBaseline         7.352407   0.122540116666667
xmlsplit           30.432873   0.50721455

    I think most of its penalty is coming from the implicit conversion of everything to varchar(max).

    Would I recommend this over CLR? Of course not, but this is really an interesting approach. I'm excited to try messing around with caching result sets using lead().

Actually, it seems to only really lose on the tests that return lots of elements (more than 200). My guess is the problem is the poor cardinality estimate: nodes() has a baked-in estimate of <200 rows, which is awful when the expected row count is 2000; then you cross apply it to a table with 1000 rows and, well, you're off by an order of magnitude.

Has anyone tested the routines you will find here, http://www.sommarskog.se/arrays-in-sql.html, and seen how they compare? Unfortunately my time is very limited, but I found one of the routines in a function here at work.

Most of the methods described there are worse than DelimitedSplit8K. The CLR splitter is decent. And TVPs are unusual and, unfortunately, unwieldy to use without stored procedures and non-normal client code. They also have a large serialization cost which can usually outweigh the cost of using a string splitter.

Actually, when throwing a CLR splitter into the mix and trying the XML split again, it did win (which is odd).

However, comparing it against my C# splitter, it lost horribly.

SplitterName       total_d     avg_d
delimitedsplit8kb  23.071312   0.384521866666667
fn_split            3.928394   0.0654732333333333
SqlBaseline         5.270523   0.08784205
xmlsplit           23.049305   0.384155083333333

fn_split only works on varchar(8000) strings, and is basically a souped-up version of DelimitedSplit8K: it figures out all the positions to use and then passes them back to SUBSTRING (which is really fast).

If you actually compare results you'll see that the XML splitter I posted does very well; it only really starts doing badly when its cardinality estimate of ~180 starts hurting it. The CLR function has a baked-in estimate of 1000, which keeps it much closer. Ideally, Microsoft would add a cardinality hint for sTVFs, but I don't see them doing that anytime soon, unfortunately.

If you can use CLR, it is by far the best choice; no T-SQL solution, no matter how clever, can ever really beat it. I'll perhaps revise my words when OPENJSON comes along. I've compiled this code against .NET 3.5, so it should be usable with all versions that support the CLR.

    My current C# splitter here on github:

    https://gist.github.com/mburbea/5e142f846d0141c714a1

    I think the chart helps describe the advantages of this new xml approach. But ideally you'd like to use CLR if you can.

    Edit:

I screwed up on my power pivot and didn't include the full range. In the full range you can see that as the number of elements grows, the time it takes skyrockets for the XML-based approach. While there is a spike for extremely long elements, it's almost nothing compared to the spike caused by the sheer number of them. When the string is very uniform (e.g. 4000 one-element strings), the performance is great, but that's because the query analyzer can just reuse the same result over and over again.

At this point, I'm going to suggest that you post your test-data generators and your test harness.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sure Jeff, I've added the generator and the full test script into my gist.

    Here is a direct link to the script:

    https://gist.github.com/mburbea/5e142f846d0141c714a1#file-testscript-sql

This was run against a SQL Server 2014 instance. xmlsplit should work equally well on 2012. My CLR function should work on SQL Server 2005/2008.

SqlBaseline is not a splitter; it is designed to give a rough idea of how fast a T-SQL splitter could theoretically be if you were able to just get an oracle of positions to split on.
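The "oracle of positions" idea can be sketched like this; the following is a conceptual Python illustration (an assumption, not the actual SqlBaseline code): if the delimiter positions were known for free, splitting reduces to pure substring extraction, which bounds how fast any real splitter could be.

```python
# Conceptual sketch of an "oracle" baseline (not the actual SqlBaseline code):
# given the delimiter positions for free, splitting is just substring slicing,
# so this approximates the floor any real splitter is paying on top of.
def oracle_split(s, positions):
    # positions: precomputed indexes of each delimiter in s (the "oracle")
    bounds = [-1] + list(positions) + [len(s)]
    return [s[bounds[i] + 1:bounds[i + 1]] for i in range(len(bounds) - 1)]

# oracle_split('A,B,C', [1, 3]) -> ['A', 'B', 'C']
```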

This is based on your test script, with some modifications to try some larger element-count datasets and to be a bit more dynamic in testing.

  • Here is a snippet I found in a recently developed piece of code:

Stored proc parameter: @reportstring NVARCHAR(MAX)

DECLARE @tmpTbl TABLE ( value NVARCHAR(200) )
SELECT @Tmp = ''
SELECT @i = 1
SELECT @length = LEN(@reportstring)
WHILE (@i <= @length)
BEGIN
    SELECT @char = SUBSTRING(@reportstring, @i, 1)
    IF (@char = ',')
    BEGIN
        INSERT INTO @tmpTbl SELECT @Tmp
        SELECT @Tmp = ''
    END
    ELSE
    BEGIN
        SELECT @Tmp = @Tmp + @char
    END
    SELECT @i = @i + 1
END
INSERT INTO @tmpTbl SELECT @Tmp
--select * from @tmpTbl

  • Excellent. I'll try to take a look at this over the weekend, unless they hit me with the next ton-o-bricks at work.

    --Jeff Moden


  • Jeff Moden (5/22/2015)


    Excellent. I'll try to take a look at this over the weekend, unless they hit me with the next ton-o-bricks at work.

Just ran a quick test on a 1,000,000-row transactional test set table with a variable-length "description" column; it doesn't look too good. Too busy at the moment to do more, but my thought is that this is a "red herring".

    😎

    Set generator (slightly verbose ;-))

USE tempdb;
GO
/* View wrapper for using the NEWID() function
   within a table value function
*/
IF OBJECT_ID(N'dbo.VNEWID') IS NULL
BEGIN
    DECLARE @CREATE_VIEW NVARCHAR(MAX) = N'
    CREATE VIEW dbo.VNEWID
    AS
    SELECT NEWID() AS NID;
    '
    EXEC (@CREATE_VIEW);
END

/* Test set generator, inspired by Lynn Pettis's random
   string function
*/
IF OBJECT_ID(N'dbo.ITVFN_DO_SHAKESPEARE') IS NULL
BEGIN
    DECLARE @CREATE_FUNCTION NVARCHAR(MAX) = N'
    /*
    Sample text set generator; having an infinite number of code
    monkeys calling this function an infinite number of times
    explains the name ;-)
    2015-01-18
    Eirikur Eiriksson
    */
    CREATE FUNCTION dbo.ITVFN_DO_SHAKESPEARE
    (
         @BASE_LENGTH   INT
        ,@BASE_VARIANCE INT
        ,@WORD_LENGTH   INT
        ,@WORD_VARIANCE INT
        ,@ROWCOUNT      INT
        ,@DELIMITER     CHAR(1)
    )
    RETURNS TABLE
    AS
    RETURN
    WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
    ,NUMS(N) AS (SELECT TOP (@BASE_LENGTH + @BASE_VARIANCE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL) DESC) AS N FROM T T1,T T2,T T3,T T4)
    ,RNUM(R) AS (SELECT TOP (@ROWCOUNT) ROW_NUMBER() OVER (ORDER BY (SELECT NULL) DESC) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
    SELECT
         RN.R
        ,((SELECT TOP(@BASE_LENGTH + ((SELECT CHECKSUM(NID) FROM dbo.VNEWID) % @BASE_VARIANCE))
            CASE
                WHEN (NM.N + RN.R + (CHECKSUM(X.NID) % @WORD_LENGTH)) % @WORD_VARIANCE = 0 THEN @DELIMITER
                ELSE CHAR(65 + (ABS(CHECKSUM(X.NID)) % 26))
            END
          FROM NUMS NM
          CROSS APPLY dbo.VNEWID X
          FOR XML PATH(''''), TYPE).value(''.[1]'',''VARCHAR(8000)'')) AS RND_TXT
    FROM RNUM RN;
    ';
    EXEC (@CREATE_FUNCTION);
END

/* Test set parameters */
SET NOCOUNT ON;
DECLARE @SAMPLE_SIZE    INT  = 1000000;                         -- Number of "Transactions"
DECLARE @OUTLET_COUNT   INT  = @SAMPLE_SIZE / 200;              -- Number of "Outlets"
DECLARE @BASE_DATE      DATE = CONVERT(DATE,'2014-01-01',126);  -- Base Date, all dates are based on this.
DECLARE @ZERO_DATE      DATE = CONVERT(DATE,'1900-01-01',126);  -- Monday 1st. January 1900.
DECLARE @DATE_RANGE     INT  = 1096;                            -- +/- 3 Years
DECLARE @MAX_PAY_DAYS   INT  = 90;                              -- Pay by date offset
DECLARE @MAX_ITEMS      INT  = 20;                              -- Maximum number of Items
DECLARE @ACT_PAY_DAYS   INT  = 99;                              -- "Actual" Pay Date
DECLARE @AVG_PER_GROUP  INT  = 500;                             -- Additional Group Identifier Parameter
DECLARE @GROUP_COUNT    INT  = @SAMPLE_SIZE / @AVG_PER_GROUP;   -- Number of Groups
DECLARE @CUSTOMER_COUNT INT  = @SAMPLE_SIZE / 4;                -- Number of Customers

/* Random text generation for "customer details" */
DECLARE @BASE_LENGTH   INT = 50;
DECLARE @BASE_VARIANCE INT = 49;
DECLARE @WORD_LENGTH   INT = 7;
DECLARE @WORD_VARIANCE INT = 6;

/* Get a few NULLs in the detail column by having slightly fewer
   entries than possible CUSTOMER_IDs
*/
DECLARE @ROWCOUNT INT = @CUSTOMER_COUNT - 100;
DECLARE @DELIMITER CHAR(1) = CHAR(32);

/* "customer details" */
IF OBJECT_ID(N'dbo.TBL_SAMPLE_STRING') IS NOT NULL DROP TABLE dbo.TBL_SAMPLE_STRING;
CREATE TABLE dbo.TBL_SAMPLE_STRING
(
     SST_ID    INT          NOT NULL CONSTRAINT PK_DBO_TBL_SAMPLE_STRING_SST_ID PRIMARY KEY CLUSTERED
    ,SST_VALUE VARCHAR(500) NOT NULL
);

/* Create "Customer Details" */
INSERT INTO dbo.TBL_SAMPLE_STRING (SST_ID, SST_VALUE)
SELECT
     X.R
    ,X.RND_TXT
FROM dbo.ITVFN_DO_SHAKESPEARE(@BASE_LENGTH,@BASE_VARIANCE,@WORD_LENGTH,@WORD_VARIANCE,@ROWCOUNT,@DELIMITER) AS X;

/* Drop the dbo.TBL_SAMPLE_TRANSACTION test set table rather
   than
*/
IF OBJECT_ID(N'dbo.TBL_SAMPLE_TRANSACTION') IS NOT NULL DROP TABLE dbo.TBL_SAMPLE_TRANSACTION;

/* Inline Tally Table
   20^7 = 1,280,000,000 Max
*/
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
                                    ,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
, NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7)
,SAMPLE_DATA AS
(
    SELECT
         NM.N AS TRAN_ID
        ,DATEADD(DAY,CHECKSUM(NEWID()) % @DATE_RANGE,@BASE_DATE) AS TRAN_DATE
        ,(ABS(CHECKSUM(NEWID())) % @OUTLET_COUNT)   + 1 AS OUTLET_ID
        ,(ABS(CHECKSUM(NEWID())) % @GROUP_COUNT)    + 1 AS GROUP_ID
        ,(ABS(CHECKSUM(NEWID())) % @CUSTOMER_COUNT) + 1 AS CUSTOMER_ID
        ,(ABS(CHECKSUM(NEWID())) % @AVG_PER_GROUP)  + 1 AS DETAIL_ID
        ,CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))
         --+ CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))
         + CHAR(45) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 10,0)
         + CHAR(58) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 100,0) AS PROD_NO
        ,CONVERT(NUMERIC(12,2),SQRT(ABS(CHECKSUM(NEWID())) + 2),0) AS TOTAL_AMOUNT
        ,(ABS(CHECKSUM(NEWID())) % @MAX_PAY_DAYS) + 1 AS PAY_BY_DAYS
        ,(ABS(CHECKSUM(NEWID())) % @ACT_PAY_DAYS) + 1 AS ACT_PAY_DAYS
        ,(ABS(CHECKSUM(NEWID())) % @MAX_ITEMS)    + 1 AS ITEM_COUNT
        --,ASCII(':')
    FROM NUMS NM
)
SELECT
     ISNULL(SD.TRAN_ID,1) AS TRAN_ID
    ,ISNULL(SD.TRAN_DATE,@BASE_DATE) AS TRAN_DATE
    ,ISNULL((DATEDIFF(DAY,@ZERO_DATE,SD.TRAN_DATE) % 7) + 1,0) AS WEEK_DAY
    ,ISNULL(DATEADD(DAY,SD.PAY_BY_DAYS,SD.TRAN_DATE),@BASE_DATE) AS PAY_BY_DATE
    ,ISNULL(DATEADD(DAY,SD.ACT_PAY_DAYS,SD.TRAN_DATE),@BASE_DATE) AS ACT_PAY_DATE
    ,ISNULL(DATEADD(DAY
        ,FLOOR((SD.PAY_BY_DAYS + SD.ACT_PAY_DAYS) / 2)
        ,SD.TRAN_DATE),@BASE_DATE) AS DELIVERY_DATE
    ,CHAR(65 + ( SD.OUTLET_ID % 26 ))
     + CHAR(65 + ( SD.OUTLET_ID % 20 )) AS LOCATION_CODE
    ,ISNULL(CHAR(65 + ( SD.ACT_PAY_DAYS % 26 ))
     + CHAR(65 + ( SD.ITEM_COUNT % 20 ))
     + RIGHT(CONVERT(VARCHAR(8),1000000
     + (SD.ACT_PAY_DAYS * SD.ITEM_COUNT),0),6),'ZZ999999') AS EMP_ID
    ,ISNULL(SD.OUTLET_ID,1) AS OUTLET_ID
    ,ISNULL(CONVERT(TINYINT,1 - SIGN(3 - (SD.OUTLET_ID & 0x03)),0),1) AS IS_ONLINE
    ,ISNULL(CONVERT(TINYINT,1 - SIGN(7 - (SD.OUTLET_ID & 0x07)),0),1) AS IS_PICKUP
    ,NULLIF(CHAR((68 +
        (CHECKSUM(SD.PROD_NO,SD.TOTAL_AMOUNT,SD.ACT_PAY_DAYS) % 4))
        * (1 - SIGN(3 - (SD.OUTLET_ID & 0x03)))),'') AS ONLCSR
    ,NULLIF(CHAR((68 +
        (CHECKSUM(SD.PROD_NO,SD.TOTAL_AMOUNT,SD.ACT_PAY_DAYS) % 4))
        * (SIGN(3 - (SD.OUTLET_ID & 0x03)))
        * SIGN(CHECKSUM(CD.SST_VALUE))),'') AS OFFLCSR
    ,ISNULL(SD.CUSTOMER_ID,1) AS CUSTOMER_ID
    ,ISNULL(SD.GROUP_ID,1) AS GROUP_ID
    ,ISNULL(SD.DETAIL_ID,1) AS DETAIL_ID
    ,ISNULL(SD.PROD_NO,'ZZ-9:99') AS PROD_NO
    ,ISNULL(SD.TOTAL_AMOUNT,99.99) AS TOTAL_AMOUNT
    ,ISNULL(SD.ITEM_COUNT,1) AS ITEM_COUNT
    ,ISNULL(CONVERT(NUMERIC(9,2),(0.1 * SD.TOTAL_AMOUNT),0),0) AS TAX_AMOUNT
    ,ISNULL(CONVERT(NUMERIC(9,2)
        ,(0.9 * SD.TOTAL_AMOUNT / ITEM_COUNT),0),0) AS UNIT_PRICE
    ,CD.SST_VALUE AS CUSTOMER_DETAIL
INTO dbo.TBL_SAMPLE_TRANSACTION
FROM SAMPLE_DATA SD
LEFT OUTER JOIN dbo.TBL_SAMPLE_STRING CD
ON SD.CUSTOMER_ID = CD.SST_ID;

ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ADD CONSTRAINT PK_DBO_SAMPLE_TRANSACTION_TRAN_ID PRIMARY KEY CLUSTERED (TRAN_ID ASC);
ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN TRAN_DATE     DATE    NOT NULL;
ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN PAY_BY_DATE   DATE    NOT NULL;
ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN ACT_PAY_DATE  DATE    NOT NULL;
ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN DELIVERY_DATE DATE    NOT NULL;
ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN LOCATION_CODE CHAR(2) NOT NULL;

CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_WEEK_DAY_INCL_TRAN_ID_CUSTOMER_ID
    ON dbo.TBL_SAMPLE_TRANSACTION (WEEK_DAY ASC)
    INCLUDE (TRAN_ID,CUSTOMER_ID);
CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_WEEK_DAY_IS_ONLINE_IS_PICKUP_INCL_TRAN_ID_CUSTOMER_ID
    ON dbo.TBL_SAMPLE_TRANSACTION (WEEK_DAY ASC, IS_ONLINE ASC, IS_PICKUP ASC)
    INCLUDE (TRAN_ID,CUSTOMER_ID);
CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_TRAN_DATE_IS_ONLINE_IS_PICKUP_INCL_TRAN_ID_CUSTOMER_ID
    ON dbo.TBL_SAMPLE_TRANSACTION (TRAN_DATE ASC, IS_ONLINE ASC, IS_PICKUP ASC)
    INCLUDE (TRAN_ID,CUSTOMER_ID);
CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_TRAN_DATE_ONLCSR_INCL_TRAN_ID_CUSTOMER_ID
    ON dbo.TBL_SAMPLE_TRANSACTION (TRAN_DATE ASC, ONLCSR ASC)
    INCLUDE (TRAN_ID,CUSTOMER_ID)
    WHERE ONLCSR IS NOT NULL;

    Test harness

USE [TEST_SIZE]
GO
DECLARE @INTBUCKET01 INT = 0;
DECLARE @INTBUCKET02 INT = 0;
DECLARE @CHRBUCKET01 VARCHAR(8000) = '';
DECLARE @timer TABLE (T_TEXT VARCHAR(50) NOT NULL, T_TS DATETIME2(7) NOT NULL DEFAULT(SYSDATETIME()));

INSERT INTO @timer (T_TEXT) VALUES('DelimitedSplitL8K 1');
SELECT
     @CHRBUCKET01 = X.ITEM
    ,@INTBUCKET01 = X.ITEMNUMBER
FROM [DBO].[TBL_SAMPLE_TRANSACTION]
CROSS APPLY DBO.DelimitedSplitL8K([CUSTOMER_DETAIL],CHAR(32)) AS X
INSERT INTO @timer (T_TEXT) VALUES('DelimitedSplitL8K 1');

INSERT INTO @timer (T_TEXT) VALUES('DelimitedSplitL8K 2');
SELECT
     @CHRBUCKET01 = X.ITEM
    ,@INTBUCKET01 = X.ITEMNUMBER
FROM [DBO].[TBL_SAMPLE_TRANSACTION]
CROSS APPLY DBO.DelimitedSplit8K([CUSTOMER_DETAIL],CHAR(32)) AS X
INSERT INTO @timer (T_TEXT) VALUES('DelimitedSplitL8K 2');

INSERT INTO @timer (T_TEXT) VALUES('xmlsplit');
SELECT
     @CHRBUCKET01 = X.ITEM
    ,@INTBUCKET01 = X.ITEMNUMBER
FROM [DBO].[TBL_SAMPLE_TRANSACTION]
CROSS APPLY DBO.xmlsplit([CUSTOMER_DETAIL],CHAR(32)) AS X;
INSERT INTO @timer (T_TEXT) VALUES('xmlsplit');

METHOD               DURATION
-------------------  ---------
DelimitedSplitL8K 1   35200013
DelimitedSplitL8K 2   35603037
xmlsplit             121502949

  • Lynn Pettis (5/22/2015)


    Here is a snippet I found in a recently developed piece of code:

Stored proc parameter: @reportstring NVARCHAR(MAX)

DECLARE @tmpTbl TABLE ( value NVARCHAR(200) )
SELECT @Tmp = ''
SELECT @i = 1
SELECT @length = LEN(@reportstring)
WHILE (@i <= @length)
BEGIN
    SELECT @char = SUBSTRING(@reportstring, @i, 1)
    IF (@char = ',')
    BEGIN
        INSERT INTO @tmpTbl SELECT @Tmp
        SELECT @Tmp = ''
    END
    ELSE
    BEGIN
        SELECT @Tmp = @Tmp + @char
    END
    SELECT @i = @i + 1
END
INSERT INTO @tmpTbl SELECT @Tmp
--select * from @tmpTbl

I remember using a similar/same method a while back; it performed quite nicely on 2K and earlier systems, but it's been a while. This kind of loop can sometimes work well, but the fact that it's not an iTVF marks it as a non-contender before the race starts.

    😎

  • Eirikur Eiriksson (5/22/2015)


    Lynn Pettis (5/22/2015)


    Here is a snippet I found in a recently developed piece of code:

Stored proc parameter: @reportstring NVARCHAR(MAX)

DECLARE @tmpTbl TABLE ( value NVARCHAR(200) )
SELECT @Tmp = ''
SELECT @i = 1
SELECT @length = LEN(@reportstring)
WHILE (@i <= @length)
BEGIN
    SELECT @char = SUBSTRING(@reportstring, @i, 1)
    IF (@char = ',')
    BEGIN
        INSERT INTO @tmpTbl SELECT @Tmp
        SELECT @Tmp = ''
    END
    ELSE
    BEGIN
        SELECT @Tmp = @Tmp + @char
    END
    SELECT @i = @i + 1
END
INSERT INTO @tmpTbl SELECT @Tmp
--select * from @tmpTbl

I remember using a similar/same method a while back; it performed quite nicely on 2K and earlier systems, but it's been a while. This kind of loop can sometimes work well, but the fact that it's not an iTVF marks it as a non-contender before the race starts.

    😎

Actually, this is embedded in a stored procedure. And there are numerous inefficiencies in the code: multiple queries (nested IFs to determine which one runs), loads of implicit data conversions, and minor issues that set off my SQL OCD.

  • Eirikur Eiriksson (5/22/2015)


    Jeff Moden (5/22/2015)


    Excellent. I'll try to take a look at this over the weekend, unless they hit me with the next ton-o-bricks at work.

    Just ran a quick test on a 1,000,000-row transactional test set table with a variable-length "description" column, and it doesn't look too good. I'm too busy at the moment to do more, but my thought is that this is a "red herring".

    😎

    Sadly, you're a victim of an optimization bug that makes your test data much less random than it should be. SQL Server 2012+ can treat NEWID() as a runtime constant when you use it through a view, unless you use the undocumented trace flag 8690. DelimitedSplit8K still wins, but it's no longer the absolute bloodbath the original results suggested.

    splitter            duration (seconds)

    fn_split            4.848461

    xmlsplit            23.662873

    delimitedsplit8kb   13.29136
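    As a quick sanity check on those numbers, here are the timings expressed as ratios of the CLR baseline (plain Python arithmetic on the figures above; nothing here comes from the test run itself):

    ```python
    # Durations in seconds, copied from the results table above.
    durations = {
        "fn_split": 4.848461,          # CLR splitter (baseline)
        "xmlsplit": 23.662873,
        "delimitedsplit8kb": 13.29136,
    }
    baseline = durations["fn_split"]

    # How many times slower each splitter is than the CLR baseline.
    ratios = {name: round(d / baseline, 2) for name, d in durations.items()}
    # xmlsplit comes out ~4.9x the baseline and ~1.8x DelimitedSplit8K.
    ```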

    Here is the modified test script which uses this undocumented flag.

    use tempdb

    if object_id('dbo.fn_split') is not null drop function fn_split;

    if object_id('dbo.splitVarbinary') is not null drop function splitvarbinary;

    if exists (select 1 from sys.assemblies where name='split') drop assembly split;

    CREATE ASSEMBLY [Split]

    AUTHORIZATION [dbo]

    FROM 0x4D5A90000300000004000000FFFF0000B800000000000000400000000000000000000000000000000000000000000000000000000000000000000000800000000E1FBA0E00B409CD21B8014CCD21546869732070726F6772616D2063616E6E6F742062652072756E20696E20444F53206D6F64652E0D0D0A2400000000000000504500004C010300873F5F550000000000000000E00002210B010B00000C000000060000000000001E2A0000002000000040000000000010002000000002000004000000000000000400000000000000008000000002000000000000030040850000100000100000000010000010000000000000100000000000000000000000C829000053000000004000009802000000000000000000000000000000000000006000000C00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000200000080000000000000000000000082000004800000000000000000000002E74657874000000240A000000200000000C000000020000000000000000000000000000200000602E72737263000000980200000040000000040000000E0000000000000000000000000000400000402E72656C6F6300000C0000000060000000020000001200000000000000000000000000004000004200000000000000000000000000000000002A0000000000004800000002000500A02100002808000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000006E026F0700000A2D0D026F0800000A0373060000062A7E010000042A1330020020000000010000110274030000020A03067B020000045404067B030000045405067B04000004542A46168D03000002280900000A80010000042A1E02280A00000A2A1E02280A00000A2ABA0273050000067D0700000402280A00000A02037D0500000402047D0600000402027B050000048E697D090000042A000000133003009600000002000011027B08000004027B090000043302162A027B080000040A027B060000040B027B050000040C027B070000040D09257B0200000417587D02000004090617587D030000042B2808069107331E027B0700000406027B08000004597D04000004020617587D08000004172A0617580A06088E6932D209027B09000004027B08000004597D0400000402027B050000048E697D08000004172A1A730B00000A7A1E027B070000042A00000042534A4201000100000000000C00000076322E302E35303732370000000005006C00000084020000237E0000F00200009002000023537472696E67730000000080050
000080000002355530088050000100000002347554944000000980500009002000023426C6F6200000000000000020000015717A2030902000000FA253300160000010000000A00000004000000090000000900000008000000010000000B00000005000000020000000100000001000000010000000100000001000000020000000200000000000A00010000000000060051004A0006006B0058000A00A4008F000A006D0152010600B70198010600EE01CE0106000E02CE010A0032025201060063024A00060077024A000000000001000000000001000100010010001400000005000100010003001000290000000500020005000300100030000000050005000600310077000A000600CE0025000600D10025000600D70025002100DB0028002100E2002C002100ED002F000100F50025002100FC0025005020000000009600AD000E0001006C20000000009600BC0016000300AA20000000008618C8002100070098200000000091185C0256020700B220000000008618C80021000700BA20000000008318C80033000700EC2000000000E60104013A0009008E2100000000E1010D0121000900952100000000E60938013E000900000001004C01000002007F01000001008901020002008D0102000300C40102000400CA01000001004C01000002007F01040009001100320121002100C80021002900C80021003100C800AA003900C80021004100C8002100190047023A00190052024C02490069025A020900C80021005100C800210020003300AF002400130046002E00230068022E002B00710244001300780051025F02040001000000440142000200090003000400100003000480000000000000000000000000000000002C0200000200000000000000000000000100410000000000020000000000000000000000010083000000000003000200040002000000003C4D6F64756C653E0053706C69742E646C6C0055736572446566696E656446756E6374696F6E7300526573756C7400487962726964456E756D657261746F72006D73636F726C69620053797374656D004F626A6563740053797374656D2E436F6C6C656374696F6E730049456E756D657261746F7200456D707479526573756C740053797374656D2E446174610053797374656D2E446174612E53716C54797065730053716C42797465730053706C697456617262696E6172790046696C6C5F526573756C74002E63746F72004964005374617274004E756D005F6279746573005F64656C696D69746572005F726573756C74005F7374617274005F6C656E677468004D6F76654E6578740053797374656D2E436F6C6C656374696F6E732E49456E756D657261746F722E52657365740052657365740067657
45F43757272656E740043757272656E74006279746573004D6963726F736F66742E53716C5365727665722E5365727665720053716C46616365744174747269627574650064656C696D69746572006F626A006974656D4E756D6265720053797374656D2E52756E74696D652E496E7465726F705365727669636573004F7574417474726962757465007374617274006E756D0053797374656D2E52756E74696D652E436F6D70696C6572536572766963657300436F6D70696C6174696F6E52656C61786174696F6E734174747269627574650052756E74696D65436F6D7061746962696C6974794174747269627574650053706C69740053716C46756E6374696F6E417474726962757465006765745F49734E756C6C006765745F56616C7565002E6363746F7200417272617900476574456E756D657261746F72004E6F74496D706C656D656E746564457863657074696F6E000000032000000000009234DF68C0472F418BEFAA63B8E29A4F0008B77A5C561934E089030612090700021209120D050A0004011C1008100810080320000102060803061D050206050306120C062002011D0505032000020320001C0328001C31010003005408074D617853697A65401F000054020D497346697865644C656E6774680054020A49734E756C6C61626C650031010003005408074D617853697A650100000054020D497346697865644C656E6774680154020A49734E756C6C61626C65000420010108819B010006005455794D6963726F736F66742E53716C5365727665722E5365727665722E446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D322E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038390A446174614163636573730000000054557F4D6963726F736F66742E53716C5365727665722E5365727665722E53797374656D446174614163636573734B696E642C2053797374656D2E446174612C2056657273696F6E3D322E302E302E302C2043756C747572653D6E65757472616C2C205075626C69634B6579546F6B656E3D623737613563353631393334653038391053797374656D446174614163636573730000000054020F497344657465726D696E69737469630154020949735072656369736501540E1146696C6C526F774D6574686F644E616D650B46696C6C5F526573756C74540E0F5461626C65446566696E6974696F6E226974656D4E756D62657220696E742C20737461727420696E742C206E756D20696E740420001D05040701120C03000001042000120908070408051D05120C0801000800000000001E01000100540216577
261704E6F6E457863657074696F6E5468726F777301F029000000000000000000000E2A0000002000000000000000000000000000000000000000000000002A000000000000000000000000000000005F436F72446C6C4D61696E006D73636F7265652E646C6C0000000000FF25002000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001001000000018000080000000000000000000000000000001000100000030000080000000000000000000000000000001000000000048000000584000003C02000000000000000000003C0234000000560053005F00560045005200530049004F004E005F0049004E0046004F0000000000BD04EFFE00000100000000000000000000000000000000003F000000000000000400000002000000000000000000000000000000440000000100560061007200460069006C00650049006E0066006F00000000002400040000005400720061006E0073006C006100740069006F006E00000000000000B0049C010000010053007400720069006E006700460069006C00650049006E0066006F0000007801000001003000300030003000300034006200300000002C0002000100460069006C0065004400650073006300720069007000740069006F006E000000000020000000300008000100460069006C006500560065007200730069006F006E000000000030002E0030002E0030002E003000000034000A00010049006E007
400650072006E0061006C004E0061006D0065000000530070006C00690074002E0064006C006C0000002800020001004C006500670061006C0043006F0070007900720069006700680074000000200000003C000A0001004F0072006900670069006E0061006C00460069006C0065006E0061006D0065000000530070006C00690074002E0064006C006C000000340008000100500072006F006400750063007400560065007200730069006F006E00000030002E0030002E0030002E003000000038000800010041007300730065006D0062006C0079002000560065007200730069006F006E00000030002E0030002E0030002E003000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002000000C000000203A000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

    WITH PERMISSION_SET = SAFE;

    GO

    CREATE FUNCTION [dbo].[SplitVarbinary]

    (@bytes VARBINARY (8000), @delimiter TINYINT)

    RETURNS

    TABLE (

    [itemNumber] INT NULL,

    [start] INT NULL,

    [num] INT NULL)

    AS EXTERNAL NAME [Split].[UserDefinedFunctions].[SplitVarbinary];

    GO

    CREATE function [dbo].[fn_split](@string varchar(8000),@delimiter char(1))

    returns table with schemabinding as

    return

    select itemNumber,item=substring(@string,start,num)

    from dbo.SplitVarbinary(convert(varbinary(8000),@string),ascii(@delimiter));

    GO

    if(object_id('xmlsplit') is not null) drop function xmlsplit;

    GO

    create function xmlsplit(@string varchar(max),@delimiter char(1))

    returns table

    with schemabinding as

    return

    select ItemNumber = ROW_NUMBER() OVER(ORDER BY (select null)),

    item = x.i.value('./text()[1]', 'varchar(8000)')

    FROM(

    select lead(a,0) over (order by (select 1))

    from (VALUES

    (convert(xml,'<r>'+replace(@string,@delimiter,'</r><r>')+'</r>',0))

    ) r(a)

    ) a(_)

    cross apply _.nodes('./r') x(i)

    GO

    if(OBJECT_ID('delimitedSplit8kb') is not null) drop function DelimitedSplit8Kb;

    GO

    CREATE FUNCTION [dbo].[DelimitedSplit8KB]

    --===== Define I/O parameters

    (@pString VARCHAR(8000) , @pDelimiter CHAR(1))

    --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1

    ),

    E2(N) AS(select 1 from E1,E1 b),

    E4(N) as(select 1 from e2,e2 b),

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (select null)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1

    UNION ALL

    SELECT t.N+1 FROM cteTally t

    WHERE SUBSTRING(@pString,t.N,1) COLLATE Latin1_General_BIN = @pDelimiter COLLATE Latin1_General_BIN

    ),

    cteLen(N1,L1) AS(

    SELECT s.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter COLLATE Latin1_General_BIN,@pString COLLATE Latin1_General_BIN,s.N1) ,0)-s.N1,8000)

    FROM cteStart s

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),

    Item = SUBSTRING(@pString, l.N1, l.L1)

    FROM cteLen l

    ;

    GO

    IF OBJECT_ID(N'dbo.VNEWID') IS NULL

    BEGIN

    DECLARE @CREATE_VIEW NVARCHAR(MAX) = N'

    CREATE VIEW dbo.VNEWID

    AS

    SELECT NEWID() AS NID;

    '

    EXEC (@CREATE_VIEW);

    END

    IF OBJECT_ID(N'dbo.ITVFN_DO_SHAKESPEARE') IS NULL

    BEGIN

    DECLARE @CREATE_FUNCTION NVARCHAR(MAX) = N'

    /*

    Sample text set generator, having an infinite number of code

    monkeys calling this function for infinite number of times

    explains the name [Wink]

    2015-01-18

    Eirikur Eiriksson

    */

    CREATE FUNCTION dbo.ITVFN_DO_SHAKESPEARE

    (

    @BASE_LENGTH INT

    ,@BASE_VARIANCE INT

    ,@WORD_LENGTH INT

    ,@WORD_VARIANCE INT

    ,@ROWCOUNT INT

    ,@DELIMITER CHAR(1)

    )

    RETURNS TABLE

    AS

    RETURN

    WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

    ,NUMS(N) AS (SELECT TOP (@BASE_LENGTH + @BASE_VARIANCE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL) DESC) AS N FROM T T1,T T2,T T3,T T4)

    ,RNUM(R) AS (SELECT TOP (@ROWCOUNT) ROW_NUMBER() OVER (ORDER BY (SELECT NULL) DESC) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)

    SELECT

    RN.R

    ,((SELECT TOP(@BASE_LENGTH + ((SELECT CHECKSUM(NID) FROM dbo.VNEWID) % @BASE_VARIANCE))

    CASE

    WHEN (NM.N + RN.R + (CHECKSUM(X.NID) % @WORD_LENGTH)) % @WORD_VARIANCE = 0 THEN @DELIMITER

    ELSE CHAR(65 + (ABS(CHECKSUM(X.NID)) % 26))

    END

    FROM NUMS NM

    CROSS APPLY dbo.VNEWID X

    FOR XML PATH(''''), TYPE).value(''.[1]'',''VARCHAR(8000)'')) AS RND_TXT

    FROM RNUM RN;

    ';

    EXEC (@CREATE_FUNCTION);

    END

    /* Test set parameters

    */

    SET NOCOUNT ON;

    DECLARE @SAMPLE_SIZE INT = 1000000; -- Number of "Transactions"

    DECLARE @OUTLET_COUNT INT = @SAMPLE_SIZE / 200; -- Number of "Outlets"

    DECLARE @BASE_DATE DATE = CONVERT(DATE,'2014-01-01',126); -- Base Date, all dates are based on this.

    DECLARE @ZERO_DATE DATE = CONVERT(DATE,'1900-01-01',126); -- Monday 1st. January 1900.

    DECLARE @DATE_RANGE INT = 1096; -- +/- 3 Years

    DECLARE @MAX_PAY_DAYS INT = 90; -- Pay by date offset

    DECLARE @MAX_ITEMS INT = 20; -- Maximum number of Items

    DECLARE @ACT_PAY_DAYS INT = 99; -- "Actual" Pay Date

    DECLARE @AVG_PER_GROUP INT = 500; -- Additional Group Identifier Parameter

    DECLARE @GROUP_COUNT INT = @SAMPLE_SIZE / @AVG_PER_GROUP; -- Number of Groups

    DECLARE @CUSTOMER_COUNT INT = @SAMPLE_SIZE / 4; -- Number of Customers

    /* Random text generation for "customer details" */

    DECLARE @BASE_LENGTH INT = 50 ;

    DECLARE @BASE_VARIANCE INT = 49 ;

    DECLARE @WORD_LENGTH INT = 7 ;

    DECLARE @WORD_VARIANCE INT = 6 ;

    /* Get a few NULLs in the detail column by having slightly fewer

    entries than the possible CUSTOMER_ID values

    */

    DECLARE @ROWCOUNT INT = @CUSTOMER_COUNT - 100 ;

    DECLARE @DELIMITER CHAR(1) = CHAR(32);

    /* "customer details" */

    IF OBJECT_ID(N'dbo.TBL_SAMPLE_STRING') IS NOT NULL DROP TABLE dbo.TBL_SAMPLE_STRING;

    CREATE TABLE dbo.TBL_SAMPLE_STRING

    (

    SST_ID INT NOT NULL CONSTRAINT PK_DBO_TBL_SAMPLE_STRING_SST_ID PRIMARY KEY CLUSTERED

    ,SST_VALUE VARCHAR(500) NOT NULL

    );

    /* Create "Customer Details" */

    INSERT INTO dbo.TBL_SAMPLE_STRING (SST_ID, SST_VALUE)

    SELECT

    X.R

    ,X.RND_TXT

    FROM dbo.ITVFN_DO_SHAKESPEARE(@BASE_LENGTH,@BASE_VARIANCE,@WORD_LENGTH,@WORD_VARIANCE,@ROWCOUNT,@DELIMITER) AS X

    OPTION (QUERYTRACEON 8690);

    /* Drop the dbo.TBL_SAMPLE_TRANSACTION test set table if it already exists

    */

    IF OBJECT_ID(N'dbo.TBL_SAMPLE_TRANSACTION') IS NOT NULL DROP TABLE dbo.TBL_SAMPLE_TRANSACTION;

    /* Inline Tally Table

    20^7 = 1,280,000,000 Max

    */

    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)

    ,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7)

    ,SAMPLE_DATA AS

    (

    SELECT

    NM.N AS TRAN_ID

    ,DATEADD(DAY,CHECKSUM(NEWID()) % @DATE_RANGE,@BASE_DATE) AS TRAN_DATE

    ,(ABS(CHECKSUM(NEWID())) % @OUTLET_COUNT) + 1 AS OUTLET_ID

    ,(ABS(CHECKSUM(NEWID())) % @GROUP_COUNT) + 1 AS GROUP_ID

    ,(ABS(CHECKSUM(NEWID())) % @CUSTOMER_COUNT) + 1 AS CUSTOMER_ID

    ,(ABS(CHECKSUM(NEWID())) % @AVG_PER_GROUP) + 1 AS DETAIL_ID

    ,CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))

    --+ CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))

    + CHAR(45) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 10,0)

    + CHAR(58) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 100,0) AS PROD_NO

    ,CONVERT(NUMERIC(12,2),SQRT(ABS(CHECKSUM(NEWID())) + 2),0) AS TOTAL_AMOUNT

    ,(ABS(CHECKSUM(NEWID())) % @MAX_PAY_DAYS) + 1 AS PAY_BY_DAYS

    ,(ABS(CHECKSUM(NEWID())) % @ACT_PAY_DAYS) + 1 AS ACT_PAY_DAYS

    ,(ABS(CHECKSUM(NEWID())) % @MAX_ITEMS) + 1 AS ITEM_COUNT

    --,ASCII(':')

    FROM NUMS NM

    )

    SELECT

    ISNULL(SD.TRAN_ID,1) AS TRAN_ID

    ,ISNULL(SD.TRAN_DATE ,@BASE_DATE) AS TRAN_DATE

    ,ISNULL((DATEDIFF(DAY,@ZERO_DATE,SD.TRAN_DATE) % 7) + 1 ,0) AS WEEK_DAY

    ,ISNULL(DATEADD(DAY,SD.PAY_BY_DAYS,SD.TRAN_DATE) ,@BASE_DATE) AS PAY_BY_DATE

    ,ISNULL(DATEADD(DAY,SD.ACT_PAY_DAYS,SD.TRAN_DATE),@BASE_DATE) AS ACT_PAY_DATE

    ,ISNULL(DATEADD(DAY

    ,FLOOR((SD.PAY_BY_DAYS + SD.ACT_PAY_DAYS) / 2)

    ,SD.TRAN_DATE),@BASE_DATE) AS DELIVERY_DATE

    ,CHAR(65 + ( SD.OUTLET_ID % 26 ))

    + CHAR(65 + ( SD.OUTLET_ID % 20 )) AS LOCATION_CODE

    ,ISNULL(CHAR(65 + ( SD.ACT_PAY_DAYS % 26 ))

    + CHAR(65 + ( SD.ITEM_COUNT % 20 ))

    + RIGHT(CONVERT(VARCHAR(8),1000000

    + (SD.ACT_PAY_DAYS * SD.ITEM_COUNT),0),6),'ZZ999999') AS EMP_ID

    ,ISNULL(SD.OUTLET_ID ,1) AS OUTLET_ID

    ,ISNULL(CONVERT(

    TINYINT,1 - SIGN(3 - (SD.OUTLET_ID & 0x03)),0) ,1) AS IS_ONLINE

    ,ISNULL(CONVERT(

    TINYINT,1 - SIGN(7 - (SD.OUTLET_ID & 0x07)),0) ,1) AS IS_PICKUP

    ,NULLIF(CHAR((68 +

    (CHECKSUM(SD.PROD_NO,SD.TOTAL_AMOUNT,SD.ACT_PAY_DAYS) % 4))

    * (1 - SIGN(3 - (SD.OUTLET_ID & 0x03)))),'') AS ONLCSR

    ,NULLIF(CHAR((68 +

    (CHECKSUM(SD.PROD_NO,SD.TOTAL_AMOUNT,SD.ACT_PAY_DAYS) % 4))

    * (SIGN(3 - (SD.OUTLET_ID & 0x03)))

    * SIGN(CHECKSUM(CD.SST_VALUE))),'') AS OFFLCSR

    ,ISNULL(SD.CUSTOMER_ID ,1) AS CUSTOMER_ID

    ,ISNULL(SD.GROUP_ID ,1) AS GROUP_ID

    ,ISNULL(SD.DETAIL_ID ,1) AS DETAIL_ID

    ,ISNULL(SD.PROD_NO,'ZZ-9:99') AS PROD_NO

    ,ISNULL(SD.TOTAL_AMOUNT,99.99) AS TOTAL_AMOUNT

    ,ISNULL(SD.ITEM_COUNT,1) AS ITEM_COUNT

    ,ISNULL(CONVERT(NUMERIC(9,2),(0.1 * SD.TOTAL_AMOUNT),0),0) AS TAX_AMOUNT

    ,ISNULL(CONVERT(NUMERIC(9,2)

    ,(0.9 * SD.TOTAL_AMOUNT / ITEM_COUNT),0),0) AS UNIT_PRICE

    ,CD.SST_VALUE AS CUSTOMER_DETAIL

    INTO dbo.TBL_SAMPLE_TRANSACTION

    FROM SAMPLE_DATA SD

    LEFT OUTER JOIN dbo.TBL_SAMPLE_STRING CD

    ON SD.CUSTOMER_ID = CD.SST_ID

    OPTION (QUERYTRACEON 8690);

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ADD CONSTRAINT PK_DBO_SAMPLE_TRANSACTION_TRAN_ID PRIMARY KEY CLUSTERED (TRAN_ID ASC);

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN TRAN_DATE DATE NOT NULL;

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN PAY_BY_DATE DATE NOT NULL;

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN ACT_PAY_DATE DATE NOT NULL;

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN DELIVERY_DATE DATE NOT NULL;

    ALTER TABLE dbo.TBL_SAMPLE_TRANSACTION ALTER COLUMN LOCATION_CODE CHAR(2) NOT NULL;

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_WEEK_DAY_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

    (WEEK_DAY ASC)

    INCLUDE (TRAN_ID,CUSTOMER_ID);

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_WEEK_DAY_IS_ONLINE_IS_PICKUP_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

    (WEEK_DAY ASC, IS_ONLINE ASC, IS_PICKUP ASC)

    INCLUDE (TRAN_ID,CUSTOMER_ID);

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_TRAN_DATE_IS_ONLINE_IS_PICKUP_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

    (TRAN_DATE ASC, IS_ONLINE ASC, IS_PICKUP ASC)

    INCLUDE (TRAN_ID,CUSTOMER_ID);

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_SAMPLE_TRANSACTION_TRAN_DATE_ONLCSR_INCL_TRAN_ID_CUSTOMER_ID ON dbo.TBL_SAMPLE_TRANSACTION

    (TRAN_DATE ASC, ONLCSR ASC)

    INCLUDE (TRAN_ID,CUSTOMER_ID)

    WHERE ONLCSR IS NOT NULL;

    GO

    IF OBJECT_ID('dbo.TestResults','U') IS NOT NULL DROP TABLE dbo.TestResults;

    go

    CREATE TABLE dbo.TestResults

    (

    splitter sysname,

    duration float

    );

    go

    if object_id('dbo.functions','u') is not null drop table functions;

    create table functions

    (

    splitter sysname

    )

    GO

    insert into dbo.functions

    VALUES ('fn_split'),

    ('xmlsplit'),

    ('delimitedsplit8kb')

    DECLARE @STR NVARCHAR(MAX) ='declare @startTime datetime2,@endTime datetime2,@in int,@v varchar(8000)'+(select

    ';RAISERROR(''Testing '+c.splitter+''',10,1) WITH NOWAIT;

    -- warm up the splitters.

    select @in = count(*) from '+c.splitter+'(''1,2,3,4'',char(44));

    DBCC FREEPROCCACHE;DBCC DROPCLEANBUFFERS;

    --===== Start the timer

    SELECT @StartTime = sysdatetime();

    --===== Run the test

    SELECT @in = split.ItemNumber, @v = split.Item

    FROM dbo.[TBL_SAMPLE_TRANSACTION] csv

    CROSS APPLY dbo.'+c.splitter+'(csv.CUSTOMER_DETAIL,char(32)) split

    --===== Stop the timer and record the test

    select @EndTime= sysdatetime();

    INSERT INTO dbo.TestResults

    (Splitter, Duration)

    SELECT '''+c.splitter+''',

    DATEDIFF(microsecond,@StartTime,@EndTime)/1e6'

    from dbo.functions c

    for xml path(''));

    select @STR=REPLACE(@str,' ','')

    --select @STR;

    exec (@str);

    select * FROM TestResults;

    Still, the fact that the XML splitter is only about 1.8x as slow as DelimitedSplit8K is impressive, as it is currently paying the cost of converting every element to varchar(max) and then again to XML.

    As always: if you can, use SQL CLR. If you cannot and your strings fit in varchar(8000), use DelimitedSplit8K. If they are varchar(max), you may want to consider using xmlsplit.

    Edit: For whatever reason, the XML splitter causes the plan to sort the table by the customer description column, and that sort is the massive extra time sink, unfortunately. The XML split apparently needs the sort to perform its nested loops join. I'm no expert in explaining this behavior, and I couldn't figure out how to discourage it from doing so.
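    For contrast with the loop splitter quoted earlier in the thread, the set-based core of DelimitedSplit8K (the cteStart/cteLen steps) can be sketched in Python like this (an illustration of the logic, not runnable T-SQL; the function name is mine):

    ```python
    def split_tally_style(s, delimiter=","):
        """Python sketch of the DelimitedSplit8K logic (1-based positions).

        cteStart: element starts are position 1, plus N+1 for every
        delimiter found at position N. cteLen: each element's length is
        the distance to the next delimiter, or the rest of the string
        when no further delimiter exists (the ISNULL/NULLIF combo).
        """
        # cteStart: 1, plus one start position after each delimiter
        starts = [1] + [i + 2 for i, ch in enumerate(s) if ch == delimiter]
        items = []
        for n1 in starts:
            nxt = s.find(delimiter, n1 - 1)  # CHARINDEX(@pDelimiter, @pString, N1)
            length = (nxt + 1 - n1) if nxt != -1 else len(s) - n1 + 1
            items.append(s[n1 - 1 : n1 - 1 + length])
        return items
    ```

    The SQL version does the same thing set-based: the tally table replaces the enumerate scan, CHARINDEX replaces str.find, and the ISNULL/NULLIF combo maps the "delimiter not found" case to the remainder of the string.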
