
Tally OH! An Improved SQL 8K “CSV Splitter” Function
Posted Wednesday, June 26, 2013 10:09 AM
Old Hand


Group: General Forum Members
Last Login: Friday, August 29, 2014 10:51 AM
Points: 326, Visits: 2,219
michael 77302 (6/26/2013)
The odd thing is that none of the possible result sets give an Item with a length of 0 (that would result in the error). I think something gets optimized so that an error is prematurely raised.

If I add one more CTE level to the end of the splitter function, specifying that ItemNumber > 0, this error does not occur.


It is a little-known fact that SQL Server is extremely liberal about the order in which expressions within a query, including those within a function, are executed. An expression in the SELECT clause that you logically expect to be filtered away CAN BE executed BEFORE the expression(s) that would filter it out, and thus generate an exception.

Your case is a nice example of just how easy it is to overlook such cases, especially when wrapped inside a function. In this case I suspect the separated items, including an empty tail item, are fed to the expression in your WHERE clause before that tail item is removed by the appropriate condition in the function.

The reason the varchar(max) stops this is most likely a type conversion that changes the order of expression execution into something we humans instinctively expect.
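A minimal sketch of the failure mode outside the function may make it clearer (a hypothetical temp table; whether it actually errors depends on the plan the optimizer picks, which is exactly the point):

```sql
-- The WHERE clause looks like it protects RIGHT() from bad input, but the
-- optimizer is free to evaluate RIGHT() before the filter removes the row.
CREATE TABLE #t (Item VARCHAR(10));
INSERT #t VALUES ('E62'), ('');   -- the empty string makes LEN(Item)-1 = -1

SELECT Item
FROM #t
WHERE Item <> ''                                   -- logically filters the empty row...
  AND ISNUMERIC(RIGHT(Item, LEN(Item) - 1)) > 0;  -- ...but RIGHT may still be handed -1
```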

This version of your code will work without error: the incorrect length of -1 passed to the RIGHT function is turned into a NULL, which is harmless to RIGHT and causes no side effects for the rest of your query.

DECLARE @VALUES VARCHAR(8000)
SET @VALUES = 'E62|E48|E47'

SELECT
Item
FROM dbo.[DelimitedSplit8K](@VALUES, '|')
WHERE ISNUMERIC(RIGHT(Item, nullif(LEN(Item) - 1, -1))) > 0


Another tip I can hand you: don't use LEN in this instance, as it also right-trims the input.
That generally makes it a more expensive function than DATALENGTH, which does what you expect LEN to do here.
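A quick illustration of the difference (the trailing-space behavior is documented T-SQL):

```sql
-- LEN right-trims the input before counting characters;
-- DATALENGTH counts the raw bytes of the value as stored.
DECLARE @s VARCHAR(20);
SET @s = 'abc   ';           -- three trailing spaces

SELECT LEN(@s)        AS LenResult,        -- 3: trailing spaces ignored
       DATALENGTH(@s) AS DataLengthResult; -- 6: byte count, no trimming
```

Note that for NVARCHAR, DATALENGTH returns bytes (two per character), so the two are only interchangeable for VARCHAR data.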
Post #1467759
Posted Wednesday, June 26, 2013 7:45 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Yesterday @ 9:58 AM
Points: 36,995, Visits: 31,517
mister.magoo (2/22/2013)

What bothered me was that feeling that "someone should have thought of this ages ago, shouldn't they"?
Always makes me nervous when that is the case...


Damn. I've been bitten by and fixed the collation problem using a binary collation in other code often enough that I can't think of a good reason why I forgot it for this over the years. Even Paul has reminded me on occasion.

I'll post a correction to the code in the article and in the attachments when I get a chance. Thank you very much for the help. That's why I call this the "community function". Lots of good folks have had a hand in it.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1467923
Posted Wednesday, June 26, 2013 7:55 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Yesterday @ 9:58 AM
Points: 36,995, Visits: 31,517
the sqlist (3/18/2013)
Did I miss something? I didn't have the patience to go over all 54 pages of replies. Initially, when the code for the cteTally was changed to be faster, it used a

SELECT 0 UNION ALL
...

However, in the final function it unions with 1 instead of 0. I tried both versions in the final function and they actually both work.



The change occurred during a couple of those 54 pages of discussion. A couple of folks found another enhancement to make it run even faster than I did (compared to the old function). Then I added a note to the beginning of the article to let folks know where the newer code could be found. Unfortunately, people in a hurry weren't reading the update so I updated the code in the article. I didn't go back to correct the rest of the article, though.


Post #1467925
Posted Wednesday, June 26, 2013 8:04 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Yesterday @ 9:58 AM
Points: 36,995, Visits: 31,517
bboyd019 (3/13/2013)
Is there a reason that you mention subtracting the length of the final string from 8000 like so
ISNULL(NULLIF(CHARINDEX(@pDelimiter, @pString, s.N1), 0), 8000) - s.N1

and then in the final code you have
 ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)

I just wanted to make sure I'm understanding the code correctly. It looks like it runs the same either way...


I believe that's also due to the same reason I mentioned in the post just above this one.
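A small standalone sketch of why the two placements behave the same for the final element (variable values here are just for illustration):

```sql
-- For the last element there is no trailing delimiter, so CHARINDEX
-- returns 0 and NULLIF(..., 0) turns that into NULL.
DECLARE @pString VARCHAR(8000), @pDelimiter CHAR(1), @N1 INT;
SELECT @pString = 'A|BB|CCC', @pDelimiter = '|', @N1 = 6; -- start of 'CCC'

SELECT ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,@N1),0),8000) - @N1 AS ArticleForm,  -- 7994
       ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,@N1),0) - @N1,8000) AS FunctionForm; -- 8000

-- Both lengths run past the end of 'CCC'; SUBSTRING simply stops at the
-- end of the string, so either form returns the same final element.
SELECT SUBSTRING(@pString,@N1,7994) AS A, SUBSTRING(@pString,@N1,8000) AS B; -- both 'CCC'
```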


Post #1467928
Posted Tuesday, November 5, 2013 10:18 AM


Mr or Mrs. 500


Group: General Forum Members
Last Login: 2 days ago @ 7:30 PM
Points: 562, Visits: 2,618
Forgive me if the code to do this has already been posted but I had a requirement for a greater-than-8k splitter. I made the following changes:

1) Changed @pString to varchar(max)
2) Added more rows to the tally table (E8(N))
3) Changed
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1, 8000)

To
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1, DATALENGTH(@pString))

This is the finished product with my comments/changes quoted like this: /** ... **/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER FUNCTION [dbo].[DelimitedSplitVcMax]
--===== Define I/O parameters
(@pString VARCHAR(/**8000**/max), @pDelimiter CHAR(1))
--Sorry
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 1,000,000...
-- enough to cover VARCHAR(<a lot>)
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows

/** NEEDED MORE ROWS, ADDED E8; **/
E8(N) AS (SELECT 1 FROM E4 a, E4 b), --10E+8 or 100,000,000 rows max
cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM /**E4**/ E8
),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1, /**8000**/ /**Added: **/ DATALENGTH(@pString))
FROM cteStart s
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l;

This seems correct and produces the right result. It runs twice as slow (which is expected and acceptable [since it's still much, much faster than what I am replacing]). Other than the BIN COLLATION changes that MM mentioned a few pages back - does the code above look good?

Second Question:
I have a requirement where I need to split a value on a table that may or may not contain values longer than 8K characters. If there aren't rows with >8K characters I want it to use the 8K version; otherwise it would require the varchar(max) version (dbo.DelimitedSplitVcMax). Is this a good approach?

--DDL
CREATE TABLE dbo.tbl (id int primary key, val varchar(max) not null);
GO

IF NOT EXISTS(SELECT * FROM tbl WHERE DATALENGTH(val)>=8000)
select s.Item
FROM tbl v
CROSS APPLY dbo.DelimitedSplit8K(v.val,',') s
ELSE
select s.Item
FROM tbl v
CROSS APPLY dbo.DelimitedSplitVcMax(v.val,',') s



-- Alan Burstein



Read this article for best practices on asking questions.
Need to split a string? Try this (Jeff Moden)
Need a pattern-based string splitter? Try this (Dwain Camps)
My blog
Post #1511554
Posted Tuesday, November 5, 2013 10:34 AM


Old Hand


Group: General Forum Members
Last Login: Sunday, July 27, 2014 3:54 PM
Points: 393, Visits: 811
Alan.B (11/5/2013)
Forgive me if the code to do this has already been posted but I had a requirement for a greater-than-8k splitter. I made the following changes:

1) Changed @pString to varchar(max)
2) Added more rows to the tally table (E8(N))
3) Changed
ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1, 8000)


This seems correct and produces the right result. It runs twice as slow (which is expected and acceptable [since it's still much, much faster than what I am replacing]). Other than the BIN COLLATION changes that MM mentioned a few pages back - does the code above look good?

Second Question:
I have a requirement where I need to split a value on a table that may or may not have columns with 8k characters. If there aren't rows with >8K characters I want it to use the 8K version, otherwise it would require the varchar(max) version (dbo.DelimitedSplitVcMax). Is this a good approach?

...


Those are mostly the changes I made, and I've been happy with it (like you said, performance wasn't an issue as it's not used in an OLTP environment).

/*
See this article sometime for improvements to this code: http://www.sqlservercentral.com/articles/Tally+Table/72993/
Jeff Moden 2011-05-02
*/
CREATE FUNCTION udf_StrList2Table (
@List NVARCHAR(MAX) = N'',
@Delimiter NCHAR(1) = N','
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
/*
The following inline delimited splitter was written by Jeff Moden.
http://www.sqlservercentral.com/articles/Tally+Table/72993/
*/
WITH E1(N) AS
( --=== Create Ten 1's
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 --10
),
E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000

E8(N) AS
(
--==== Limits the number of rows right up front for both a performance gain and prevention of accidental "overruns"
SELECT TOP (DATALENGTH(ISNULL(@List,1))) 1 FROM E4 a, E4 b -- 100,000,000
),
cteTally(N) AS
( --==== This provides the "zero base"
SELECT 0 UNION ALL
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E8
),
cteStart(N1) AS
( --==== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT t.N+1 FROM cteTally t WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0)
)
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
SELECT
ROW_NUMBER() OVER(ORDER BY s.N1) as ListPos,
SUBSTRING(@List,s.N1,ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,DATALENGTH(ISNULL(@List,1)))) as Value
FROM cteStart s
GO
GRANT SELECT ON [dbo].[udf_StrList2Table] TO [public]

--Example: SELECT * FROM udf_StrList2Table('First,Second,Third,Fourth,Fifth,Sixth',',')
--Example: SELECT * FROM udf_StrList2Table('First' + CHAR(9) + 'Second'+ CHAR(9) + 'Third',CHAR(9))

With your question about reasonableness, it's up to you. If performance wasn't an issue, why maintain two different functions? If it does matter, you can create a calculated column on DATALENGTH and a filtered index on that column > 8000 so that the EXISTS can be instant.
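One possible shape for that idea (column and index names here are hypothetical). Note one wrinkle: SQL Server does not allow a computed column in a filtered index's WHERE predicate, so this sketch indexes the computed column directly instead:

```sql
-- Hypothetical names; DATALENGTH(val) is deterministic, so the computed
-- column can be indexed.
ALTER TABLE dbo.tbl
    ADD val_len AS DATALENGTH(val);

CREATE NONCLUSTERED INDEX IX_tbl_val_len
    ON dbo.tbl (val_len);

-- The routing check then becomes a cheap index seek:
IF NOT EXISTS (SELECT * FROM dbo.tbl WHERE val_len >= 8000)
    SELECT s.Item FROM dbo.tbl v CROSS APPLY dbo.DelimitedSplit8K(v.val,',') s
ELSE
    SELECT s.Item FROM dbo.tbl v CROSS APPLY dbo.DelimitedSplitVcMax(v.val,',') s;
```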


/* Anything is possible but is it worth it? */
Post #1511560
Posted Tuesday, November 5, 2013 11:44 AM


Mr or Mrs. 500


Group: General Forum Members
Last Login: 2 days ago @ 7:30 PM
Points: 562, Visits: 2,618
Thanks!

With your question about reasonableness, it's up to you. If performance wasn't an issue, why maintain two different functions? If it does matter, you can create a calculated column on DATALENGTH and a filtered index on that column > 8000 so that the EXISTS can be instant.


That last question was just me thinking out loud... I'll actually be implementing the 8K version and will handle the one scenario where I need to handle the longer strings a little differently. Great idea with the calculated column. Thanks!


Post #1511579
Posted Tuesday, November 5, 2013 2:31 PM


Old Hand


Group: General Forum Members
Last Login: Friday, August 29, 2014 3:16 PM
Points: 355, Visits: 1,881
Alan.B (11/5/2013)
With your question about reasonableness, it's up to you. If performance wasn't an issue, why maintain two different functions? If it does matter, you can create a calculated column on DATALENGTH and a filtered index on that column > 8000 so that the EXISTS can be instant.


That last question was just me thinking out loud... I'll actually be implementing the 8K version and will handle the one scenario where I need to handle the longer strings a little differently. Great idea with the calculated column. Thanks!


Hi there. Well, as long as we are talking about "reasonableness" , why not just use a SQLCLR split function? It has been noted in this forum, as well as in the article, that splitting strings is more efficient and more scalable in .Net / CLR. Using SQLCLR you would not have to worry about <= 8k vs > 8k in terms of either the delimited string or its elements, nor about VARCHAR vs NVARCHAR, nor about adding schema (computed column and index) to determine which of the two functions to call (and would you want to replicate that field and index to other tables in the future if you need to split values in additional places?).

You can find numerous examples of SQLCLR code for a split function. I believe Jeff attached one from Paul White somewhere in this forum (I don't see it attached to the article), or you can use your preferred search engine to find any one of 1000 blogs on it (I believe one from Adam Machanic shows up towards the top of the list), or you can just download SQL# and have a split function within minutes without doing any coding .

Take care,
Solomon...






SQL# - http://www.SQLsharp.com/
Post #1511642
Posted Tuesday, November 5, 2013 5:19 PM


Mr or Mrs. 500


Group: General Forum Members
Last Login: 2 days ago @ 7:30 PM
Points: 562, Visits: 2,618
Solomon Rutzky (11/5/2013)
...why not just use a SQLCLR split function? ...


Thanks, Solomon.

I was actually replacing a CLR splitter. The splitter I was using is the one that comes with Master Data Services (mdq.Split) from the [Microsoft.MasterDataServices.DataQuality] assembly. Our CLR integration may be getting shut off and I was being proactive. It's not the fastest CLR splitter I've ever used (though it is faster than any T-SQL splitter I have ever used*), but it's the one that Microsoft ships with Master Data Services/Data Quality Services. No new DLL to compile, no new assembly to create; it's been well tested and the known bugs have been worked out...



* All that said, I never did a comparison test between mdq.split and delimitedsplit8k.

Using these two variations of mdq.Split (using NVARCHAR because it's CLR)...
CREATE FUNCTION [mdq].[Split](@input [nvarchar](4000), @separators [nvarchar](10), @removeEmpty [bit], @tokenPattern [nvarchar](4000), @mask [tinyint])
RETURNS TABLE (
[Sequence] [int] NULL,
[Token] [nvarchar](4000) NULL,
[IsValid] [bit] NULL
) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [Microsoft.MasterDataServices.DataQuality].[Microsoft.MasterDataServices.DataQuality.SqlClr].[SplitWithCheck]
GO

CREATE FUNCTION [mdq].[SplitVCMax](@input [nvarchar](max), @separators [nvarchar](10), @removeEmpty [bit], @tokenPattern [nvarchar](4000), @mask [tinyint])
RETURNS TABLE (
[Sequence] [int] NULL,
[Token] [nvarchar](max) NULL,
[IsValid] [bit] NULL
) WITH EXECUTE AS CALLER
AS
EXTERNAL NAME [Microsoft.MasterDataServices.DataQuality].[Microsoft.MasterDataServices.DataQuality.SqlClr].[SplitWithCheck]

...and my varchar(max) version of Jeff's splitter (posted earlier)

I used this code for testing (which is similar to what I am doing):

IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals;

DECLARE @n int=5000

SELECT n AS id,
REPLACE(newid(),'-',',')+
REPLACE(newid(),'-',',')+
REPLACE(newid(),'-',',')+
REPLACE(newid(),'-',',')+
REPLACE(newid(),'-',',')+
REPLACE(newid(),'-',',')+
REPLACE(newid(),'-',',')+
REPLACE(newid(),'-',',')
AS val
INTO #vals
FROM tally
WHERE n<=@n

ALTER TABLE #vals
ADD CONSTRAINT vals_xxx PRIMARY KEY(id);
GO


SET STATISTICS TIME OFF;
SET NOCOUNT ON;
SET STATISTICS TIME ON;

select v.id, s.Item
FROM #vals v
CROSS APPLY dbo.DelimitedSplit8K(v.val,',') s

select v.id, s.Item
FROM #vals v
CROSS APPLY dbo.DelimitedSplitVcMax(v.val,',') s

SELECT v.id, s.Token
FROM #vals v
CROSS APPLY mdq.split(v.val,',',0,NULL,0) s;

SELECT v.id, s.Token
FROM #vals v
CROSS APPLY mdq.splitVcMax(v.val,',',0,NULL,0) s;

SET STATISTICS TIME OFF;
GO

I got these results:
 --jeff8k
SQL Server Execution Times:
CPU time = 1404 ms, elapsed time = 3713 ms.
--vcmax version
SQL Server Execution Times:
CPU time = 6412 ms, elapsed time = 7768 ms.
--mdq original
SQL Server Execution Times:
CPU time = 858 ms, elapsed time = 5777 ms.
--mdq vcmax
SQL Server Execution Times:
CPU time = 6303 ms, elapsed time = 9011 ms.


It's worth noting that these results include the Latin1_General_BIN collation change that Mr. Magoo demonstrated a few pages back. That said, Jeff's splitter is faster than the CLR splitter that ships with Microsoft MDS/DQS.

P.S. Here's a link to information on mdq.Split in case anyone was curious: mdq.Split (Transact-SQL). I included the link in case anyone googled "mdq.split"; it no longer comes up in Google or Yahoo (or Bing) searches...


Post #1511673
Posted Tuesday, November 5, 2013 5:22 PM


Hall of Fame


Group: General Forum Members
Last Login: Today @ 7:10 PM
Points: 3,631, Visits: 5,279
Gatekeeper (11/5/2013)
...
With your question about reasonableness, it's up to you. If performance wasn't an issue, why maintain two different functions? If it does matter, you can create a calculated column on DATALENGTH and a filtered index on that column > 8000 so that the EXISTS can be instant.


Alan - You may want to also consider something like this:

--DDL
CREATE TABLE dbo.tbl (id int primary key, val varchar(max) not null);
GO

SELECT s.Item
INTO #Temp
FROM tbl v
CROSS APPLY dbo.DelimitedSplit8K(v.val,',') s
WHERE DATALENGTH(val) <= 8000;

--IF EXISTS(SELECT * FROM tbl WHERE DATALENGTH(val)>=8000)
INSERT INTO #Temp
SELECT s.Item
FROM tbl v
CROSS APPLY dbo.DelimitedSplitVcMax(v.val,',') s
WHERE DATALENGTH(val) > 8000;

SELECT Item
FROM #Temp;


Which would apply the correct splitter to the task. Or use Solomon's suggestion.

Edit: Using IF EXISTS (as commented out) would be optional.



My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1511674