Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

String tokenizing / splitting Expand / Collapse
Author
Message
Posted Thursday, February 24, 2011 8:56 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, February 26, 2012 3:26 AM
Points: 1, Visits: 30
Comments posted to this topic are about the item String tokenizing / splitting
Post #1069381
Posted Friday, February 25, 2011 12:36 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Saturday, March 10, 2012 8:06 PM
Points: 15, Visits: 51
I am fairly certain that CTE will be faster...


/*sql server 2008 CTE split and join fun*/
/*split begins*/
if object_id(N'tempdb..#split') is not null drop table #split;
if object_id(N'tempdb..#joined') is not null drop table #joined;
declare @fun varchar(64) = 'The quick brown fox jumped over the lazy dogs!';
declare @delim char(1) = ' ';
select @fun as [Fun];
with split(i, token, remainder) as
(select 1
, left(@fun,charindex(@delim,@fun)-1)
, LTRIM(right(@fun,len(@fun)-CHARINDEX(@delim,@fun)))
union all
select i + 1
,case when charindex(@delim,remainder) > 0 then
left(remainder,charindex(@delim,remainder)-1)
else remainder end as token
,LTRIM(right(remainder,len(remainder)-CHARINDEX(' ',remainder))) as remainder
from split
where charindex(@delim,remainder) >= 0 and token != remainder
)
select i, token, remainder
into #split
from split;
select * from #split;

/*join begins*/
with joined (i, newfun, token) as (
select i, convert(varchar(max),token), token
from #split where i = 1
union all
select s.i, j.newfun + @delim + s.token, s.token
from joined j
inner join
#split s
on s.i = j.i + 1
)
select *
into #joined
from joined;
select * from #joined;

Post #1069426
Posted Friday, February 25, 2011 1:09 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, June 15, 2012 12:48 AM
Points: 176, Visits: 306
Another way to do it is by using the Tally table. Check out Jeff Moden's article http://www.sqlservercentral.com/articles/T-SQL/62867/.

Here's some example code (how i use it on our systems):

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[udfTransformStringToColumn] (
@String VARCHAR(MAX),
@Delimiter VARCHAR(1)
)
RETURNS @List TABLE (ValueColumn VARCHAR(255) NULL)
AS
BEGIN

IF LEFT(@String, 1) <> @Delimiter SET @String = @Delimiter + @String
IF RIGHT(@String, 1) <> @Delimiter SET @String = @String + @Delimiter

;WITH
cteTally AS
(
SELECT TOP (LEN(@String))
ROW_NUMBER() OVER (ORDER BY t1.Object_ID) AS N
FROM Master.sys.All_Columns t1
CROSS JOIN Master.sys.All_Columns t2
) INSERT INTO @List (ValueColumn)
SELECT SUBSTRING(@String,N+1,CHARINDEX(@Delimiter,@String,N+1)-N-1) AS Value
FROM cteTally
WHERE N < LEN(@String)
AND SUBSTRING(@String,N,1) = @Delimiter
RETURN

END

GO

Post #1069435
Posted Friday, February 25, 2011 1:20 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, March 4, 2014 8:50 AM
Points: 13, Visits: 61
I find the below faster on 2005.


CREATE FUNCTION [Strings].[fnStringSplit_2005_Distinct_CHAR]
(
@SourceString VARCHAR(MAX)
)
/*======================================================================================
'P System : Multiple
'P Subsystem : Common Functions
'P Script : fnSMS_StringSplit_2005
'P Creation date : 15/10/2010
'P
'P Description : Splits a Comma Delimited String Into a Table. Join as a Table
'P
'P SELECT * FROM Strings.fnStringSplit_2005_Distinct_CHAR('9,8,7,6,5,4,3,2,1,')
'P
'P Parameters ----------------------------------------------------------------------
'P Inputs : @SourceString - Comma delimited string
'P Outputs : table variable
'P====================================================================================*/

RETURNS @Values TABLE
(
--POSITION INT IDENTITY,
VALUE VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS
PRIMARY KEY ([VALUE])
)

AS
BEGIN
DECLARE @vchString VARCHAR(MAX)
DECLARE @xmlString XML

BEGIN

SET @vchString = @SourceString

IF RIGHT(@vchString,1) = ','
SET @vchString = LEFT(@vchString, LEN(@vchString)-1)

/*------------------------------------------
| Convert the string to xml
'-----------------------------------------*/

SET @xmlString = CAST('<i>' + REPLACE(@vchString, ',', '</i><i>') + '</i>' AS XML)

/*------------------------------------------
| Read xml into a table variable
'-----------------------------------------*/

INSERT INTO @Values(VALUE)
SELECT DISTINCT x.i.value('.', 'VARCHAR(255)') AS Item
FROM @xmlString.nodes('//i') x(i)
END

RETURN
END

/*---------------------------------------------------------------------------------------
|--------------------------- End of common function -----------------------------|
|--------------------------------------------------------------------------------------*/


Cheap toilet paper is a false economy, beware!
Post #1069440
Posted Friday, February 25, 2011 2:54 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, May 7, 2014 7:30 AM
Points: 5, Visits: 25
Small Code for 2008 SQL. )

CREATE FUNCTION dbo.fn_Split 
(
@InputString VARCHAR(MAX),
@Delimiter VARCHAR(MAX)
)
RETURNS @Data TABLE (
Data VARCHAR(MAX)
)
AS
BEGIN
DECLARE @DelimitierLen BIGINT = LEN(@Delimiter)
DECLARE @DelimiterCharIndex BIGINT = CHARINDEX(@Delimiter,@InputString)

WHILE (@DelimiterCharIndex > 0)
BEGIN
INSERT INTO @Data VALUES (SUBSTRING(@InputString, 1, @DelimiterCharIndex - 1))
SET @InputString = SUBSTRING(@InputString, @DelimiterCharIndex + @DelimitierLen, LEN(@InputString))
SET @DelimiterCharIndex = CHARINDEX(@Delimiter, @InputString)
END

INSERT INTO @Data VALUES (@InputString)

RETURN
END

Post #1069474
Posted Friday, February 25, 2011 3:14 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, May 7, 2014 7:30 AM
Points: 5, Visits: 25
Fast, but not good. (

DECLARE @inputString varchar(MAX) = '1<i>;2;4;5;'

Result of exec is 'XML parsing: line 1, character 97257, unexpected end of input'.
You need check input string.
Post #1069482
Posted Friday, February 25, 2011 1:46 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, April 9, 2014 2:14 PM
Points: 2, Visits: 34
CREATE FUNCTION [dbo].[Split](@data VARCHAR(MAX), @delimiter VARCHAR(20))
RETURNS @t TABLE (Element VARCHAR(MAX))
AS
BEGIN
DECLARE @textXML XML

SET @textXML = CAST('<d>' + REPLACE(REPLACE(REPLACE(@data,'&','~~amper~~'),'<','~~lt~~'), @delimiter, '</d><d>') + '</d>' AS XML)

INSERT INTO @t(element)
SELECT REPLACE(REPLACE(T.split.value('.', 'varchar(max)'),'~~amper~~','&'), '~~lt~~','<') AS data
FROM @textXML.nodes('/d') T(split)

RETURN
END

Post #1069840
Posted Friday, February 25, 2011 5:01 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Tuesday, January 1, 2013 3:03 PM
Points: 317, Visits: 1,018
dimitri.decoene-1027745 (2/25/2011)
Another way to do it is by using the Tally table. Check out Jeff Moden's article http://www.sqlservercentral.com/articles/T-SQL/62867/.

Here's some example code (how i use it on our systems):

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[udfTransformStringToColumn] (
@String VARCHAR(MAX),
@Delimiter VARCHAR(1)
)
RETURNS @List TABLE (ValueColumn VARCHAR(255) NULL)
AS
BEGIN

IF LEFT(@String, 1) <> @Delimiter SET @String = @Delimiter + @String
IF RIGHT(@String, 1) <> @Delimiter SET @String = @String + @Delimiter

;WITH
cteTally AS
(
SELECT TOP (LEN(@String))
ROW_NUMBER() OVER (ORDER BY t1.Object_ID) AS N
FROM Master.sys.All_Columns t1
CROSS JOIN Master.sys.All_Columns t2
) INSERT INTO @List (ValueColumn)
SELECT SUBSTRING(@String,N+1,CHARINDEX(@Delimiter,@String,N+1)-N-1) AS Value
FROM cteTally
WHERE N < LEN(@String)
AND SUBSTRING(@String,N,1) = @Delimiter
RETURN

END

GO




Interesting function. However, you may want to always append a delimiter to the string because the function would not return the correct number of beginning empty strings. Change

IF LEFT(@String, 1) <> @Delimiter SET @String = @Delimiter + @String

TO

SET @String = @Delimiter + @String



SELECT *
FROM [dbo].[udfTransformStringToColumn]( ';;;Token1;Token2;Token3;Token4;Token5;;', ';' )



The beginning ";;;" of the string should return 3 blank rows, however, the function returns 2.



Post #1069943
Posted Friday, February 25, 2011 7:31 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Saturday, March 10, 2012 8:06 PM
Points: 15, Visits: 51
I ran the XML, CTE, and the SUBSTRING functions through and checked the estimated to actual execution plans. The estimated shows the XML_Reader is, by far, the most costly (so costly it can't compare). The Substring shows to be twice as expensive as the CTE, but CTE only works (naturally) up to 100 tokens. The actual came out as equal, so we'd have to design something to check the execution at scale... which eliminates the CTE version unless the test is designed using parallel effort rather sheer token volume.

You could design a SUBSTRING/CTE hybrid where CTE is executed with a try/catch, then cut to the SUBSTRING if it fails by volume... but it might be simply smarter to use the version that fits your use case the best.

The SUBSTRING version has a bug where if the delimiter is a space or any phrase with a trailing space, it trims off that space (or ignores it) and comes up one character short. For VARCHAR, you can fix this by using the DATALENGTH() function instead of LEN() and for NVARCHAR, use DATALENGTH(), then divide by two.

I also found a bug in the XML version that used a clustered index on the returned table. That will change the token order and require unique tokens. You can remove the index to avoid natural reordering by the indexer and errors raised by duplicate tokens. Avoiding a tablescan here may or may not make sense for your use case.

I also forgot to mention the TALLY version. It uses CTE, too, and it was slightly more expensive than the SUBSTRING version so I didn't bother comparisons past that point because it had the worst of both CTE and SUBSTRING versions.

One more bug: the XML version does not handle special characters such as the ampersand(&). The use of a CDATA declaration may help but I did not try it.
Post #1069971
Posted Friday, February 25, 2011 7:33 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Saturday, March 10, 2012 8:06 PM
Points: 15, Visits: 51
CTE version, made more like the SUBSTRING version:
CREATE FUNCTION [dbo].[fnStringSplit_CTE] 
(
@SourceString VARCHAR(MAX),
@Delim CHAR(1)
)

RETURNS @Values TABLE
(
VALUE VARCHAR(MAX)
)

AS
BEGIN
with split(i, token, remainder) as
(select 1
, left(@SourceString,charindex(@delim,@SourceString)-1)
, LTRIM(right(@SourceString,len(@SourceString)-CHARINDEX(@delim,@SourceString)))
union all
select i + 1
,case when charindex(@delim,remainder) > 0 then
left(remainder,charindex(@delim,remainder)-1)
else remainder end as token
,LTRIM(right(remainder,len(remainder)-CHARINDEX(@Delim,remainder))) as remainder
from split
where charindex(@delim,remainder) >= 0 and token != remainder
)
insert into @Values
Select token
from split

RETURN
END



GO

Post #1069972
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse