SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


String tokenizing / splitting


String tokenizing / splitting

Author
Message
Irfan.Baig
Irfan.Baig
Grasshopper
Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)Grasshopper (22 reputation)

Group: General Forum Members
Points: 22 Visits: 54
Comments posted to this topic are about the item String tokenizing / splitting
davecason
davecason
SSC Rookie
SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)

Group: General Forum Members
Points: 37 Visits: 51
I am fairly certain that CTE will be faster...


/*sql server 2008 CTE split and join fun*/
/*split begins*/
if object_id(N'tempdb..#split') is not null drop table #split;
if object_id(N'tempdb..#joined') is not null drop table #joined;
declare @fun varchar(64) = 'The quick brown fox jumped over the lazy dogs!';
declare @delim char(1) = ' ';
select @fun as [Fun];
with split(i, token, remainder) as
(select 1
, left(@fun,charindex(@delim,@fun)-1)
, LTRIM(right(@fun,len(@fun)-CHARINDEX(@delim,@fun)))
union all
select i + 1
,case when charindex(@delim,remainder) > 0 then
left(remainder,charindex(@delim,remainder)-1)
else remainder end as token
,LTRIM(right(remainder,len(remainder)-CHARINDEX(' ',remainder))) as remainder
from split
where charindex(@delim,remainder) >= 0 and token != remainder
)
select i, token, remainder
into #split
from split;
select * from #split;

/*join begins*/
with joined (i, newfun, token) as (
select i, convert(varchar(max),token), token
from #split where i = 1
union all
select s.i, j.newfun + @delim + s.token, s.token
from joined j
inner join
#split s
on s.i = j.i + 1
)
select *
into #joined
from joined;
select * from #joined;


dimitri.decoene-1027745
dimitri.decoene-1027745
SSC-Enthusiastic
SSC-Enthusiastic (178 reputation)SSC-Enthusiastic (178 reputation)SSC-Enthusiastic (178 reputation)SSC-Enthusiastic (178 reputation)SSC-Enthusiastic (178 reputation)SSC-Enthusiastic (178 reputation)SSC-Enthusiastic (178 reputation)SSC-Enthusiastic (178 reputation)

Group: General Forum Members
Points: 178 Visits: 306
Another way to do it is by using the Tally table. Check out Jeff Moden's article http://www.sqlservercentral.com/articles/T-SQL/62867/.

Here's some example code (how i use it on our systems):

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[udfTransformStringToColumn] (
@String VARCHAR(MAX),
@Delimiter VARCHAR(1)
)
RETURNS @List TABLE (ValueColumn VARCHAR(255) NULL)
AS
BEGIN

IF LEFT(@String, 1) <> @Delimiter SET @String = @Delimiter + @String
IF RIGHT(@String, 1) <> @Delimiter SET @String = @String + @Delimiter

;WITH
cteTally AS
(
SELECT TOP (LEN(@String))
ROW_NUMBER() OVER (ORDER BY t1.Object_ID) AS N
FROM Master.sys.All_Columns t1
CROSS JOIN Master.sys.All_Columns t2
) INSERT INTO @List (ValueColumn)
SELECT SUBSTRING(@String,N+1,CHARINDEX(@Delimiter,@String,N+1)-N-1) AS Value
FROM cteTally
WHERE N < LEN(@String)
AND SUBSTRING(@String,N,1) = @Delimiter
RETURN

END

GO


Craig Sunderland
Craig Sunderland
Grasshopper
Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)Grasshopper (19 reputation)

Group: General Forum Members
Points: 19 Visits: 61
I find the below faster on 2005.


CREATE FUNCTION [Strings].[fnStringSplit_2005_Distinct_CHAR]
(
@SourceString VARCHAR(MAX)
)
/*======================================================================================
'P System : Multiple
'P Subsystem : Common Functions
'P Script : fnSMS_StringSplit_2005
'P Creation date : 15/10/2010
'P
'P Description : Splits a Comma Delimited String Into a Table. Join as a Table
'P
'P SELECT * FROM Strings.fnStringSplit_2005_Distinct_CHAR('9,8,7,6,5,4,3,2,1,')
'P
'P Parameters ----------------------------------------------------------------------
'P Inputs : @SourceString - Comma delimited string
'P Outputs : table variable
'P====================================================================================*/

RETURNS @Values TABLE
(
--POSITION INT IDENTITY,
VALUE VARCHAR(255) COLLATE SQL_Latin1_General_CP1_CI_AS
PRIMARY KEY ([VALUE])
)

AS
BEGIN
DECLARE @vchString VARCHAR(MAX)
DECLARE @xmlString XML

BEGIN

SET @vchString = @SourceString

IF RIGHT(@vchString,1) = ','
SET @vchString = LEFT(@vchString, LEN(@vchString)-1)

/*------------------------------------------
| Convert the string to xml
'-----------------------------------------*/

SET @xmlString = CAST('<i>' + REPLACE(@vchString, ',', '</i><i>') + '</i>' AS XML)

/*------------------------------------------
| Read xml into a table variable
'-----------------------------------------*/

INSERT INTO @Values(VALUE)
SELECT DISTINCT x.i.value('.', 'VARCHAR(255)') AS Item
FROM @xmlString.nodes('//i') x(i)
END

RETURN
END

/*---------------------------------------------------------------------------------------
|--------------------------- End of common function -----------------------------|
|--------------------------------------------------------------------------------------*/


Cheap toilet paper is a false economy, beware!
AAYakovenko
AAYakovenko
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 34
Small Code for 2008 SQL. )


CREATE FUNCTION dbo.fn_Split
(
@InputString VARCHAR(MAX),
@Delimiter VARCHAR(MAX)
)
RETURNS @Data TABLE (
Data VARCHAR(MAX)
)
AS
BEGIN
DECLARE @DelimitierLen BIGINT = LEN(@Delimiter)
DECLARE @DelimiterCharIndex BIGINT = CHARINDEX(@Delimiter,@InputString)

WHILE (@DelimiterCharIndex > 0)
BEGIN
INSERT INTO @Data VALUES (SUBSTRING(@InputString, 1, @DelimiterCharIndex - 1))
SET @InputString = SUBSTRING(@InputString, @DelimiterCharIndex + @DelimitierLen, LEN(@InputString))
SET @DelimiterCharIndex = CHARINDEX(@Delimiter, @InputString)
END

INSERT INTO @Data VALUES (@InputString)

RETURN
END


AAYakovenko
AAYakovenko
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 34
Fast, but not good. (


DECLARE @inputString varchar(MAX) = '1<i>;2;4;5;'



Result of exec is 'XML parsing: line 1, character 97257, unexpected end of input'.
You need check input string.
gorr-688214
gorr-688214
Forum Newbie
Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)Forum Newbie (9 reputation)

Group: General Forum Members
Points: 9 Visits: 45

CREATE FUNCTION [dbo].[Split](@data VARCHAR(MAX), @delimiter VARCHAR(20))
RETURNS @t TABLE (Element VARCHAR(MAX))
AS
BEGIN
DECLARE @textXML XML

SET @textXML = CAST('<d>' + REPLACE(REPLACE(REPLACE(@data,'&','~~amper~~'),'<','~~lt~~'), @delimiter, '</d><d>') + '</d>' AS XML)

INSERT INTO @t(element)
SELECT REPLACE(REPLACE(T.split.value('.', 'varchar(max)'),'~~amper~~','&'), '~~lt~~','<') AS data
FROM @textXML.nodes('/d') T(split)

RETURN
END


nadabadan
nadabadan
Old Hand
Old Hand (333 reputation)Old Hand (333 reputation)Old Hand (333 reputation)Old Hand (333 reputation)Old Hand (333 reputation)Old Hand (333 reputation)Old Hand (333 reputation)Old Hand (333 reputation)

Group: General Forum Members
Points: 333 Visits: 1018
dimitri.decoene-1027745 (2/25/2011)
Another way to do it is by using the Tally table. Check out Jeff Moden's article http://www.sqlservercentral.com/articles/T-SQL/62867/.

Here's some example code (how i use it on our systems):

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION [dbo].[udfTransformStringToColumn] (
@String VARCHAR(MAX),
@Delimiter VARCHAR(1)
)
RETURNS @List TABLE (ValueColumn VARCHAR(255) NULL)
AS
BEGIN

IF LEFT(@String, 1) <> @Delimiter SET @String = @Delimiter + @String
IF RIGHT(@String, 1) <> @Delimiter SET @String = @String + @Delimiter

;WITH
cteTally AS
(
SELECT TOP (LEN(@String))
ROW_NUMBER() OVER (ORDER BY t1.Object_ID) AS N
FROM Master.sys.All_Columns t1
CROSS JOIN Master.sys.All_Columns t2
) INSERT INTO @List (ValueColumn)
SELECT SUBSTRING(@String,N+1,CHARINDEX(@Delimiter,@String,N+1)-N-1) AS Value
FROM cteTally
WHERE N < LEN(@String)
AND SUBSTRING(@String,N,1) = @Delimiter
RETURN

END

GO




Interesting function. However, you may want to always append a delimiter to the string because the function would not return the correct number of beginning empty strings. Change

IF LEFT(@String, 1) <> @Delimiter SET @String = @Delimiter + @String

TO

SET @String = @Delimiter + @String




SELECT *
FROM [dbo].[udfTransformStringToColumn]( ';;;Token1;Token2;Token3;Token4;Token5;;', ';' )



The beginning ";;;" of the string should return 3 blank rows, however, the function returns 2.
davecason
davecason
SSC Rookie
SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)

Group: General Forum Members
Points: 37 Visits: 51
I ran the XML, CTE, and the SUBSTRING functions through and checked the estimated to actual execution plans. The estimated shows the XML_Reader is, by far, the most costly (so costly it can't compare). The Substring shows to be twice as expensive as the CTE, but CTE only works (naturally) up to 100 tokens. The actual came out as equal, so we'd have to design something to check the execution at scale... which eliminates the CTE version unless the test is designed using parallel effort rather sheer token volume.

You could design a SUBSTRING/CTE hybrid where CTE is executed with a try/catch, then cut to the SUBSTRING if it fails by volume... but it might be simply smarter to use the version that fits your use case the best.

The SUBSTRING version has a bug where if the delimiter is a space or any phrase with a trailing space, it trims off that space (or ignores it) and comes up one character short. For VARCHAR, you can fix this by using the DATALENGTH() function instead of LEN() and for NVARCHAR, use DATALENGTH(), then divide by two.

I also found a bug in the XML version that used a clustered index on the returned table. That will change the token order and require unique tokens. You can remove the index to avoid natural reordering by the indexer and errors raised by duplicate tokens. Avoiding a tablescan here may or may not make sense for your use case.

I also forgot to mention the TALLY version. It uses CTE, too, and it was slightly more expensive than the SUBSTRING version so I didn't bother comparisons past that point because it had the worst of both CTE and SUBSTRING versions.

One more bug: the XML version does not handle special characters such as the ampersand(&). The use of a CDATA declaration may help but I did not try it.
davecason
davecason
SSC Rookie
SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)SSC Rookie (37 reputation)

Group: General Forum Members
Points: 37 Visits: 51
CTE version, made more like the SUBSTRING version:

CREATE FUNCTION [dbo].[fnStringSplit_CTE]
(
@SourceString VARCHAR(MAX),
@Delim CHAR(1)
)

RETURNS @Values TABLE
(
VALUE VARCHAR(MAX)
)

AS
BEGIN
with split(i, token, remainder) as
(select 1
, left(@SourceString,charindex(@delim,@SourceString)-1)
, LTRIM(right(@SourceString,len(@SourceString)-CHARINDEX(@delim,@SourceString)))
union all
select i + 1
,case when charindex(@delim,remainder) > 0 then
left(remainder,charindex(@delim,remainder)-1)
else remainder end as token
,LTRIM(right(remainder,len(remainder)-CHARINDEX(@Delim,remainder))) as remainder
from split
where charindex(@delim,remainder) >= 0 and token != remainder
)
insert into @Values
Select token
from split

RETURN
END



GO


Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search