Split String On Delimiter

Question

Split String On Delimiter

AT-2016

Mr or Mrs. 500

Points: 570
More actions
October 14, 2016 at 4:58 am

#331033

I've a string as follows:
45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc
I am expecting to get it as follows:
String1 - 45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg
String2 - TBL101 | PC | 1.00 | COMP101 | CS | 1.00......... etc
So what I am trying to do is to split the string into two string on 7th delimiter and tried the following but it works only for string1:
DECLARE @STR VARCHAR(1000),
@str1 VARCHAR(1000),
@str2 VARCHAR(1000)
SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'
/****I was actually following a tutorial on doing this but stuck on this statements. Would expect some explanation on this - Starts****/
--Creating a number table
;WITH numcte AS(
SELECT 1 AS rn UNION ALL SELECT rn + 1 FROM numcte WHERE rn <= LEN(@str)),
--Get the position of the "|" charecters
GetDelimitedCharPos AS(
SELECT ROW_NUMBER() OVER(ORDER BY getdate()) seq, rn, delimitedcharpos
FROM numcte
CROSS APPLY(SELECT SUBSTRING(@str, rn, 1)delimitedcharpos) m WHERE delimitedcharpos = '|')
/****I was actually following a tutorial on doing this but stuck on this statements. Would expect some explanation on this - Ends****/
--Applying the formula SUBSTRING(@str, startseq + 1, endseq-startseq + 1)
SELECT TOP 1
(CASE WHEN SUBSTRING(
@STR
,(SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 6)
,(SELECT TOP 1 rn FROM GetDelimitedCharPos WHERE seq = 7 ) -
(SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 6)
) = SUBSTRING(
@STR
,(SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 6)
,(SELECT TOP 1 rn FROM GetDelimitedCharPos WHERE seq = 7 ) -
(SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 6)
) THEN SUBSTRING(
@STR
,(SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 1)
,(SELECT TOP 1 rn FROM GetDelimitedCharPos WHERE seq = 7 ) -
(SELECT TOP 1 rn + 1 FROM GetDelimitedCharPos WHERE seq = 1)
) ELSE '' END) AS String1
FROM GetDelimitedCharPos
I would expect some ideas from the experts to implement it perfectly. Thanks.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

Luis Cazares SSC Guru Points: 183698 More actions · Answer 1

Here's a simple option using the DelimitedSplit8k that can be found here along with its explanation: http://www.sqlservercentral.com/articles/Tally+Table/72993/

A specific function could be made to split only on the 7th delimiter and avoid having to concatenate the strings again.

DECLARE @STR VARCHAR(1000),

@str1 VARCHAR(1000),

@str2 VARCHAR(1000)

SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Randy Singh | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'

SELECT s.Item, s.ItemNumber, (s.ItemNumber - 1) / 7 AS ItemGroup

INTO #Splitted

FROM dbo.DelimitedSplit8K( @STR, '|')s;

SELECT STUFF(( SELECT '| ' + i.Item

FROM #Splitted i

WHERE i.ItemGroup = s.ItemGroup

ORDER BY i.ItemNumber

FOR XML PATH(''), TYPE).value('./text()[1]', 'varchar(max)'), 1, 1, '')

FROM #Splitted s

GROUP BY ItemGroup;

DROP TABLE #Splitted;

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Smendle Hall of Fame Points: 3969 More actions · Answer 2

Luis is a master at showing set based solutions and if you plan on using SQL for any length of time, I would understand why those solutions are preferable over something like this....

DECLARE @STR VARCHAR(1000),

@str1 VARCHAR(1000),

@str2 VARCHAR(1000),

@pos INT,

@counter INT

SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Randy Singh | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'

SET @counter=0

SET @pos=0

WHILE @counter <=7

BEGIN

SET @pos=CHARINDEX('|',@str,@pos+1)

set @counter=@counter+1

END

SET @str1=SUBSTRING(@str,1,@pos)

SET @str2=SUBSTRING(@str,@pos+1,LEN(@str)-@pos)

Print 'str1='+ @str1

Print 'str2='+ @str2

The solution I posted is the epitome of a RBAR type solution.....(looping through data to get to the final solution)

Alan Burstein SSC Guru Points: 61136 More actions · Answer 3

Luis Cazares (10/14/2016)
Here's a simple option using the DelimitedSplit8k that can be found here along with its explanation: http://www.sqlservercentral.com/articles/Tally+Table/72993/
A specific function could be made to split only on the 7th delimiter and avoid having to concatenate the strings again.
DECLARE @STR VARCHAR(1000),
@str1 VARCHAR(1000),
@str2 VARCHAR(1000)
SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Randy Singh | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'
SELECT s.Item, s.ItemNumber, (s.ItemNumber - 1) / 7 AS ItemGroup
INTO #Splitted
FROM dbo.DelimitedSplit8K( @STR, '|')s;
SELECT STUFF(( SELECT '| ' + i.Item
FROM #Splitted i
WHERE i.ItemGroup = s.ItemGroup
ORDER BY i.ItemNumber
FOR XML PATH(''), TYPE).value('./text()[1]', 'varchar(max)'), 1, 1, '')
FROM #Splitted s
GROUP BY ItemGroup;
DROP TABLE #Splitted;

Please forgive me if I'm misunderstanding the requirement and for not coding an example (I'm on vacation without a PC).

I don't think a splitter or concatenation is required here. Why not just use a tally table and row_number() to identify the location of the 7th delimiter: let's call that value X. Then for string1:

SUBSTRING(yourstring, 1, X-1)

For string2:

SUBSTRING(yourstring, X+1, 8000)

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Luis Cazares SSC Guru Points: 183698 More actions · Answer 4

Here are 2 modified versions of the DelimitedSplit8k function. The second one will only work on 2012 and more recent versions.

CREATE FUNCTION [dbo].[DelimitedSplit8KNOccurrence](@pString [varchar](8000), @pDelimiter [char](1), @pOccurrence int)

RETURNS TABLE WITH SCHEMABINDING

AS

RETURN

--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

-- enough to cover NVARCHAR(4000)

WITH E1(N) AS (

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

), --10E+1 or 10 rows

E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

-- for both a performance gain and prevention of accidental "overruns"

SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

),

cteStart(N1,G1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

SELECT 1, 0 UNION ALL

SELECT t.N+1 , ROW_NUMBER() OVER(ORDER BY N)

FROM cteTally t

WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

),

cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

SELECT s1.N1, ISNULL( s2.N1 - s1.N1 - 1, 8000)

FROM cteStart s1

LEFT JOIN cteStart s2 ON s1.G1/@pOccurrence = s2.G1/@pOccurrence - 1 AND s2.G1 % @pOccurrence = 0

WHERE s1.G1 % @pOccurrence = 0

)

--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),

Item = SUBSTRING(@pString, l.N1, l.L1)

FROM cteLen l

;

GO

CREATE FUNCTION [dbo].[DelimitedSplit8KLeadNOccurrence](@pString [varchar](8000), @pDelimiter [char](1), @pOccurrence int)

RETURNS TABLE WITH SCHEMABINDING

AS

RETURN

WITH E1(N) AS (

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

), --10E+1 or 10 rows

E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front

-- for both a performance gain and prevention of accidental "overruns"

SELECT 0 UNION ALL

SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

),

cteStart(N1, G1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

SELECT t.N+1, ROW_NUMBER() OVER(ORDER BY N) % @pOccurrence

FROM cteTally t

WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)

)

--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),

Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))

FROM cteStart s

WHERE G1 = 1;

GO

With those, the final code would be simplified to a single statement.

DECLARE @STR VARCHAR(1000)

SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Randy Singh | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'

SELECT *

FROM dbo.DelimitedSplit8KNOccurrence( @STR, '|', 7)s;

I'm not able to claim any performance improvements, because I haven't test both approaches for performance. That doesn't mean that there won't be, it only means that you need to test them.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Luis Cazares SSC Guru Points: 183698 More actions · Answer 5

Alan.B (10/14/2016)
Luis Cazares (10/14/2016)
Here's a simple option using the DelimitedSplit8k that can be found here along with its explanation: http://www.sqlservercentral.com/articles/Tally+Table/72993/
A specific function could be made to split only on the 7th delimiter and avoid having to concatenate the strings again.
DECLARE @STR VARCHAR(1000),
@str1 VARCHAR(1000),
@str2 VARCHAR(1000)
SET @STR ='45 | 00055 | 9/30/2016 | Vodafone | Randy Singh | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'
SELECT s.Item, s.ItemNumber, (s.ItemNumber - 1) / 7 AS ItemGroup
INTO #Splitted
FROM dbo.DelimitedSplit8K( @STR, '|')s;
SELECT STUFF(( SELECT '| ' + i.Item
FROM #Splitted i
WHERE i.ItemGroup = s.ItemGroup
ORDER BY i.ItemNumber
FOR XML PATH(''), TYPE).value('./text()[1]', 'varchar(max)'), 1, 1, '')
FROM #Splitted s
GROUP BY ItemGroup;
DROP TABLE #Splitted;
Please forgive me if I'm misunderstanding the requirement and for not coding an example (I'm on vacation without a PC).
I don't think a splitter or concatenation is required here. Why not just use a tally table and row_number() to identify the location of the 7th delimiter: let's call that value X. Then for string1:
SUBSTRING(yourstring, 1, X-1)
For string2:
SUBSTRING(yourstring, X+1, 8000)

That would work when there are 2 values. I'm not sure if that would always be the case.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

ScottPletcher SSC Guru Points: 100956 More actions · Answer 6

Here's a simple, brute-force -- but still no looping -- method, if you prefer that.

SELECT

string AS original_string,

CASE WHEN are_enough_delims_present = 0 THEN string

ELSE LEFT(string, string_delim_7 - 1) END AS string1,

CASE WHEN are_enough_delims_present = 0 THEN ''

ELSE LTRIM(SUBSTRING(string, string_delim_7 + 1, 8000)) END AS string2

FROM (

VALUES('45 | 00055 | 9/30/2016 | Vodafone | Jack Morris | Newyork | Test Msg | TBL101 | PC | 1.00 | COMP101 | CS | 1.00.............. etc'),

('46 | xxx | not | enough | delims')

) AS test_data(string)

CROSS APPLY (

SELECT CAST(CASE WHEN string LIKE '%|%|%|%|%|%|%|%'

THEN 1 ELSE 0 END AS bit) AS are_enough_delims_present

) AS ca1

OUTER APPLY (

SELECT CHARINDEX('|', string) AS string_delim_1