String Split

  • I want to write query which procures me result like below.

    General rule is if any word starts from number then it should split as word.

    Input

    -----

    Ørstedsvej 7B

    Volvo 25D

    104ND Nokia

    Result

    ------

    Ørstedsvej 7 B

    Volvo 25 D

    104 ND Nokia

    Or the simplest way to do.

  • Hi Aadhar

    I'm not quite sure what you want to get because I see the gap in string result between "ND" and "Nokia".

    But you can separate words and numbers as shown in my example.

    --Temp table for test

    CREATE TABLE #TempT (Col1 nvarchar(25));

    INSERT INTO #TempT

    SELECT 'Ørstedsvej7B' UNION ALL

    SELECT 'Volvo25D' UNION ALL

    SELECT '104NDNokia';

    CREATE FUNCTION [dbo].[cut_str]

    (@p_str nvarchar(100))

    RETURNS nchar(198)

    AS

    BEGIN

    DECLARE @p_counter int

    DECLARE @p_len int

    DECLARE @p_curr_char nvarchar(1)

    DECLARE @p_next_char nvarchar(1)

    DECLARE @p_char nvarchar(199)

    SET @p_len = len(@p_str)

    SET @p_counter = 1

    SET @p_curr_char = ''

    SET @p_next_char = ''

    SET @p_char = ''

    -- loop

    WHILE @p_counter <= @p_len

    BEGIN

    SET @p_curr_char = substring(@p_str, @p_counter, 1)

    SET @p_next_char = substring(@p_str, @p_counter+1, 1)

    if (@p_curr_char like '%[0-9]%' AND @p_next_char like '%[^0-9]%') OR (@p_curr_char like '%[^0-9]%' AND @p_next_char like '%[0-9]%')

    BEGIN

    SET @p_char = @p_char + @p_curr_char + ' '

    END

    ELSE

    BEGIN

    SET @p_char = @p_char + @p_curr_char

    END

    SET @p_counter = @p_counter + 1

    END

    RETURN @p_char

    END

    SELECT [dbo].[cut_str](Col1) from #TempT

    This is only example, will be slow (function reduces performance) in prod environment.

    Br.

    Mike

  • I might be overcomplicating something but here's another way to do it.

    CREATE TABLE #Input

    (

    String varchar(15)

    )

    INSERT INTO #Input

    SELECT *

    FROM (VALUES('Ørstedsvej 7B'),('Volvo 25D'),('104ND Nokia'))x(x);

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (

    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 FROM E4

    ),

    Data AS (

    SELECT *

    FROM #Input i

    JOIN cteTally t ON t.N <= LEN(i.String)

    )

    SELECT STUFF(a.String, a.N, 1, SUBSTRING( a.String, a.N, 1) + ' ')

    FROM Data a

    JOIN Data b ON a.N = b.N - 1 AND a.String = b.String

    WHERE SUBSTRING( a.String, a.N, 1) LIKE '[0-9]'

    AND SUBSTRING( b.String, b.N, 1) LIKE '[a-zA-Z]'

    ORDER BY a.String, a.N

    DROP TABLE #Input

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • If you only have a single replacement like in your example then this should do the trick

    SELECT STUFF(val,PATINDEX('%[0-9][a-z]%',val) + 1,0,' ')

    FROM (VALUES

    ('Ørstedsvej 7B')

    ,('Volvo 25D')

    ,('104ND Nokia')

    ) SD(VAL)

    If you have more than one candidate you could use the delimitedsplit8k[/url] to separate words then apply the above to each word and finally put it back together.

    SELECT val, cast(newval as varchar(50)) newval

    FROM (VALUES

    (1, 'Ørstedsvej 7B 123DSF')

    ,(2, '1233DFF Volvo 25D')

    ,(3, '104ND Nokia')

    ) SD(ID,VAL)

    CROSS APPLY (

    SELECT ' ' + LTRIM(STUFF(item,PATINDEX('%[0-9][a-z]%',item) + 1,0,' ')) as [text()]

    FROM dbo.delimitedsplit8k(VAL, ' ')

    ORDER BY ItemNumber

    FOR XML PATH ('')

    ) s (newval)

  • How about using a string pattern splitter?

    WITH SampleData (MyStr) AS (

    SELECT 'Ørstedsvej 7B' UNION ALL

    SELECT 'Volvo 25D' UNION ALL

    SELECT '104ND Nokia'

    )

    SELECT MyStr

    ,(

    SELECT RTRIM(Item) + ' '

    FROM SampleData b

    CROSS APPLY dbo.PatternSplitCM(b.MyStr, '%[0-9]%') c

    WHERE a.MyStr = b.MyStr

    ORDER BY ItemNumber

    FOR XML PATH('')

    )

    FROM SampleData a;

    Refer to the 4th link in my signature articles to get the PatternSplitCM function


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (9/18/2013)


    How about using a string pattern splitter?

    WITH SampleData (MyStr) AS (

    SELECT 'Ørstedsvej 7B' UNION ALL

    SELECT 'Volvo 25D' UNION ALL

    SELECT '104ND Nokia'

    )

    SELECT MyStr

    ,(

    SELECT RTRIM(Item) + ' '

    FROM SampleData b

    CROSS APPLY dbo.PatternSplitCM(b.MyStr, '%[0-9]%') c

    WHERE a.MyStr = b.MyStr

    ORDER BY ItemNumber

    FOR XML PATH('')

    )

    FROM SampleData a;

    Refer to the 4th link in my signature articles to get the PatternSplitCM function

    That is what I was originally going to propose 🙂 but I messed up the pattern ('%[0-9][a-z]%') I was feeding into it and not getting the split I was expecting. Knew I was doing something wrong, but couldn't see it:(

  • mickyT (9/18/2013)


    dwain.c (9/18/2013)


    How about using a string pattern splitter?

    WITH SampleData (MyStr) AS (

    SELECT 'Ørstedsvej 7B' UNION ALL

    SELECT 'Volvo 25D' UNION ALL

    SELECT '104ND Nokia'

    )

    SELECT MyStr

    ,(

    SELECT RTRIM(Item) + ' '

    FROM SampleData b

    CROSS APPLY dbo.PatternSplitCM(b.MyStr, '%[0-9]%') c

    WHERE a.MyStr = b.MyStr

    ORDER BY ItemNumber

    FOR XML PATH('')

    )

    FROM SampleData a;

    Refer to the 4th link in my signature articles to get the PatternSplitCM function

    That is what I was originally going to propose 🙂 but I messed up the pattern ('%[0-9][a-z]%') I was feeding into it and not getting the split I was expecting. Knew I was doing something wrong, but couldn't see it:(

    Oddly enough, the first thing that came to my mind was a STUFF like you did. :w00t:


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (9/18/2013)


    mickyT (9/18/2013)


    dwain.c (9/18/2013)


    How about using a string pattern splitter?

    WITH SampleData (MyStr) AS (

    SELECT 'Ørstedsvej 7B' UNION ALL

    SELECT 'Volvo 25D' UNION ALL

    SELECT '104ND Nokia'

    )

    SELECT MyStr

    ,(

    SELECT RTRIM(Item) + ' '

    FROM SampleData b

    CROSS APPLY dbo.PatternSplitCM(b.MyStr, '%[0-9]%') c

    WHERE a.MyStr = b.MyStr

    ORDER BY ItemNumber

    FOR XML PATH('')

    )

    FROM SampleData a;

    Refer to the 4th link in my signature articles to get the PatternSplitCM function

    That is what I was originally going to propose 🙂 but I messed up the pattern ('%[0-9][a-z]%') I was feeding into it and not getting the split I was expecting. Knew I was doing something wrong, but couldn't see it:(

    Oddly enough, the first thing that came to my mind was a STUFF like you did. :w00t:

    Looking at the spec

    General rule is if any word starts from number then it should split as word.

    and trying the following value 'V8EFI Volvo 25D', I think that your solution fits the purpose better.

    Mine will split it where it probably shouldn't.

  • here's a solution but it only does first occurance - could make case into a function and call multiple times or make it recursive where noted

    WITH SampleData(test) AS (

    SELECT 'rstedsvej 7B' test

    UNION SELECT 'Volvo 25D'

    UNION SELECT 'test'-- no numbers

    UNION SELECT null-- null doesn't have issue

    UNION SELECT '104ND Nokia'

    UNION SELECT 'x 104 ND Nokia' -- already a space

    UNION SELECT 'x 104!ND Nokia' -- non alpha

    UNION SELECT 'se104ND Nokia' -- not seperated cause 104 not start of "word"

    UNION SELECT 'se 104ND Nokia 34asr' -- 2nd + cases not currently separated

    )

    select test

    -- when number found

    -- followed by non-number

    -- and non-number is not a space already

    -- and numbers are preceded by space

    --OR number is first character

    ,CASE WHEN PATINDEX('%[0-9]%',test) >0

    and PATINDEX('%[^0-9]%', SUBSTRING(test, PATINDEX('%[0-9]%',test),LEN(test) ))> 1

    and PATINDEX('%[^0-9]%', SUBSTRING(test, PATINDEX('%[0-9]%',test),LEN(test) )) <>

    PATINDEX('% %', SUBSTRING(test, PATINDEX('%[0-9]%',test),LEN(test) ))

    and (PATINDEX('%[0-9]%',test) = 1

    OR SUBSTRING(test, PATINDEX('%[0-9]%',test)-1,1) = ' ' )

    THEN SUBSTRING(test, 1, PATINDEX('%[0-9]%',test)-1)

    +SUBSTRING(test

    , PATINDEX('%[0-9]%',test)

    , PATINDEX('%[^0-9]%', SUBSTRING(test, PATINDEX('%[0-9]%',test),LEN(test) ))-1)

    + ' '

    + SUBSTRING( SUBSTRING(test, PATINDEX('%[0-9]%',test),LEN(test) )

    , PATINDEX('%[^0-9]%', SUBSTRING(test, PATINDEX('%[0-9]%',test),LEN(test) ))

    , LEN(test))

    --option: make last substring recursive to do split multiple times in string...

    ELSE test

    END

    FROM SampleData

  • Thank u all, I appreciate every one here for the solution.

    Finally, i found my own one.

  • Aadhar Joshi (9/25/2013)


    Thank u all, I appreciate every one here for the solution.

    Finally, i found my own one.

    Please share with us.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

Viewing 11 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply