Parse character string into multiple fields

  • karen.ferrara

    Say Hey Kid

    Points: 701

    I need to parse a VARCHAR(255) field into four fields, three VARCHAR(5) and one VARCHAR(255). The VARCHAR field to be parsed is a variable length for example:

    122

    667|6

    71|77|120

    71|77|219|427|122|670|120|10

    The characters before the first | go into field 1

    The characters after the first | but before the second | go into field 2

    The characters after the second | but before the third | go into field three

    The remaining characters go into field 4.

    I created this custom function but on a large data set of over a million records it runs VERY slow. Any suggestions? I have to use something in a view that is returning about 25 fields. Is there a quicker way to parse this data out? Any help is appreciated.

    CREATE FUNCTION [dbo].[fn_strTextparse]

    (

    @String as varchar(255),

    @position tinyint

    )

    RETURNS varchar(255)

    As

    begin

    DECLARE @strReturn VARCHAR(255)

    declare @Occurance INT

    declare @tmpcrp table (

    recordnumber tinyint not null identity(1,1),

    strText varchar(255) null)

    insert into @tmpcrp

    select value from string_split(@string,'|')

    IF @position < 4

    SELECT @strReturn = LEFT(strText, 5) from @tmpcrp where recordnumber = @position

    ELSE

    BEGIN

    SELECT @strReturn = STRING_AGG(strText, '|') FROM @tmpcrp WHERE recordnumber > 3

    END

    RETURN @strReturn

    end

    GO

  • MVDBA (Mike Vessey)

    SSC-Insane

    Points: 21247

    this is always going to be slow, if you have pipe delimited fields then you're not even in 1st normal form (remove repeating data groups)perhaps put that data into a key value pair table and strip it out of your table. That function will never scale.

    if this is incoming data then talk to the supplier

    seriously functions to do text manipulation in SQL will get you in trouble.  fix the source data

    MVDBA

  • DesNorton

    SSC-Insane

    Points: 22905

    If you are stuck with the delimited strings, I would start bu using one of the locally produced high-speed string splitters

    /****************************************************************************************************************
    *** SOURCE: http://www.sqlservercentral.com/articles/SQL+Server+2012/106903/
    *** Eirikur Eiriksson: Reaping the benefits of the Window functions in T-SQL
    ****************************************************************************************************************/

    CREATE FUNCTION [dbo].[DelimitedSplit8K_LEAD]
    --===== Define I/O parameters
    (@pString VARCHAR(8000), @pDelimiter CHAR(1))
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
    --===== "Inline" CTE Driven "Tally Table” produces values from 0 up to 10,000...
    -- enough to cover VARCHAR(8000)
    WITH E1(N) AS (
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
    ), --10E+1 or 10 rows
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
    cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front
    -- for both a performance gain and prevention of accidental "overruns"
    SELECT 0 UNION ALL
    SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    ),
    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
    SELECT t.N+1
    FROM cteTally t
    WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)
    )
    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
    Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
    FROM cteStart s;
    GO

    Now, if you are always going to get 1 item at a time

    CREATE FUNCTION dbo.fn_strTextparse ( @String AS varchar(255), @position tinyint )
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
    SELECT strReturn = STUFF((SELECT '|' + dskl.Item
    FROM dbo.DelimitedSplit8K_LEAD(@String, '|') AS dskl
    WHERE dskl.ItemNumber = CASE
    WHEN @position < 4 THEN @position
    WHEN dskl.ItemNumber > 3 THEN dskl.ItemNumber
    END
    ORDER BY dskl.ItemNumber
    FOR XML PATH('') )
    , 1, 1, '');
    GO

    This can be used as follows

    -- Single string to split
    DECLARE @String varchar(255) = '71|77|219|427|122|670|120|10';

    SELECT str1 = strReturn FROM dbo.fn_strTextparse(@String, 1);
    SELECT str2 = strReturn FROM dbo.fn_strTextparse(@String, 2);
    SELECT str3 = strReturn FROM dbo.fn_strTextparse(@String, 3);
    SELECT str4 = strReturn FROM dbo.fn_strTextparse(@String, 4);
    GO

    -- Table of strings to split
    CREATE TABLE #TestData (strInput varchar(255));

    INSERT INTO #TestData ( strInput )
    VALUES ( '122' ), ( '667|6' ), ( '71|77|120' ), ( '71|77|219|427|122|670|120|10' );

    SELECT td.strInput
    , s1 = f1.strReturn
    , s2 = f2.strReturn
    , s3 = f3.strReturn
    , s4 = f4.strReturn
    FROM #TestData AS td
    OUTER APPLY dbo.fn_strTextparse(td.strInput, 1) AS f1
    OUTER APPLY dbo.fn_strTextparse(td.strInput, 2) AS f2
    OUTER APPLY dbo.fn_strTextparse(td.strInput, 3) AS f3
    OUTER APPLY dbo.fn_strTextparse(td.strInput, 4) AS f4

    Or, you can get all 4 parts of the string at once

    CREATE FUNCTION dbo.fn_strTextparseMulti ( @String AS varchar(255) )
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
    SELECT str1 = MAX(src.str1)
    , str2 = MAX(src.str2)
    , str3 = MAX(src.str3)
    , str4 = MAX(src.str4)
    FROM (
    SELECT str1 = MAX(CASE WHEN dskl.ItemNumber = 1 THEN dskl.Item END)
    , str2 = MAX(CASE WHEN dskl.ItemNumber = 2 THEN dskl.Item END)
    , str3 = MAX(CASE WHEN dskl.ItemNumber = 3 THEN dskl.Item END)
    , str4 = NULL
    FROM dbo.DelimitedSplit8K_LEAD(@String, '|') AS dskl
    /***/ UNION ALL /***/
    SELECT str1 = NULL
    , str2 = NULL
    , str3 = NULL
    , str4 = STUFF((SELECT '|' + dskl.Item
    FROM dbo.DelimitedSplit8K_LEAD(@String, '|') AS dskl
    WHERE dskl.ItemNumber > 3
    ORDER BY dskl.ItemNumber
    FOR XML PATH('') )
    , 1, 1, '')
    ) AS src;
    GO

    This can be used as follows

    -- Single string to split
    DECLARE @String varchar(255) = '71|77|219|427|122|670|120|10';

    SELECT str1, str2, str3, str4
    FROM dbo.fn_strTextparseMulti(@String);
    GO

    -- Table of strings to split
    CREATE TABLE #TestData (strInput varchar(255));

    INSERT INTO #TestData ( strInput )
    VALUES ( '122' ), ( '667|6' ), ( '71|77|120' ), ( '71|77|219|427|122|670|120|10' );

    SELECT td.strInput
    , f.str1
    , f.str2
    , f.str3
    , f.str4
    FROM #TestData AS td
    OUTER APPLY dbo.fn_strTextparseMulti(td.strInput) AS f

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply