Tally OH! An Improved SQL 8K “CSV Splitter” Function

  • aochss (1/16/2013)


    I would not expect a record back if the string being sent in is NULL, a single space or all spaces. In my case, I have records where the string being split is often blank and I would want no records coming back from the CROSS APPLY. I could filter them out using in the WHERE clause. I fixed this by adding LTRIM and RTRIM to the source string in the function.

    Also, some of my "not so good" data also had multiple spaces in between the values causing numerous records returned that were blank.

    For example:

    dbo.DelimitedSplit8k('123 467 789', ' ') -- There are two spaces between 467 and 789

    ItemNumber Item

    1 123

    2 467

    3 <-- Empty String

    4 789

    I modified the function to give an option for removing extra spaces after the delimiter (like '123, 345, 567) and then as you did to your version to check for null/blank input as well as null/blank rows in the output. To do that I had to change the function into a multi-statement tvf.

    Due to the additional checking for nulls/blanks and removing blank rows from the returned table I figured efficiency would be seriously compromised. So I tested this revision against the original and it seriously under performs. In the results below the first run of DelimitedSplit8k_NULLS sets the parameter "@bRemoveSpacesAfterDelimiter" to 0 and the 2nd run set it to 1.

    I can see where nulls/blanks could be an issue, but I think it would be better to filter out such rows AFTER applying any CROSS APPLYs using the original function as opposed to trying to do the filtering within the function (though I didn't specifically test that). Maybe someone else can come up with a better way of getting rid of nulls/blanks without compromising the efficiency of DelimitedSplit8K?

    Test Results:

    RowNumSplitterNameNumberOfRowsNumberOfElementsMinElementLengthMaxElementLengthDurationMinLengthAvgLengthMaxLength

    1DelimitedSplit8K100011100.013001510

    2DelimitedSplit8K_NULLS100011100.293001510

    3DelimitedSplit8K_NULLS100011100.300001510

    4DelimitedSplit8K100021100.0160031121

    5DelimitedSplit8K_NULLS100021100.3600031121

    6DelimitedSplit8K_NULLS100021100.2400031121

    7DelimitedSplit8K100041100.0330092540

    8DelimitedSplit8K_NULLS100041100.4030092540

    9DelimitedSplit8K_NULLS100041100.3100092540

    10DelimitedSplit8K100081100.05600295074

    11DelimitedSplit8K_NULLS100081100.51300295074

    12DelimitedSplit8K_NULLS100081100.58600295074

    13DelimitedSplit8K1000161100.1100072102137

    14DelimitedSplit8K_NULLS1000161100.8430072102137

    15DelimitedSplit8K_NULLS1000161100.6960072102137

    16DelimitedSplit8K1000321100.24300148207251

    17DelimitedSplit8K_NULLS1000321101.37600148207251

    18DelimitedSplit8K_NULLS1000321101.23600148207251

    19DelimitedSplit8K1000641100.40300328415482

    20DelimitedSplit8K_NULLS1000641102.48600328415482

    21DelimitedSplit8K_NULLS1000641102.29000328415482

    22DelimitedSplit8K10001281100.82600734832942

    23DelimitedSplit8K_NULLS10001281104.69600734832942

    24DelimitedSplit8K_NULLS10001281104.46000734832942

    25DelimitedSplit8K10002561101.62000152816611815

    26DelimitedSplit8K_NULLS10002561108.88000152816611815

    27DelimitedSplit8K_NULLS10002561108.87300152816611815

    28DelimitedSplit8K10005121103.21000312433223493

    29DelimitedSplit8K_NULLS100051211018.02600312433223493

    30DelimitedSplit8K_NULLS100051211017.05000312433223493

    31DelimitedSplit8K100011501107.44600715874787798

    32DelimitedSplit8K_NULLS1000115011040.46000715874787798

    33DelimitedSplit8K_NULLS1000115011037.25000715874787798

    34DelimitedSplit8K1000110200.01600101420

    35DelimitedSplit8K_NULLS1000110200.20600101420

    36DelimitedSplit8K_NULLS1000110200.20600101420

    37DelimitedSplit8K1000210200.02000213041

    38DelimitedSplit8K_NULLS1000210200.30000213041

    39DelimitedSplit8K_NULLS1000210200.30600213041

    40DelimitedSplit8K1000410200.04600466281

    41DelimitedSplit8K_NULLS1000410200.46000466281

    42DelimitedSplit8K_NULLS1000410200.46000466281

    43DelimitedSplit8K1000810200.09600102126155

    44DelimitedSplit8K_NULLS1000810200.74000102126155

    45DelimitedSplit8K_NULLS1000810200.75000102126155

    46DelimitedSplit8K10001610200.18300216254292

    47DelimitedSplit8K_NULLS10001610201.32000216254292

    48DelimitedSplit8K_NULLS10001610201.32300216254292

    49DelimitedSplit8K10003210200.36000458510586

    50DelimitedSplit8K_NULLS10003210202.52300458510586

    51DelimitedSplit8K_NULLS10003210202.49600458510586

    52DelimitedSplit8K10006410200.7400094510221097

    53DelimitedSplit8K_NULLS10006410204.7560094510221097

    54DelimitedSplit8K_NULLS10006410204.7900094510221097

    55DelimitedSplit8K100012810201.44000194420472156

    56DelimitedSplit8K_NULLS100012810209.38300194420472156

    57DelimitedSplit8K_NULLS100012810209.44600194420472156

    58DelimitedSplit8K100025610202.88300390340964254

    59DelimitedSplit8K_NULLS1000256102018.67000390340964254

    60DelimitedSplit8K_NULLS1000256102018.86600390340964254

    61DelimitedSplit8K100048010205.45600743276807881

    62DelimitedSplit8K_NULLS1000480102034.67600743276807881

    63DelimitedSplit8K_NULLS1000480102035.24600743276807881

    64DelimitedSplit8K1000120300.01300202430

    65DelimitedSplit8K_NULLS1000120300.24000202430

    66DelimitedSplit8K_NULLS1000120300.24300202430

    67DelimitedSplit8K1000220300.02300415161

    68DelimitedSplit8K_NULLS1000220300.41300415161

    69DelimitedSplit8K_NULLS1000220300.40300415161

    70DelimitedSplit8K1000420300.0700086103121

    71DelimitedSplit8K_NULLS1000420300.6600086103121

    72DelimitedSplit8K_NULLS1000420300.6600086103121

    73DelimitedSplit8K1000820300.13600180206232

    74DelimitedSplit8K_NULLS1000820301.15600180206232

    75DelimitedSplit8K_NULLS1000820301.19300180206232

    76DelimitedSplit8K10001620300.27000379415456

    77DelimitedSplit8K_NULLS10001620302.17300379415456

    78DelimitedSplit8K_NULLS10001620302.21600379415456

    79DelimitedSplit8K10003220300.52600774830892

    80DelimitedSplit8K_NULLS10003220304.18600774830892

    81DelimitedSplit8K_NULLS10003220304.33000774830892

    82DelimitedSplit8K10006420301.05600158216631747

    83DelimitedSplit8K_NULLS10006420308.30000158216631747

    84DelimitedSplit8K_NULLS10006420308.37300158216631747

    85DelimitedSplit8K100012820302.19300320833253442

    86DelimitedSplit8K_NULLS1000128203016.45300320833253442

    87DelimitedSplit8K_NULLS1000128203016.68600320833253442

    88DelimitedSplit8K100025620304.22000648466576834

    89DelimitedSplit8K_NULLS1000256203032.53000648466576834

    90DelimitedSplit8K_NULLS1000256203032.85000648466576834

    91DelimitedSplit8K100029020304.76000736975397712

    92DelimitedSplit8K_NULLS1000290203037.29000736975397712

    93DelimitedSplit8K_NULLS1000290203038.92000736975397712

    94DelimitedSplit8K1000130400.01300303440

    95DelimitedSplit8K_NULLS1000130400.27300303440

    96DelimitedSplit8K_NULLS1000130400.28000303440

    97DelimitedSplit8K1000230400.02600617081

    98DelimitedSplit8K_NULLS1000230400.50300617081

    99DelimitedSplit8K_NULLS1000230400.51000617081

    100DelimitedSplit8K1000430400.09300124142158

    101DelimitedSplit8K_NULLS1000430400.89300124142158

    102DelimitedSplit8K_NULLS1000430400.93600124142158

    103DelimitedSplit8K1000830400.18600262286314

    104DelimitedSplit8K_NULLS1000830401.66300262286314

    105DelimitedSplit8K_NULLS1000830401.67600262286314

    106DelimitedSplit8K10001630400.35000529574613

    107DelimitedSplit8K_NULLS10001630403.19300529574613

    108DelimitedSplit8K_NULLS10001630403.25000529574613

    109DelimitedSplit8K10003230400.71300109711501222

    110DelimitedSplit8K_NULLS10003230407.12600109711501222

    111DelimitedSplit8K_NULLS10003230406.80300109711501222

    112DelimitedSplit8K10006430401.51600222523022376

    113DelimitedSplit8K_NULLS100064304012.73300222523022376

    114DelimitedSplit8K_NULLS100064304012.53000222523022376

    115DelimitedSplit8K100012830402.85600448746044706

    116DelimitedSplit8K_NULLS1000128304024.68000448746044706

    117DelimitedSplit8K_NULLS1000128304024.94300448746044706

    118DelimitedSplit8K100021030404.99600737875587705

    119DelimitedSplit8K_NULLS1000210304041.28300737875587705

    120DelimitedSplit8K_NULLS1000210304041.99300737875587705

    121DelimitedSplit8K1000140500.01300404450

    122DelimitedSplit8K_NULLS1000140500.33300404450

    123DelimitedSplit8K_NULLS1000140500.34000404450

    124DelimitedSplit8K1000240500.030008191101

    125DelimitedSplit8K_NULLS1000240500.656008191101

    126DelimitedSplit8K_NULLS1000240500.680008191101

    127DelimitedSplit8K1000440500.11600166183199

    128DelimitedSplit8K_NULLS1000440501.19600166183199

    129DelimitedSplit8K_NULLS1000440501.22600166183199

    130DelimitedSplit8K1000840500.22300341367392

    131DelimitedSplit8K_NULLS1000840502.44000341367392

    132DelimitedSplit8K_NULLS1000840502.31300341367392

    133DelimitedSplit8K10001640500.47600700735777

    134DelimitedSplit8K_NULLS10001640504.64300700735777

    135DelimitedSplit8K_NULLS10001640504.69600700735777

    136DelimitedSplit8K10003240500.91000140114711531

    137DelimitedSplit8K_NULLS10003240509.08000140114711531

    138DelimitedSplit8K_NULLS10003240509.06600140114711531

    139DelimitedSplit8K10006440501.77600286029453026

    140DelimitedSplit8K_NULLS100064405017.82300286029453026

    141DelimitedSplit8K_NULLS100064405017.84600286029453026

    142DelimitedSplit8K100012840503.54300578558875997

    143DelimitedSplit8K_NULLS1000128405035.12000578558875997

    144DelimitedSplit8K_NULLS1000128405036.19600578558875997

    145DelimitedSplit8K100016540504.55600747375897743

    146DelimitedSplit8K_NULLS1000165405045.69600747375897743

    147DelimitedSplit8K_NULLS1000165405045.45300747375897743

    Here's the revised function I tested. I won't claim that it is the best variation--just a quick-and-dirty trial run for comparison purposes.

    /*

    --Some test strings for manual testing

    SELECT * FROM dbo.DelimitedSplit8K_NULLS(' ',' ',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('',' ',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS(NULL,' ',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS(' ',',',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('',',',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS(NULL,',',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467 789',' ',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467 789',' ',1)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467 789',' ',1)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123 467 789',' ',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123, 467, 789',',',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123, 467, 789',',',1)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123,467,789',',',0)

    SELECT * FROM dbo.DelimitedSplit8K_NULLS('123,467,789',',',1)

    */

    CREATE FUNCTION [dbo].[DelimitedSplit8K_NULLS]

    (

    @pString VARCHAR(8000)

    ,@pDelimiter CHAR(1)

    ,@bRemoveSpaceAfterDelimiter BIT

    )

    RETURNS @Split TABLE

    (

    [ItemNumber] INT

    ,[Item] VARCHAR(50)

    )

    WITH SCHEMABINDING

    AS

    BEGIN

    SET @pString = NULLIF(RTRIM(LTRIM(@pString)),'')

    IF @bRemoveSpaceAfterDelimiter = 1

    SET @pString = REPLACE(@pString,@pDelimiter+' ',@pDelimiter)

    IF @pString IS NULL

    RETURN

    ELSE

    WITH E1(N)

    AS (

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    UNION ALL

    SELECT

    1

    ), --10E+1 or 10 rows

    E2(N)

    AS (

    SELECT

    1

    FROM

    E1 a

    ,E1 b

    ), --10E+2 or 100 rows

    E4(N)

    AS (

    SELECT

    1

    FROM

    E2 a

    ,E2 b

    ), --10E+4 or 10,000 rows max

    cteTally(N)

    AS (

    SELECT TOP (ISNULL(DATALENGTH(@pString),0))

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM E4

    ),

    cteStart(N1)

    AS (

    SELECT

    1

    UNION ALL

    SELECT

    t.N + 1

    FROM

    cteTally t

    WHERE

    SUBSTRING(@pString,t.N,1) = @pDelimiter

    ),

    cteLen(N1,L1)

    AS (

    SELECT

    s.N1

    ,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0) - s.N1,8000)

    FROM

    cteStart s

    )

    INSERT INTO @Split

    SELECT

    ItemNumber = ROW_NUMBER() OVER (ORDER BY l.N1)

    ,Item = SUBSTRING(@pString,l.N1,l.L1)

    FROM

    cteLen l

    WHERE

    NULLIF(RTRIM(LTRIM(SUBSTRING(@pString,l.N1,l.L1))),'') IS NOT NULL

    RETURN

    END

  • Due to the additional checking for nulls/blanks and removing blank rows from the returned table I figured efficiency would be seriously compromised. So I tested this revision against the original and it seriously under performs. In the results below the first run of DelimitedSplit8k_NULLS sets the parameter "@bRemoveSpacesAfterDelimiter" to 0 and the 2nd run set it to 1.

    Steve,

    I came up with the same results after doing almost exactly the same thing.

    In fact, I found that even after taking out the cleaning of the duplicate characters and check for null, the function with the return table declaration is a lot slower than the original single statement function. I wonder if SQL Server is using a different memory management scheme.

    I guess having the WHERE clause in the CROSS APPLY do the work gives the next person looking at my code an idea of what I am trying to do.

    Either way it has been a fun afternoon of testing the assumptions and changes. This is a perfect example of why I still love programming after over 20 years of doing this stuff. Going through Jeff's code line by line was quite a revelation in how to use CTE, NULLIF and "Tally" tables in ways I never thought of.

    Anton

  • Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune. You basically get a cursor under the covers because the mTVF cannot be truly in-lined with the query/queryplan. The optimization process is shot to hell too (although that won't come into play here). Voids parallelism also IIRC.

    Try doing a statement-level completion profiler trace while running each type of code. CAUTION: beware statement completed profiling on production systems!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (1/16/2013)


    Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune. You basically get a cursor under the covers

    What was startling for me was not that functions acted like an implicit cursor (I'd always known that) but rather that you could create inline functions that don't!

  • aochss (1/16/2013)


    The performance of this is amazing, as is the thought behind the code. These kind of posts really make you think and learn a whole new set of features in SQL

    Thanks Jeff.

    A couple of questions (sorry if it has been addressed - I didn't see it mentioned in the thread):

    I would not expect a record back if the string being sent in is NULL, a single space or all spaces. In my case, I have records where the string being split is often blank and I would want no records coming back from the CROSS APPLY. I could filter them out using in the WHERE clause.

    The best indeed would be to filter them out in the WHERE clause with something like WHERE LEN(StringColumn) > 0. Hence, the function calls would be for only those rows which are neither NULL nor containing only space/s.

    But this can also be done in the function itself as I have tried to do it.


    Also, some of my "not so good" data also had multiple spaces in between the values causing numerous records returned that were blank.

    For example:

    dbo.DelimitedSplit8k('123 467 789', ' ') -- There are two spaces between 467 and 789

    ItemNumber Item

    1 123

    2 467

    3 <-- Empty String

    4 789

    Thanks Again,

    Anton

    Here is my attempt while keeping the function iTVF (I did not test it thoroughly). Only slight changes are done

    CREATE FUNCTION [dbo].[DelimitedSplit8K_With_NULL_ZeroLengthString_And_ConsecutiveDelimiter_Filter]

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter VARCHAR(1))

    --WARNING!!! DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    /*========= USE LEN() FUNCTION TO CHECK WHETHER STRING IS NULL OR CONTAIN ONLY SPACES =======*/

    WHERE LEN(@pString) > 0 --FILTER OUT NON_REQUIRED

    /*===========================================================================================*/

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1

    /*========= USE LEN() FUNCTION TO CHECK WHETHER STRING IS NULL OR CONTAIN ONLY SPACES =======*/

    WHERE LEN(@pString) > 0 --FILTER OUT NON_REQUIRED

    /*===========================================================================================*/

    UNION ALL

    SELECT t.N+1 FROM cteTally t

    WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter

    /*====== --FILTER OUT TWO CONSECUTIVE OCCURENCES OF DELIMITERS ============*/

    AND SUBSTRING(@pString,t.N+1,1) <> @pDelimiter

    /*=========================================================================*/

    ),

    cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

    SELECT s.N1,

    ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)

    FROM cteStart s

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),

    Item = RTRIM(LTRIM(SUBSTRING(@pString, l.N1, l.L1)))--USE LTRIM AND RTRIM FUNCTION TO OMIT TRAILING SPACES

    FROM cteLen l

    ;

  • TheSQLGuru (1/16/2013)


    Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune.

    Not always true 🙂 It really depends upon several things. For instance, if we have to deal with VARCHAR(max) strings and multiple rows, the mTVF version of the tally table splitter can beat the iTVF version quite comprehensively. Sometimes, mTVF versions can show their worth too 😉

    I am not trying to be infamous but sometimes RBAR approach could be handy as well :-D. I hope my Anti-RBAR membership is not cancelled for issuing this statement :hehe:

  • Usman Butt (1/17/2013)


    TheSQLGuru (1/16/2013)


    Once you switch from an iTVF to a multi-statement TVF you are dead meat and the Fat Lady is belting a tune.

    Not always true 🙂 It really depends upon several things. For instance, if we have to deal with VARCHAR(max) strings and multiple rows, the mTVF version of the tally table splitter can beat the iTVF version quite comprehensively. Sometimes, mTVF versions can show their worth too 😉

    I am not trying to be infamous but sometimes RBAR approach could be handy as well :-D. I hope my Anti-RBAR membership is not cancelled for issuing this statement :hehe:

    Multi-line TVF work well when they are called/invoked sparsly or only once as say the logical starting table in your from clause, before other joins take place. They can also work quite well as part of an additional condition in a join. If it is not the only condition and applied using OR logic, it only needs to be invoked when other simpler conditions do not match.

    mTVF also allow some control over the indexing of the result table (trough PK and unique constraints). This can speed up certain operations in complex processing as the required indexing on the datamodel might not be there and/or too costly to implement for all records.

    What I generaly do is write first as an iTVF and if for some reason it turns out unexpectedly slow or impractical, make a mTVF from it as well with proper indexing to speed up the propblem operation.

  • I have no problem whatsoever in using the right tool for the job!! In fact, I make a fair bit of my consulting dollars cleaning up messes from clients who do NOT do that! 😎 Sometimes being "dead meat" IS the better alternative. :w00t:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Hello,

    Greatt article. I used the function and works great. I have a situation where I need to search for more than 1 character, then split the data. How could I use this function to search for 2 characters. For example, I want to search for semicolon, or, comma (;,), then, split the column if either is found.

  • How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.

  • ahpitre (2/11/2013)


    How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.

    Easiest way I can think of would be to simply use replace.

    select *

    from YourTable

    cross apply dbo.DelimitedSplit8k(replace(YourColumn, ';', ','), ',')

    This way you are still splitting on commas but it will work for either character. 😉

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • ahpitre (2/11/2013)


    How can I use this function to search for more than 1 character? I have data that is delimited by either a comma or semicolon (,;). Need to split contents based on either character.

    Just throw in a replace to change the second character into the desired delimiter. I can't guess what effect this would have on performance of the function.

    If you are talking about a two-dimensional array like 'A,B;X,Y;D,E' and you want to split that into two columns, then that's a different problem.

    DECLARE @strExample VARCHAR(8000)

    SET @strExample = 'A,B;C;D,E,F,G,H'

    SELECT

    ItemNumber

    ,Item

    FROM

    dbo.DelimitedSplit8K(REPLACE(@strExample,';',','),',') AS dsk

  • Sorry Sean...I didn't see that you had already answered with exactly the same idea. 😛

     

  • Steven Willis (2/11/2013)


    Sorry Sean...I didn't see that you had already answered with exactly the same idea. 😛

     

    No worries. Great minds...

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • If you have two delimiters you can change the code slightly:

    DECLARE @pString varchar(8000) = '55555;4444,333,22,1';

    DECLARE @pDelimiter1 char(1) = ',';

    DECLARE @pDelimiter2 char(2) = ';';

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT 1 UNION ALL

    SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) in (@pDelimiter1, @pDelimiter2)

    ),

    cteLen(N1,L1) AS(--==== Return start and length (for use in substring)

    SELECT s.N1,

    Case when ISNULL(NULLIF(CHARINDEX(@pDelimiter1,@pString,s.N1),0)-s.N1,8000)

    < ISNULL(NULLIF(CHARINDEX(@pDelimiter2,@pString,s.N1),0)-s.N1,8000)

    then ISNULL(NULLIF(CHARINDEX(@pDelimiter1,@pString,s.N1),0)-s.N1,8000)

    else ISNULL(NULLIF(CHARINDEX(@pDelimiter2,@pString,s.N1),0)-s.N1,8000)

    end

    FROM cteStart s

    )

    select * from cteLen;

    Performance of the second charindex might be similar to using replace on the input string. An alternative would be do join cteStart to itself to derive cteLen. For a single delimiter it is slower than charindex but for multiples it might be better. Certainly for more than 2 delimiters the self join would be simpler code to read.

Viewing 15 posts - 481 through 495 (of 990 total)

You must be logged in to reply to this topic. Login to reply