How to Retrieve a Substring From a Larger String of Varying Length

  • Hello -

    I have a unique problem - I have a requirement to retrieve a substring which appears between the fourth and fifth underscores in a string.  The parent string varies in length, so I actually have to do a bunch of charindexes on the underscores to get what I need.  I'm having a hell of a time trying to do this.

    Following is an example of the string -

    [0107602_000002008823_20150206_1_449589981255609050_20150206__20.26]

    I need to retreive the numbers between the fourth and fifth underscores.

    Here is another example string -

    [0111904_000001054979_20131113_1_69413497_20131113__30.24]

    The strings can vary in length which is making this more difficult.

    Can anyone help?

  • Use DelimitedSplit8K() and then specify the item# you want to return?

    Jeff Moden wrote an article about splitting strings and included a UDF to do it. It returns (Value, Position) so you could do something like (from memory... I'm reinstalling SQL Server right now...)

    SELECT ca.Item, ca.Position

    FROM MyTable t

    CROSS APPLY DelimitedSplit8K(t.FieldToSplit,',') ca

    WHERE ca.Position = 4

    So I'm sure my code needs a bit of tweaking, but hopefully you get the idea.

  • (Why can't I edit my original response? Hmm...)

    Here's the article:

    https://www.sqlservercentral.com/articles/tally-oh-an-improved-sql-8k-“csv-splitter”-function

    SELECT test.SomeID, test.SomeValue, split.ItemNumber, Item = QUOTENAME(split.Item,'"')

    FROM #JBMTest test

    CROSS APPLY dbo.DelimitedSplit8k(test.SomeValue,',') split

    WHERE split.ItemNumber = 4;

  • Thanks for the help, i'll take a look.

  • if your using SQL Svr 2016 or greater there is a built in spit string function

     

    STRING_SPLIT ( string , separator )

    declare @str varchar(500)
    set @str ='0107602_000002008823_20150206_1_449589981255609050_20150206__20.26'


    select row_number() over (order by (select null)) as rn,* from STRING_SPLIT ( @str , '_' )

     

     

    ***The first step is always the hardest *******

  • Unfortunately - I don't think STRING_SPLIT will work

    From the BOL for String_split:

     

    The output rows might be in any order. The order is not guaranteed to match the order of the substrings in the input string. You can override the final sort order by using an ORDER BY clause on the SELECT statement (ORDER BY value).

    Given that you can't count on part # 4 actually being row #4.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (4) wrote:

    Unfortunately - I don't think STRING_SPLIT will work From the BOL for String_split:

    The output rows might be in any order. The order is not guaranteed to match the order of the substrings in the input string. You can override the final sort order by using an ORDER BY clause on the SELECT statement (ORDER BY value).

    Given that you can't count on part # 4 actually being row #4.

    Heh... by the time MS gets around to fixing their function, I'll be well into my retirement.  Long live DelimitedSplit8K! 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Presumably MS is following relational theory, where the order of a set doesn't matter.

    But, in the real world, the order often does matter a lot, especially when the set has already been denormalized into a single list.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Totally agreed that in real life, order does matter.  I'll also state that it also matters in "relational theory".  If it didn't, indexes wouldn't do a bloody thing for us except to provide more narrow objects to make scans faster.

    And, yeah... I also realize that relational theory doesn't exactly cover things like performance.  Well, unless you consider the laws of thermodynamics and how fast the fans on the CPUs will need to spin. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Polymorphist wrote:

    Hello - I have a unique problem - I have a requirement to retrieve a substring which appears between the fourth and fifth underscores in a string.  The parent string varies in length, so I actually have to do a bunch of charindexes on the underscores to get what I need.  I'm having a hell of a time trying to do this. Following is an example of the string - [0107602_000002008823_20150206_1_449589981255609050_20150206__20.26] I need to retreive the numbers between the fourth and fifth underscores. Here is another example string - [0111904_000001054979_20131113_1_69413497_20131113__30.24] The strings can vary in length which is making this more difficult. Can anyone help?

    If your real data is as consistent as your sample values, there's no reason to use any splitter.

    The fact is it looks like a reasonable amount of care went into keeping these string consistent over all. In fact, it looks like the only part that isn't consistent is the part you're after...

    (1, '[0107602_000002008823_20150206_1_449589981255609050_20150206__20.26]'),
    (2, '[0111904_000001054979_20131113_1_69413497_20131113__30.24]');
    -- [07-nums_0000-padto12_yyyymmdd_N_********_yyyymmdd__nn.nn] -- easy to use pattern...
    -- >----------34 chars---------------< ??? >---17 chars------<

    So... If we know that there will always be 34 characters preceding your data and 17 following, all we need to know is the total length. Subtract the two known lengths from the total length and that leaves the length of your substring.

    IF OBJECT_ID('tempdb..#TestData', 'U') IS NOT NULL 
    BEGIN DROP TABLE #TestData; END;

    CREATE TABLE #TestData (
    id tinyint NOT NULL,
    original_string varchar(100) NOT NULL
    );
    INSERT #TestData (id, original_string) VALUES
    (1, '[0107602_000002008823_20150206_1_449589981255609050_20150206__20.26]'),
    (2, '[0111904_000001054979_20131113_1_69413497_20131113__30.24]');
    -- [07-nums_0000-padto12_yyyymmdd_N_********_yyyymmdd__nn.nn] -- easy to use pattern...
    -- >----------34 chars---------------< ??? >---17 chars------<

    SELECT
    td.id,
    td.original_string,
    parsed_section = SUBSTRING(td.original_string, 34, LEN(td.original_string) - 50)-- (50 = 34+17-1)
    FROM
    #TestData td;

    Results...

    id   original_string                                                        parsed_section
    ---- ---------------------------------------------------------------------- -----------------------
    1 [0107602_000002008823_20150206_1_449589981255609050_20150206__20.26] 449589981255609050
    2 [0111904_000001054979_20131113_1_69413497_20131113__30.24] 69413497
  • Here's another alternative to splitting, to show how CROSS/OUTER APPLY can assign alias names to computed values that can be used in subsequent APPLYs and in the SELECT itself, without having to rely on a fixed byte location.  As you yourself noted, trying to do it with one long string of embedded CHARINDEXes is hellishly difficult.  The APPLY technique is straightforward, so it's useful to have for when you need it again.

    SELECT string, SUBSTRING(string, delim4 + 1, delim5 - delim4 - 1) AS substring
    FROM ( VALUES('0107602_000002008823_20150206_1_449589981255609050_20150206__20.26'),
    ('doesnt_have_enough_underscores_doh!' )) AS data(string)
    OUTER APPLY (
    SELECT CHARINDEX('_', string) AS delim1
    WHERE string LIKE '%[_]%[_]%[_]%[_]%[_]%'
    ) AS ca1
    OUTER APPLY (
    SELECT CHARINDEX('_', string, delim1 + 1) AS delim2
    WHERE string LIKE '%[_]%[_]%[_]%[_]%[_]%'
    ) AS ca2
    OUTER APPLY (
    SELECT CHARINDEX('_', string, delim2 + 1) AS delim3
    WHERE string LIKE '%[_]%[_]%[_]%[_]%[_]%'
    ) AS ca3
    OUTER APPLY (
    SELECT CHARINDEX('_', string, delim3 + 1) AS delim4
    WHERE string LIKE '%[_]%[_]%[_]%[_]%[_]%'
    ) AS ca4
    OUTER APPLY (
    SELECT CHARINDEX('_', string, delim4 + 1) AS delim5
    WHERE string LIKE '%[_]%[_]%[_]%[_]%[_]%'
    ) AS ca5

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Another way to do it (it's just a modified version of ScottPlecher's good method):

    ;WITH CTE AS 
    (
    SELECT *
    FROM (VALUES('0107602_000002008823_20150206_1_449589981255609050_20150206__20.26'),
    ('doesnt_have_enough_underscores_doh!' )) AS data(string)
    )
    SELECT data.string,
    SUBSTRING(data.string, x.StartChar, x.EndChar-x.StartChar) AS substring
    FROM CTE data
    OUTER APPLY (SELECT ca4.delim + 1 AS StartChar,
    ca5.delim AS EndChar
    FROM (VALUES (CHARINDEX('_', data.string))) ca1(delim)
    CROSS APPLY (VALUES (CHARINDEX('_', data.string, ca1.delim + 1))) ca2(delim)
    CROSS APPLY (VALUES (CHARINDEX('_', data.string, ca2.delim + 1))) ca3(delim)
    CROSS APPLY (VALUES (CHARINDEX('_', data.string, ca3.delim + 1))) ca4(delim)
    CROSS APPLY (VALUES (CHARINDEX('_', data.string, ca4.delim + 1))) ca5(delim)
    WHERE data.string LIKE '%[_]%[_]%[_]%[_]%[_]%'
    ) x
  • Jeff Moden wrote:

    Matt Miller (4) wrote:

    Unfortunately - I don't think STRING_SPLIT will work From the BOL for String_split:

    The output rows might be in any order. The order is not guaranteed to match the order of the substrings in the input string. You can override the final sort order by using an ORDER BY clause on the SELECT statement (ORDER BY value).

    Given that you can't count on part # 4 actually being row #4.

    Heh... by the time MS gets around to fixing their function, I'll be well into my retirement.  Long live DelimitedSplit8K! 😀

    Yes indeed. I really don't know what was going through their mind when they put a splitter in that can't guarantee I can put it back together again....It would have been straightforward to at least include in the return rows which "part" you were in the splitting process, e.g.

    Part       string

    1              The

    4             over

    5             the

    2             Cow

    3            jumped

    6            moon

     

    Order no longer needs to be guaranteed, but you can put the string back together when you need to.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Again, MS is following the relational model, where a set does not have a defined order.  That may be fine theoretically, but in the real world we often do need a specific order of things in a set.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Understood - but I'm not getting into a lost cause debate about having Microsoft change one of its core assumptions since the early 90's for SQL server:  simply pointing out that they could include the physical "part sequence" in the output which would at least provide us enough data for the new built-in function to be of any use whatsoever.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply