SUBSTRING and CHARINDEX query.... please help.

  • Hi,
    I have this table, column, and string:
    Table = audit_log 
    Column named = request_u
    Example column value = http://stable-test.test.net/api/files/fs.testcollection/TP-0000000.pdf/download/

    What I need to do is return the text between the 5th and 6th /. In the above example the returned text will be 'fs.testcollection'.
    The column values and length may change but it will always be the text between the 5th and 6th / (forward slash) I need to return.
    How do I achieve this?

    Thanks in advance.

  • You need to use a splitter function to split the string into its individual elements.

    John

  • You could use the Delimited8KSplit. Then it becomes as trivial as:

    SELECT DS.Item
    FROM YourTable YT
         CROSS APPLY dbo.Delimited8KSplit (YT.YourColumn, '/') DS
    WHERE DS.ItemNumber = 6;

    This would return the value "fs.testcollection", using your example url.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • I created a function called SubstringBetween8K which was designed for exactly this type of problem. It uses NGrams8K. Here's the function:


    CREATE FUNCTION [dbo].[substringBetween8K]
    (
    @string  varchar(8000),
    @start  tinyint,
    @stop  tinyint,
    @delimiter char(1)
    )
    /*****************************************************************************************
    Purpose:
    Takes in input string (@string) and returns the text between two instances of a delimiter
    (@delimiter); the location of the delimiters is defined by @start and @stop.

    For example: if @string = 'xx.yy.zz.abc', @start=1, @stop=3, and @delimiter = '.' the
    function will return the text: yy.zz; this is the text between the first and third
    instance of "." in the string "xx.yy.zz.abc".

    Compatibility:
    SQL Server 2008+

    Syntax:
    --===== Autonomous use
    SELECT sb.token, sb.position, sb.tokenLength
    FROM dbo.substringBetween8K(@string, @start, @stop, @delimiter); sb;

    --===== Use against a table
    SELECT sb.token, sb.position, sb.tokenLength
    FROM SomeTable st
    CROSS APPLY dbo.substringBetween8K(st.SomeColumn1, 1, 2, '.') sb;

    Parameters:
    @string  = varchar(8000); Input string to parse
    @delimiter = char(1); this is the delimiter use to determine where the output starts/ends
    @start  = tinyint; the first instance of @delimiter to search for; this is where the
         output should start. When @start is 0 then the function will return
         everything from the beginning of @string until @end.
    @stop  = tinyint; the last instance of @delimiter to search for; this is where the
         output should end. When @end is 0 then the function will return everything
        from @start until the end of the string.

    Return Types:
    Inline Table Valued Function returns:
    token   = varchar(8000); the substring between the two instances of @delimiter defined by
         @start and @stop
    position  = smallint; the location of where the substring begins
    tokenlength = length of the return token

    ---------------------------------------------------------------------------------------
    Developer Notes:
    1. Requires NGrams8K. The code for NGrams8K can be found here:
      http://www.sqlservercentral.com/articles/Tally+Table/142316/

    2. This function is what is referred to as an "inline" scalar UDF." Technically it's an
      inline table valued function (iTVF) but performs the same task as a scalar valued user
      defined function (UDF); the difference is that it requires the APPLY table operator
      to accept column values as a parameter. For more about "inline" scalar UDFs see this
      article by SQL MVP Jeff Moden: http://www.sqlservercentral.com/articles/T-SQL/91724/
      and for more about how to use APPLY see the this article by SQL MVP Paul White:
      http://www.sqlservercentral.com/articles/APPLY/69953/.
     
      Note the above syntax example and usage examples below to better understand how to
      use the function. Although the function is slightly more complicated to use than a
      scalar UDF it will yield notably better performance for many reasons. For example,
      unlike a scalar UDFs or multi-line table valued functions, the inline scalar UDF does
      not restrict the query optimizer's ability generate a parallel query execution plan.

    3. dbo.substringBetween8K is deterministic; for more about deterministic and
      nondeterministic functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx

    Examples:
    DECLARE @string varchar(8000) = '123.ABC456.333.222.3333XXX.$$$'

    -- beginning of string to 2nd delimiter, 2nd delimiter to end of the string
    SELECT '0, 2', * FROM dbo.substringBetween8K(@string,0,2, '.') UNION ALL
    SELECT '2, 0', * FROM dbo.substringBetween8K(@string,2,0, '.') UNION ALL

    -- Between the 1st & 2nd, then 2nd & 5th delimiters
    SELECT '1, 2', * FROM dbo.substringBetween8K(@string,1,2, '.') UNION ALL
    SELECT '2, 5', * FROM dbo.substringBetween8K(@string,2,5, '.') UNION ALL

    -- dealing with NULLS, delimiters that don't exist and when @first = @Last
    SELECT '2, 10', * FROM dbo.substringBetween8K(@string,2,10,'.') UNION ALL
    SELECT '1, NULL',* FROM dbo.substringBetween8K(@string,1,NULL,'.') UNION ALL
    SELECT '1, 1', * FROM dbo.substringBetween8K(@string,1,NULL,'.');

    ---------------------------------------------------------------------------------------
    Revision History:
    Rev 00 - 20160720 - Initial Creation - Alan Burstein
    ****************************************************************************************/

    RETURNS TABLE WITH SCHEMABINDING AS RETURN
    WITH
    chars AS
    (
    SELECT instance = 0, position = 0 WHERE @start = 0
    UNION ALL
    SELECT ROW_NUMBER() OVER (ORDER BY position), position
    FROM dbo.NGrams8k(@string,1)
    WHERE token = @delimiter
    UNION ALL
    SELECT -1, DATALENGTH(@string)+1 WHERE @stop = 0
    )
    SELECT token =
       SUBSTRING
       (
        @string,
        MIN(position)+1,
        NULLIF(MAX(position),MIN(position)) - MIN(position)-1
       ),
       position = CAST(
       CASE WHEN NULLIF(MAX(position),MIN(position)) - MIN(position)-1 > 0
       THEN MIN(position)+1 END AS smallint),
       tokenLength = CAST(NULLIF(MAX(position),MIN(position)) - MIN(position)-1 AS smallint)
    FROM chars
    WHERE instance IN (@start, NULLIF(@stop,0), -1);

    And here's how you'd use it:


    DECLARE @x varchar(100) = 'http://stable-test.test.net/api/files/fs.testcollection/TP-0000000.pdf/download/'

    SELECT * FROM dbo.substringBetween8K(@x, 5, 6, '/');

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thanks for your replies.
    I'm at a basic level using SQL so forgive me for my novice questions.
    The database I'll be pulling from is not mine. I only have read-access. With this said, can I still use your function Alan?  or the Delimited8kSplit function?
    As I'm seeking to 'SELECT' the data only and do all my functionality in the SELECT statement, I guess I cannot achieve what I wish to?  Hence my aim of using SUBSTRING and CHARINDEX functions.
    Thanks.

  • Decomposing the splitter function might not give the best performance, but it could help in your case.

    --Create sample data
    CREATE TABLE #Sample ( SomeColumn varchar(1000))
    INSERT INTO #Sample
    VALUES('Part 1/Part 2/Part 3/Part 4/Part 5/Part 6/Part 7/Part 8/Part 9');

    --The solution starts here
    WITH E1(N) AS (
          SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
          SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
          SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
          ),         --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b) --10E+4 or 10,000 rows max
    SELECT *
    FROM #Sample
    CROSS APPLY(
        SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),
           Item = SUBSTRING(SomeColumn,s.N1,ISNULL(NULLIF((LEAD(s.N1,1,1) OVER (ORDER BY s.N1) - 1),0)-s.N1,8000))
        FROM (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
           SELECT t.N+1
           FROM (
              SELECT 0 UNION ALL
              SELECT TOP (DATALENGTH(ISNULL(SomeColumn,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
            ) t(N)
           WHERE (SUBSTRING(SomeColumn,t.N,1) = '/' OR t.N = 0)
           ) s(N1)
        )s
    WHERE s.ItemNumber = 6;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • If you want to do it with Substring and CHARINDEX here you go.
    <SQL>
    DECLARE @testVARCHAR(500);
    SET @test-2 = 'http://stable-test.test.net/api/files/fs.testcollection/TP-0000000.pdf/download/';

    SELECT SUBSTRING(@test,
    CHARINDEX('/', @test-2, CHARINDEX('/', @test-2, CHARINDEX('/', @test-2, CHARINDEX('/', @test-2, CHARINDEX('/', @test-2) + 1) + 1) +1) + 1) + 1,
    (CHARINDEX('/', @test-2, CHARINDEX('/', @test-2, CHARINDEX('/', @test-2, CHARINDEX('/', @test-2, CHARINDEX('/', @test-2, CHARINDEX('/', @test-2) + 1) + 1) +1) + 1) +1)
    - CHARINDEX('/', @test-2, CHARINDEX('/', @test-2, CHARINDEX('/', @test-2, CHARINDEX('/', @test-2, CHARINDEX('/', @test-2) + 1) + 1) +1) + 1)) - 1)

    </SQL>

    -------------------------------------------------------------
    we travel not to escape life but for life not to escape us
    Don't fear failure, fear regret.

  • chocthree - Thursday, June 22, 2017 10:23 AM

    Thanks for your replies.
    I'm at a basic level using SQL so forgive me for my novice questions.
    The database I'll be pulling from is not mine. I only have read-access. With this said, can I still use your function Alan?  or the Delimited8kSplit function?
    As I'm seeking to 'SELECT' the data only and do all my functionality in the SELECT statement, I guess I cannot achieve what I wish to?  Hence my aim of using SUBSTRING and CHARINDEX functions.
    Thanks.

    If you can't create new SQL objects such as functions then you would have to extract the logic and use it inline as Luis demonstrated. Luis beat me to it - his answer is right on the money. Note that the logic he used comes from Eirikur Eiriksson's DelimitedSplit8K_LEAD a modified  (and slightly faster) version of DelimitedSplit8K that uses the LEAD function.  I put together how you would accomplish the same task extracting the logic from DelimitedSplit8K; I would recommend Luis' solution as it will perform a little better here's what I put together in case anyone was curious how to do this on Pre-2012 systems. 

    Using the sample data Luis created:
    SELECT *
    FROM #Sample t
    CROSS APPLY
    (
    SELECT
      SUBSTRING(t.SomeColumn, position+1, CHARINDEX('/',t.SomeColumn,position+1) - (position+1))
    FROM
    (
      SELECT ROW_NUMBER() OVER (ORDER BY iTally.N), N
      FROM
      (
      SELECT 1 UNION ALL
      SELECT TOP (ISNULL(DATALENGTH(t.SomeColumn),0))
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL))+1
      FROM
      (
       SELECT 1
       FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x),
         (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) b(x),
         (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) c(x),
         (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) d(x)
      ) E4(N)
      ) iTally(N)
      WHERE SUBSTRING(t.SomeColumn,iTally.N,1) = '/'
    ) positions(rn, position)
    WHERE rn = 6
    ) splitter(item);

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • If you are on SQL Server 2016 then use the new native STRING_SPLIT function.
    😎

  • Eirikur Eiriksson - Friday, June 23, 2017 7:43 AM

    If you are on SQL Server 2016 then use the new native STRING_SPLIT function.
    😎

    Eirikur, I've heard tell that performance with that isn't as good as with the 8K split, although I haven't actually tested it myself.  But also, STRING_SPLIT doesn't return a sequence number, so you'd have to rely on the elements always being returned in the correct order (is that guaranteed?) and do you own numbering of the result set.

    John

  • Eirikur Eiriksson - Friday, June 23, 2017 7:43 AM

    If you are on SQL Server 2016 then use the new native STRING_SPLIT function.
    😎

    The problem with that, is that there's no item number assigned and no order guaranteed for the output of that function.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Thank you so much for all your help with this.
    I actually got there with the below. I just need to replace _ with / and the 192_... with the column name. It works.
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    select
    a.first, a.second , a.third, a.fourth, a.fifth, a.sixth,a.seventh,
    substring ('192_168_0_145_15_0_333_999' , a.fourth+1 , a.fifth-(a.fourth+1)) as actval

    From
    (
    select
      charindex( '_', '192_168_0_145_15_0_333_999' ) as first,
    charindex( '_', '192_168_0_145_15_0_333_999', charindex( '_', '192_168_0_145_15_0_333_999' )+) as second  ,
    charindex( '_', '192_168_0_145_15_0_333_999',charindex( '_', '192_168_0_145_15_0_333_999' ,  charindex( '_', '192_168_0_145_15_0_333_999' ) +1) +) third ,
    charindex( '_', '192_168_0_145_15_0_333_999',charindex( '_', '192_168_0_145_15_0_333_999' ,  charindex( '_', '192_168_0_145_15_0_333_999', charindex( '_', '192_168_0_145_15_0_333_999' )+1 ) +1) +)  fourth,
    charindex( '_', '192_168_0_145_15_0_333_999',charindex( '_', '192_168_0_145_15_0_333_999' ,  charindex( '_', '192_168_0_145_15_0_333_999', charindex( '_', '192_168_0_145_15_0_333_999' ,charindex( '_', '192_168_0_145_15_0_333_999')+1 )+1 ) +1) +)  fifth,
     charindex( '_', '192_168_0_145_15_0_333_999',charindex( '_', '192_168_0_145_15_0_333_999' ,  charindex( '_', '192_168_0_145_15_0_333_999', charindex( '_', '192_168_0_145_15_0_333_999' ,charindex( '_', '192_168_0_145_15_0_333_999' ,charindex( '_', '192_168_0_145_15_0_333_999' )+1 )+1 )+1 ) +1) +)  sixth ,
    charindex( '_', '192_168_0_145_15_0_333_999',charindex( '_', '192_168_0_145_15_0_333_999' ,  charindex( '_', '192_168_0_145_15_0_333_999', charindex( '_', '192_168_0_145_15_0_333_999' ,charindex( '_', '192_168_0_145_15_0_333_999' ,charindex( '_', '192_168_0_145_15_0_333_999' ,charindex( '_', '192_168_0_145_15_0_333_999')+1 )+1 )+1 )+1 ) +1) +) seventh
    ) a
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------

  • Eirikur Eiriksson - Friday, June 23, 2017 7:43 AM

    If you are on SQL Server 2016 then use the new native STRING_SPLIT function.
    😎

    I wouldn't.  It has no ordinal for the position of the returned elements.  Really bad oversight on the part of MS since they won't openly guarantee the correct order is returned.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The tally table and/or spring split functions are good ways to go, but recursive CTEs are interesting to learn and I have used them in many one-offs where I can't program in the database. Here is one I have had bookmarked from stackoverflow to help me learn that I modified for here.

    /*
        Use a recursive CTE and the CHARINDEX function to break a URL into substrings at each "/".
        A modified script found on stackoverflow: https://stackoverflow.com/questions/8726111/sql-server-find-nth-occurrence-in-a-string
    */

    GO
    if OBJECT_ID('tempdb..#t_Strings') is not null
    BEGIN
    Drop Table #t_Strings;
    END;
    GO
    if OBJECT_ID('tempdb..#t_Interim') is not null
    BEGIN
    Drop Table #t_Interim;
    END;

    create table #t_Strings(
        URL_String varchar(255) NULL
        );

    insert into #t_Strings (URL_String)
    select 'http://stable-test.test.net/api/files/fs.testcollection/TP-0000000.pdf/download/' union all
    select 'http://stable-test.test.net/api_state/file/fs.anothercollection/TP-1111111.exe/upload/' union all
    select 'https://stackoverflow.com/questions/8726111/sql-server-find-nth-occurrence-in-a-string';

    with cte_Strings(URL_String, Nth_Start, Nth_Found) as (
      select URL_String, 1, charindex('/', URL_String) from #t_Strings
      union all
      select URL_String, Nth_Found + 1, charindex('/', URL_String, Nth_Found + 1)
      from cte_Strings
      where Nth_Found > 0
    )
    select URL_String, N = ROW_NUMBER() over (partition by URL_String order by Nth_Start), Nth_Start, Nth_Found,
        substring(URL_String, Nth_Start, case when Nth_Found > 0 then Nth_Found - Nth_Start else len(URL_String) end) as Nth_Substring
    into #t_Interim
    from cte_Strings
    order by URL_String, Nth_Start;

    -- All substrings numbered
    select *
    from #t_Interim
    order by URL_String, N;

    -- Desired Result
    select *
    from #t_Interim
    where N=6
    order by URL_String;

  • I have done some fairly extensive tests on the string splitting options and the JSON split methods and the STRING_SPLIT functions are by far the best performers if one is on 2016 or later, by far!
    Encapsulating the output in a CTE with a ROW_NUMBER has proven to be accurate and consistent when it comes to the item's ordinals, hence my recommendation.
    😎

    Hopefully I will find the time to do a write-up soon

Viewing 15 posts - 1 through 15 (of 34 total)

You must be logged in to reply to this topic. Login to reply