Find the Nth Occurrence of a Character in a String

  • cadebryant

    Ten Centuries

    Points: 1035

    Comments posted to this topic are about the item Find the Nth Occurrence of a Character in a String

  • bveenings-1104226

    SSC Journeyman

    Points: 79

    helped me alot, thnx:-)

  • Mark Cowne

    One Orange Chip

    Points: 26684

    I suspect this would be more efficient with a numbers/tally table

    declare @TargetStr varchar(8000),

    @SearchedStr varchar(8000),

    @Occurrence int

    set @TargetStr='a'

    set @SearchedStr='abbabba'

    set @Occurrence=3;

    WITH Occurrences AS (

    SELECT Number,

    ROW_NUMBER() OVER(ORDER BY Number) AS Occurrence

    FROM master.dbo.spt_values

    WHERE Number BETWEEN 1 AND LEN(@SearchedStr) AND type='P'

    AND SUBSTRING(@SearchedStr,Number,LEN(@TargetStr))=@TargetStr)

    SELECT Number

    FROM Occurrences

    WHERE Occurrence=@Occurrence

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • mwilliamson 38158

    Valued Member

    Points: 67

    Looks wrong, if you go more than 1 above the maximum char number then it will loop through again.

    if you change the WHILE statement from while (@counter < @Occurrence)

    to while (@counter < @Occurrence) and @pos <> 0that should fix the issue

    Try running below

    DECLARE @TXT varchar(255)

    SET @TXT = 'Michael Williamson'

    SELECT dbo.CHARINDEX2('l',@TXT,1)

    SELECT dbo.CHARINDEX2('l',@TXT,2)

    SELECT dbo.CHARINDEX2('l',@TXT,4)

    SELECT dbo.CHARINDEX2('l',@TXT,6)

    Hope that helps

  • sam.russell

    Valued Member

    Points: 65

    Thanks! I found this function really useful. Just adding the middle line here got around the looping issue for me:

    set @pos = @ret

    if @pos = 0 set @counter = @Occurrence

    end

  • ssddforever

    SSC Journeyman

    Points: 75

    Thank you for your fix code! I was banging my head agains the wall to find out why my stored proc was bugging out sometimes - turns out that one of my functions was using this charindex2 without accounting for looping behavior

  • Luis Cazares

    SSC Guru

    Points: 183496

    This is an old thread but I would suggest a different approach. It might get give better results performance wise.

    /*

    Example:

    SELECT dbo.CHARINDEX2('a', 'abbabba', 3)

    returns the location of the third occurrence of 'a'

    which is 7

    */

    CREATE FUNCTION [dbo].[ProperCase]

    (

    @TargetStr CHAR(1),

    @SearchedStr VARCHAR(8000),

    @Occurrence INT = 2

    )

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT N FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))x(N)

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@SearchedStr),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteAppearances(N, rn) AS( SELECT N, ROW_NUMBER() OVER (ORDER BY N) rn

    FROM cteTally

    WHERE SUBSTRING(@SearchedStr, N, 1) = @TargetStr

    )

    SELECT TOP 1 CASE WHEN rn >= @Occurrence THEN N ELSE 0 END N

    FROM cteAppearances

    WHERE rn = @Occurrence OR rn < @Occurrence

    ORDER BY N DESC

    GO

    /*

    Example:

    SELECT dbo.CHARINDEX2('a', 'abbabba', 3)

    returns the location of the third occurrence of 'a'

    which is 7

    */

    CREATE FUNCTION [dbo].[ProperCase2]

    (

    @TargetStr VARCHAR(8000),

    @SearchedStr VARCHAR(8000),

    @Occurrence INT = 2

    )

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT N FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1))x(N)

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT TOP (ISNULL(DATALENGTH(@SearchedStr),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteAppearances(N, rn) AS( SELECT N, ROW_NUMBER() OVER (ORDER BY N) rn

    FROM cteTally

    WHERE SUBSTRING(@SearchedStr, N, LEN(@TargetStr)) = @TargetStr

    )

    SELECT TOP 1 CASE WHEN rn >= @Occurrence THEN N ELSE 0 END N

    FROM cteAppearances

    WHERE rn = @Occurrence OR rn < @Occurrence

    ORDER BY N DESC

    GO

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • sandeepmittal11

    SSC Veteran

    Points: 228

  • stjepan.mateljan

    Newbie

    Points: 9

    Hello!

    I am using charindex2 at the moment.
    Since I have rows with more than 8000 characters, i've changed these  two from (8000) to (max):
    @TargetStr varchar(8000), 
    @SearchedStr varchar(8000)

    While both fixes:
    while (@counter < @Occurrence) and @pos <> 0
    and:
    if @pos = 0 set @counter = @Occurrence
    prevents looping of whole dataset, it doesn't prevent first instance to loop forever.

    Example:
    I have: 'a,b,c,d,e'
    I want to put that into table with say 8 rows, like this: 'a','b','c','d','e','','',''
    Without any of the above, it gives me this: 'a','b','c','d','e','a','b','c'
    But with either (or even both), it gives me this:  'a','b','c','d','e','a','a','a'

    I presume it's something elementary, but I'm not seeing it.

    I am using SQL Server 2014 Express.

    Thank you!

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply