Charindex issue

  • I have writtena da query to search for a string in an expression by the number of it's appearance. Script is like this:

    DECLARE @Expression VARCHAR(8000) = 'abcd_e_fgh',

    @SearchString VARCHAR(10)= '_',

    @OccuranceNumber SMALLINT = 1

    DECLARE @SearchIndex INT = 0, @SearchIndexPrevious INT = 0, @Sno INT = 0

    WHILE @Sno < @OccuranceNumber BEGIN

    SELECT @SearchIndex = CHARINDEX(@SearchString, @Expression, @SearchIndex + 1)

    IF @SearchIndexPrevious >= @SearchIndex BEGIN

    SELECT @SearchIndex = -1

    END ELSE BEGIN

    SELECT @SearchIndexPrevious = @SearchIndex

    END

    SELECT @SearchIndexPrevious = @SearchIndex

    SELECT @Sno = @Sno + 1

    END

    SELECT @SearchIndex , @SearchIndexPrevious

    Here i'm trying to search "_" in expression "abcd_e_fgh" where it is appearing for first time. it gives me 5 correctly. Now when i change the @OccurenceNumber to 2 or 3, it gives correct values 7 and -1 respectively. However now when i change it to 4, it gives me 5. So when it's trying to check for fifth appearance of "_", it's not actually giving 0 or -1 but repeating the value 5. Could anyone see anything wrong in the script or is it sql's strange behaviour ?

  • Slightly different approach

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    GO

    DECLARE @Expression VARCHAR(8000) = 'abcd_e_fgh';

    DECLARE @SearchString VARCHAR(10) = '_';

    DECLARE @OccuranceNumber INT = 1;

    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

    , NUMS(N) AS (SELECT TOP(LEN(@Expression)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4)

    ,PARSED_SET AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    ORDER BY (SELECT NULL)

    ) AS OCC_NO

    ,NM.N

    FROM NUMS NM

    WHERE SUBSTRING(@Expression, NM.N, LEN(@SearchString)) = @SearchString

    )

    SELECT

    PS.OCC_NO

    ,PS.N AS POS

    FROM PARSED_SET PS

    WHERE PS.OCC_NO = @OccuranceNumber;

    Results

    OCC_NO POS

    ------- ----

    1 5

  • Hi Eirikur, Thanks but this was a question with just an example. Actually it is a generic one. String could be anything and not just "abcd_e_fgh".

    Though i've just sorted out the issue.

  • sqlnaive, I have just fired your script through the sql debugger, It resets the index to -1 after reaching the end of the string, so your algorithm starts again looking for the target. As an example, try it with 7

  • DECLARE @Expression VARCHAR(8000) = 'abcd_e_fgh',

    @SearchString VARCHAR(10)= '_',

    @OccuranceNumber SMALLINT = 5

    DECLARE @SearchIndex INT = 0, @SearchIndexPrevious INT = 0, @Sno INT = 0

    WHILE @Sno < @OccuranceNumber BEGIN

    SELECT @SearchIndex = CHARINDEX(@SearchString, @Expression, @SearchIndex + 1)

    IF @SearchIndexPrevious >= @SearchIndex BEGIN

    --SELECT @SearchIndex = -1

    BREAK

    END ELSE BEGIN

    SELECT @SearchIndexPrevious = @SearchIndex

    END

    SELECT @SearchIndexPrevious = @SearchIndex

    SELECT @Sno = @Sno + 1

    END

    SELECT @SearchIndex , @SearchIndexPrevious

  • Yeah DouglasH, you are correct. I sorted out that point and bingo. Was not that complex but seems I need a coffee break. 🙂 Was stuck at this for some time.

  • Looks like you solved it yourself by putting the question out there. It's a pretty good strategy when you're stuck. 🙂 Well done.

  • sqlnaive (4/3/2015)


    Hi Eirikur, Thanks but this was a question with just an example. Actually it is a generic one. String could be anything and not just "abcd_e_fgh".

    Though i've just sorted out the issue.

    Just turn the code into a inlineable table value function

    😎

    CREATE FUNCTION dbo.ITVFN_EE_LOCATE_PATTERN_IN_STRING

    (

    @Expression VARCHAR(8000)

    ,@SearchString VARCHAR(10)

    ,@OccuranceNumber INT

    )

    RETURNS TABLE

    WITH SCHEMABINDING

    RETURN

    WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

    , NUMS(N) AS (SELECT TOP(LEN(@Expression)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4)

    ,PARSED_SET AS

    (

    SELECT

    ROW_NUMBER() OVER

    (

    ORDER BY (SELECT NULL)

    ) AS OCC_NO

    ,NM.N

    FROM NUMS NM

    WHERE SUBSTRING(@Expression, NM.N, LEN(@SearchString)) = @SearchString

    )

    SELECT

    PS.OCC_NO

    ,PS.N AS POS

    FROM PARSED_SET PS

    WHERE PS.OCC_NO = @OccuranceNumber;

    GO

  • It's worth noting SQLNAIVE, that Eirikur's solution will perform about 3X faster.

    I took your solution and put it into a scalar valued function like so:

    CREATE FUNCTION dbo.SVF_LOCATE_PATTERN_IN_STRING

    (

    @Expression VARCHAR(8000),

    @SearchString VARCHAR(10),

    @OccuranceNumber SMALLINT

    )

    RETURNS INT

    AS

    BEGIN

    DECLARE @SearchIndex INT = 0, @SearchIndexPrevious INT = 0, @Sno INT = 0

    WHILE @Sno < @OccuranceNumber BEGIN

    SELECT @SearchIndex = CHARINDEX(@SearchString, @Expression, @SearchIndex + 1)

    IF @SearchIndexPrevious >= @SearchIndex BEGIN

    --SELECT @SearchIndex = -1

    BREAK

    END ELSE BEGIN

    SELECT @SearchIndexPrevious = @SearchIndex

    END

    SELECT @SearchIndexPrevious = @SearchIndex

    SELECT @Sno = @Sno + 1

    END

    RETURN @SearchIndexPrevious

    END

    GO

    Eirikur's excellent solution is a Table Valued function so the syntax is slightly different but the results are the same if you run both like so:

    DECLARE

    @Expression VARCHAR(8000) = 'xxx123dddrrreee333123',

    @SearchString VARCHAR(10) = '2',

    @OccuranceNumber INT = 2;

    SELECT *

    FROM dbo.ITVFN_EE_LOCATE_PATTERN_IN_STRING(@Expression, @SearchString, 2) ;

    SELECT @OccuranceNumber AS OCC_NO, dbo.SVF_LOCATE_PATTERN_IN_STRING(@Expression, @SearchString, 2) AS POS;

    GO

    Results:

    OCC_NO POS

    ----------- -----------

    2 20

    OCC_NO POS

    ----------- -----------

    2 20

    To demonstrate the performance advantage that Eirikur's solution provides I created a 1,000,000 row test:

    IF OBJECT_ID('tempdb..#testdata') IS NOT NULL DROP TABLE #testdata;

    GO

    CREATE TABLE #testdata (td varchar(36) NOT NULL);

    INSERT INTO #testdata SELECT TOP (1000000) newid() FROM sys.all_columns a, sys.all_columns b;

    GO

    SET NOCOUNT ON;

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    DECLARE @x int;

    SET STATISTICS TIME ON;

    PRINT 'loop version:'

    SELECT @x = dbo.SVF_LOCATE_PATTERN_IN_STRING(td, 'a', 2)

    FROM #testdata

    PRINT 'EE version:'

    SELECT @x = pos

    FROM #testdata

    CROSS APPLY dbo.ITVFN_EE_LOCATE_PATTERN_IN_STRING(td, 'a', 2);

    SET STATISTICS TIME OFF;

    GO

    Results:

    loop version:

    SQL Server Execution Times:

    CPU time = 7859 ms, elapsed time = 8761 ms.

    EE version:

    SQL Server Execution Times:

    CPU time = 10109 ms, elapsed time = 2796 ms.

    The CPU time is higher because Erikikur's function creates a parallel plan but, as you can see, it resolves the query about 3 times faster.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Thanks Alan for this, I was about to do the same when I saw your post, good job!

    😎

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply