April 3, 2015 at 6:51 am
I have writtena da query to search for a string in an expression by the number of it's appearance. Script is like this:
DECLARE @Expression VARCHAR(8000) = 'abcd_e_fgh',
@SearchString VARCHAR(10)= '_',
@OccuranceNumber SMALLINT = 1
DECLARE @SearchIndex INT = 0, @SearchIndexPrevious INT = 0, @Sno INT = 0
WHILE @Sno < @OccuranceNumber BEGIN
SELECT @SearchIndex = CHARINDEX(@SearchString, @Expression, @SearchIndex + 1)
IF @SearchIndexPrevious >= @SearchIndex BEGIN
SELECT @SearchIndex = -1
END ELSE BEGIN
SELECT @SearchIndexPrevious = @SearchIndex
END
SELECT @SearchIndexPrevious = @SearchIndex
SELECT @Sno = @Sno + 1
END
SELECT @SearchIndex , @SearchIndexPrevious
Here i'm trying to search "_" in expression "abcd_e_fgh" where it is appearing for first time. it gives me 5 correctly. Now when i change the @OccurenceNumber to 2 or 3, it gives correct values 7 and -1 respectively. However now when i change it to 4, it gives me 5. So when it's trying to check for fifth appearance of "_", it's not actually giving 0 or -1 but repeating the value 5. Could anyone see anything wrong in the script or is it sql's strange behaviour ?
April 3, 2015 at 7:17 am
Slightly different approach
😎
USE tempdb;
GO
SET NOCOUNT ON;
GO
DECLARE @Expression VARCHAR(8000) = 'abcd_e_fgh';
DECLARE @SearchString VARCHAR(10) = '_';
DECLARE @OccuranceNumber INT = 1;
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
, NUMS(N) AS (SELECT TOP(LEN(@Expression)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4)
,PARSED_SET AS
(
SELECT
ROW_NUMBER() OVER
(
ORDER BY (SELECT NULL)
) AS OCC_NO
,NM.N
FROM NUMS NM
WHERE SUBSTRING(@Expression, NM.N, LEN(@SearchString)) = @SearchString
)
SELECT
PS.OCC_NO
,PS.N AS POS
FROM PARSED_SET PS
WHERE PS.OCC_NO = @OccuranceNumber;
Results
OCC_NO POS
------- ----
1 5
April 3, 2015 at 7:19 am
Hi Eirikur, Thanks but this was a question with just an example. Actually it is a generic one. String could be anything and not just "abcd_e_fgh".
Though i've just sorted out the issue.
April 3, 2015 at 7:20 am
sqlnaive, I have just fired your script through the sql debugger, It resets the index to -1 after reaching the end of the string, so your algorithm starts again looking for the target. As an example, try it with 7
April 3, 2015 at 7:21 am
DECLARE @Expression VARCHAR(8000) = 'abcd_e_fgh',
@SearchString VARCHAR(10)= '_',
@OccuranceNumber SMALLINT = 5
DECLARE @SearchIndex INT = 0, @SearchIndexPrevious INT = 0, @Sno INT = 0
WHILE @Sno < @OccuranceNumber BEGIN
SELECT @SearchIndex = CHARINDEX(@SearchString, @Expression, @SearchIndex + 1)
IF @SearchIndexPrevious >= @SearchIndex BEGIN
--SELECT @SearchIndex = -1
BREAK
END ELSE BEGIN
SELECT @SearchIndexPrevious = @SearchIndex
END
SELECT @SearchIndexPrevious = @SearchIndex
SELECT @Sno = @Sno + 1
END
SELECT @SearchIndex , @SearchIndexPrevious
April 3, 2015 at 7:22 am
Yeah DouglasH, you are correct. I sorted out that point and bingo. Was not that complex but seems I need a coffee break. 🙂 Was stuck at this for some time.
April 3, 2015 at 7:26 am
Looks like you solved it yourself by putting the question out there. It's a pretty good strategy when you're stuck. 🙂 Well done.
April 3, 2015 at 7:28 am
sqlnaive (4/3/2015)
Hi Eirikur, Thanks but this was a question with just an example. Actually it is a generic one. String could be anything and not just "abcd_e_fgh".Though i've just sorted out the issue.
Just turn the code into a inlineable table value function
😎
CREATE FUNCTION dbo.ITVFN_EE_LOCATE_PATTERN_IN_STRING
(
@Expression VARCHAR(8000)
,@SearchString VARCHAR(10)
,@OccuranceNumber INT
)
RETURNS TABLE
WITH SCHEMABINDING
RETURN
WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
, NUMS(N) AS (SELECT TOP(LEN(@Expression)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4)
,PARSED_SET AS
(
SELECT
ROW_NUMBER() OVER
(
ORDER BY (SELECT NULL)
) AS OCC_NO
,NM.N
FROM NUMS NM
WHERE SUBSTRING(@Expression, NM.N, LEN(@SearchString)) = @SearchString
)
SELECT
PS.OCC_NO
,PS.N AS POS
FROM PARSED_SET PS
WHERE PS.OCC_NO = @OccuranceNumber;
GO
April 3, 2015 at 9:32 am
It's worth noting SQLNAIVE, that Eirikur's solution will perform about 3X faster.
I took your solution and put it into a scalar valued function like so:
CREATE FUNCTION dbo.SVF_LOCATE_PATTERN_IN_STRING
(
@Expression VARCHAR(8000),
@SearchString VARCHAR(10),
@OccuranceNumber SMALLINT
)
RETURNS INT
AS
BEGIN
DECLARE @SearchIndex INT = 0, @SearchIndexPrevious INT = 0, @Sno INT = 0
WHILE @Sno < @OccuranceNumber BEGIN
SELECT @SearchIndex = CHARINDEX(@SearchString, @Expression, @SearchIndex + 1)
IF @SearchIndexPrevious >= @SearchIndex BEGIN
--SELECT @SearchIndex = -1
BREAK
END ELSE BEGIN
SELECT @SearchIndexPrevious = @SearchIndex
END
SELECT @SearchIndexPrevious = @SearchIndex
SELECT @Sno = @Sno + 1
END
RETURN @SearchIndexPrevious
END
GO
Eirikur's excellent solution is a Table Valued function so the syntax is slightly different but the results are the same if you run both like so:
DECLARE
@Expression VARCHAR(8000) = 'xxx123dddrrreee333123',
@SearchString VARCHAR(10) = '2',
@OccuranceNumber INT = 2;
SELECT *
FROM dbo.ITVFN_EE_LOCATE_PATTERN_IN_STRING(@Expression, @SearchString, 2) ;
SELECT @OccuranceNumber AS OCC_NO, dbo.SVF_LOCATE_PATTERN_IN_STRING(@Expression, @SearchString, 2) AS POS;
GO
Results:
OCC_NO POS
----------- -----------
2 20
OCC_NO POS
----------- -----------
2 20
To demonstrate the performance advantage that Eirikur's solution provides I created a 1,000,000 row test:
IF OBJECT_ID('tempdb..#testdata') IS NOT NULL DROP TABLE #testdata;
GO
CREATE TABLE #testdata (td varchar(36) NOT NULL);
INSERT INTO #testdata SELECT TOP (1000000) newid() FROM sys.all_columns a, sys.all_columns b;
GO
SET NOCOUNT ON;
DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS
DECLARE @x int;
SET STATISTICS TIME ON;
PRINT 'loop version:'
SELECT @x = dbo.SVF_LOCATE_PATTERN_IN_STRING(td, 'a', 2)
FROM #testdata
PRINT 'EE version:'
SELECT @x = pos
FROM #testdata
CROSS APPLY dbo.ITVFN_EE_LOCATE_PATTERN_IN_STRING(td, 'a', 2);
SET STATISTICS TIME OFF;
GO
Results:
loop version:
SQL Server Execution Times:
CPU time = 7859 ms, elapsed time = 8761 ms.
EE version:
SQL Server Execution Times:
CPU time = 10109 ms, elapsed time = 2796 ms.
The CPU time is higher because Erikikur's function creates a parallel plan but, as you can see, it resolves the query about 3 times faster.
-- Itzik Ben-Gan 2001
April 3, 2015 at 10:02 am
Thanks Alan for this, I was about to do the same when I saw your post, good job!
😎
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply