Get Surrounding text in a string?

  • I need a function or MSSQL query that returns the surrounding text in a string. So, let's say I have a row that contains the following comma separated string:

    "adventure,devotion,for children,happiness,imagination,love,world,lovely,confusion,introspection,on work and working,on writing and words,"

    The user searches for the word "lovely".

    I want to return the word "lovely" and the text surrounding it like: "imagination,love,world,lovely,confusion,introspection,on work and working,"

    What's the most efficient way to do this. Currently I am doing this which includes a function for returning a specific number of words...

    dbo.FirstNWords(REPLACE(LEFT([Categories],CHARINDEX(','+ @term,[Categories],20)),',',', '),3)+' '+ dbo.FirstNWords(REPLACE(SUBSTRING([Categories],CHARINDEX(@term,[Categories]),LEN([Categories])),',',', '),3)

    It works, however it seems too slow.

  • Sorry, this should be in the SQL Server 2012 - T-SQL forum...

  • Quick solution which uses dbo.DelimitedSplit8K[/url]

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    DECLARE @SAMPLE_STR VARCHAR(500) = 'adventure,devotion,for children,happiness,imagination,love,world,lovely,confusion,introspection,on work and working,on writing and words,';

    DECLARE @SEARCH_WORD VARCHAR(50) = 'lovely';

    DECLARE @WORD_COUNT INT = 3;

    ;WITH WORD_LIST AS

    (

    SELECT

    WL.ItemNumber

    ,WL.Item

    ,MAX(CASE

    WHEN WL.Item = @SEARCH_WORD THEN WL.ItemNumber

    ELSE 0

    END) OVER

    (

    PARTITION BY (SELECT NULL)

    ) AS WL_FLAG

    FROM dbo.DelimitedSplit8K(@SAMPLE_STR,',') AS WL

    )

    SELECT

    STUFF( (SELECT

    ',' + W.Item

    FROM WORD_LIST W

    WHERE W.ItemNumber BETWEEN W.WL_FLAG - @WORD_COUNT AND W.WL_FLAG + @WORD_COUNT

    FOR XML PATH(''),TYPE).value('.[1]','VARCHAR(500)')

    ,1,1,'') AS OUTPUT_STR;

    Results

    OUTPUT_STR

    --------------------------------------------------------------------------

    imagination,love,world,lovely,confusion,introspection,on work and working

  • Second solution, which is more efficient than the previous one as it doesn't use the XML concatenation. The code is somewhat self-explanatory.

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    DECLARE @SAMPLE_STR VARCHAR(8000) = 'adventure,devotion,for children,happiness,imagination,love,world,lovely,confusion,introspection,on work and working,on writing and words';

    DECLARE @SEARCH_WORD VARCHAR(50) = 'lovely';--

    DECLARE @WORD_COUNT INT = 3;

    DECLARE @DELIMITER CHAR(1) = ',';

    /* Inline Tally Table

    Range 1-8000

    */

    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)

    ,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)

    ) AS X(N))

    , NUMS(N) AS (SELECT TOP(LEN(@SAMPLE_STR)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3)

    /* Base information

    W_POS = Position of the search word within the string

    W_LEN = Length of the string

    */

    ,WORD_POS AS

    (

    SELECT

    CHARINDEX(@SEARCH_WORD,@SAMPLE_STR,1) AS W_POS

    ,LEN(@SAMPLE_STR) AS W_LEN

    )

    /* Identify the position of all delimiters in the string.

    LAG and LEAD to bring the desired range of

    (@WORD_COUNT x 2) + 1

    */

    ,WORD_GROUPS AS

    (

    SELECT

    NM.N

    ,WP.W_POS

    ,WP.W_LEN

    ,LAG(NM.N,@WORD_COUNT,0) OVER

    (

    ORDER BY NM.N

    ) AS START_POS

    ,LEAD(NM.N,@WORD_COUNT + 1,0) OVER

    (

    ORDER BY NM.N

    ) AS END_POS

    FROM NUMS NM

    CROSS APPLY WORD_POS WP

    WHERE SUBSTRING(@SAMPLE_STR,NM.N,1) = @DELIMITER

    )

    /* Find the group where the search word is closest to the

    delimiter and calculate the length of the extraction

    */

    ,RANKED_GROUPS AS

    (

    SELECT

    DENSE_RANK() OVER

    (

    ORDER BY ABS(WG.W_POS - WG.N)

    ) RG_DRNK

    ,WG.START_POS

    ,CASE

    WHEN WG.END_POS = 0 THEN WG.W_LEN - WG.START_POS

    ELSE (WG.END_POS - WG.START_POS) - 1

    END AS XLEN

    ,WG.W_POS

    FROM WORD_GROUPS WG

    )

    /* Return NULL if the search word is not within the string

    else the right group of words.

    */

    SELECT

    CASE

    WHEN RG.W_POS = 0 THEN NULL

    ELSE SUBSTRING(@SAMPLE_STR,RG.START_POS + 1,RG.XLEN)

    END AS OUTPUT_STR

    FROM RANKED_GROUPS RG

    WHERE RG.RG_DRNK = 1;

    Results

    OUTPUT_STR

    --------------------------------------------------------------------------

    imagination,love,world,lovely,confusion,introspection,on work and working

  • Eirikur Eiriksson (4/19/2015)


    Second solution, which is more efficient than the previous one as it doesn't use the XML concatenation.

    /* Find the group where the search word is closest to the

    delimiter and calculate the length of the extraction

    */

    I don't know if that's true or not (and I will find out) but what a freakin' awesome idea, Eirikur! And, not only does it work in 2K5 but I've got a couple of other ideas for this. Well done!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks. Is it possible to put this into a function? Or how would I use that within a select statement?

  • markpringle (4/19/2015)


    Thanks. Is it possible to put this into a function? Or how would I use that within a select statement?

    Sure. I'd make sure that it was an "Inline Table Valued Function" for performance but there's no reason why this couldn't be encapsulated for reuse in such a manner.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It uses a Temp table, but temp tables can't be used in functions, right? I am just not SQL savvy enough to know how to repurpose it.

  • He! Now I must retract the earlier statement, the bespoke function got left behind like an Escargot on a barbecue (I like them flamed)!

    Here is the code for the inline(able) table value functions, the first one works on 2005 and later and the second one on 2012 and later.

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    GO

    /* dbo.iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K

    Delimitedsplit8K based solution

    */

    IF OBJECT_ID(N'dbo.iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K') IS NOT NULL DROP FUNCTION dbo.iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K;

    GO

    CREATE FUNCTION dbo.iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K

    (

    @SAMPLE_STR VARCHAR(500)

    ,@SEARCH_WORD VARCHAR(50)

    ,@WORD_COUNT INT

    ,@DELIMITER CHAR(1)

    )

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    WITH WORD_LIST AS

    (

    SELECT

    WL.ItemNumber

    ,WL.Item

    ,MAX(CASE

    WHEN WL.Item = @SEARCH_WORD THEN WL.ItemNumber

    ELSE 0

    END) OVER

    (

    PARTITION BY (SELECT NULL)

    ) AS WL_FLAG

    FROM dbo.DelimitedSplit8K(@SAMPLE_STR,@DELIMITER) AS WL

    )

    SELECT

    CASE WHEN CHARINDEX(@SEARCH_WORD,@SAMPLE_STR,1) > 0 THEN

    STUFF( (SELECT

    @DELIMITER + W.Item

    FROM WORD_LIST W

    WHERE W.ItemNumber BETWEEN W.WL_FLAG - @WORD_COUNT AND W.WL_FLAG + @WORD_COUNT

    FOR XML PATH(''),TYPE).value('.[1]','VARCHAR(500)')

    ,1,1,'')

    ELSE NULL

    END

    AS OUTPUT_STR;

    GO

    /* dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE

    Alternative solution based on an inline tally

    table and LAG/LEAD

    */

    IF OBJECT_ID(N'dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE') IS NOT NULL DROP FUNCTION dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE;

    GO

    CREATE FUNCTION dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE

    (

    @SAMPLE_STR VARCHAR(8000)

    ,@SEARCH_WORD VARCHAR(50)

    ,@WORD_COUNT INT

    ,@DELIMITER CHAR(1)

    )

    RETURNS TABLE

    WITH SCHEMABINDING

    AS

    RETURN

    /* Inline Tally Table

    Range 1-8000

    */

    WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)

    ,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)

    ) AS X(N))

    , NUMS(N) AS (SELECT TOP(LEN(@SAMPLE_STR)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3)

    /* Base information

    W_POS = Position of the search word within the string

    W_LEN = Length of the string

    */

    ,WORD_POS AS

    (

    SELECT

    CHARINDEX(@SEARCH_WORD,@SAMPLE_STR,1) AS W_POS

    ,LEN(@SAMPLE_STR) AS W_LEN

    )

    /* Identify the position of all delimiters in the string.

    LAG and LEAD to bring the desired range of

    (@WORD_COUNT x 2) + 1

    */

    ,WORD_GROUPS AS

    (

    SELECT

    NM.N

    ,WP.W_POS

    ,WP.W_LEN

    ,LAG(NM.N,@WORD_COUNT,0) OVER

    (

    ORDER BY NM.N

    ) AS START_POS

    ,LEAD(NM.N,@WORD_COUNT + 1,0) OVER

    (

    ORDER BY NM.N

    ) AS END_POS

    FROM NUMS NM

    CROSS APPLY WORD_POS WP

    WHERE SUBSTRING(@SAMPLE_STR,NM.N,1) = @DELIMITER

    )

    /* Find the group where the search word is closest to the

    delimiter and calculate the length of the extraction

    */

    ,RANKED_GROUPS AS

    (

    SELECT

    DENSE_RANK() OVER

    (

    ORDER BY ABS(WG.W_POS - WG.N)

    ) RG_DRNK

    ,WG.START_POS

    ,CASE

    WHEN WG.END_POS = 0 THEN WG.W_LEN - WG.START_POS

    ELSE (WG.END_POS - WG.START_POS) - 1

    END AS XLEN

    ,WG.W_POS

    FROM WORD_GROUPS WG

    )

    /* Return NULL if the search word is not within the string

    else the right group of words.

    */

    SELECT

    CASE

    WHEN RG.W_POS = 0 THEN NULL

    ELSE SUBSTRING(@SAMPLE_STR,RG.START_POS + 1,RG.XLEN)

    END AS OUTPUT_STR

    FROM RANKED_GROUPS RG

    WHERE RG.RG_DRNK = 1;

    GO

    Then some sample data to play around with

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.TBL_SURROUND_TEST') IS NOT NULL DROP TABLE dbo.TBL_SURROUND_TEST;

    DECLARE @SAMPLE_SIZE INT = 3; -- 4 = 531441 ROWS = 12 ^ 3

    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)

    ,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7)

    ,SAMPLE_DATA AS

    (

    SELECT

    NM.N AS TRAN_ID

    ,CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 )) + CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 )) + CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 )) AS TRIPLET

    FROM NUMS NM

    )

    SELECT

    ISNULL(ROW_NUMBER() OVER

    (

    ORDER BY (SELECT NULL)

    ),0) AS SD_RID

    ,CONCAT(

    S01.TRIPLET ,CHAR(44)

    ,S02.TRIPLET ,CHAR(44)

    ,S03.TRIPLET ,CHAR(44)

    ,S04.TRIPLET ,CHAR(44)

    ,S05.TRIPLET ,CHAR(44)

    ,S06.TRIPLET ,CHAR(44)

    ,S07.TRIPLET ,CHAR(44)

    ,S08.TRIPLET ,CHAR(44)

    ,S09.TRIPLET ,CHAR(44)

    ,S10.TRIPLET ,CHAR(44)

    ,S11.TRIPLET ,CHAR(44)

    ,S12.TRIPLET) AS INPUT_STR

    INTO dbo.TBL_SURROUND_TEST

    FROM SAMPLE_DATA S01

    CROSS JOIN SAMPLE_DATA S02

    CROSS JOIN SAMPLE_DATA S03

    CROSS JOIN SAMPLE_DATA S04

    CROSS JOIN SAMPLE_DATA S05

    CROSS JOIN SAMPLE_DATA S06

    CROSS JOIN SAMPLE_DATA S07

    CROSS JOIN SAMPLE_DATA S08

    CROSS JOIN SAMPLE_DATA S09

    CROSS JOIN SAMPLE_DATA S10

    CROSS JOIN SAMPLE_DATA S11

    CROSS JOIN SAMPLE_DATA S12

    ;

    ALTER TABLE dbo.TBL_SURROUND_TEST ADD CONSTRAINT PK_DBO_SURROUND_TEST_SD_RID PRIMARY KEY CLUSTERED (SD_RID ASC);

    SELECT

    COUNT(*) AS SAMPLE_COUNT

    FROM dbo.TBL_SURROUND_TEST;

    Test harness

    USE tempdb;

    GO

    SET NOCOUNT ON;

    DECLARE @INT_BUCKET INT = 0;

    DECLARE @CHAR_BUCKET_01 VARCHAR(100) = '';

    DECLARE @CHAR_BUCKET_02 VARCHAR(100) = '';

    RAISERROR(N'-------------------------------------------------------------

    DRY RUN

    -------------------------------------------------------------',0,0) WITH NOWAIT;

    SET STATISTICS IO,TIME ON;

    SELECT

    @INT_BUCKET = ST.SD_RID

    ,@CHAR_BUCKET_02 = ST.INPUT_STR

    FROM dbo.TBL_SURROUND_TEST ST;

    SET STATISTICS TIME,IO OFF;

    RAISERROR(N'-------------------------------------------------------------

    iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K

    -------------------------------------------------------------',0,0) WITH NOWAIT;

    SET STATISTICS IO,TIME ON;

    SELECT

    @INT_BUCKET = ST.SD_RID

    ,@CHAR_BUCKET_01 = X.OUTPUT_STR

    ,@CHAR_BUCKET_02 = ST.INPUT_STR

    FROM dbo.TBL_SURROUND_TEST ST

    CROSS APPLY dbo.iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K(ST.INPUT_STR,'ABC',3,',') AS X

    WHERE X.OUTPUT_STR IS NOT NULL;

    SET STATISTICS TIME,IO OFF;

    RAISERROR(N'-------------------------------------------------------------

    iTVF_SURROUNDING_WORDS_ALTERNATIVE

    -------------------------------------------------------------',0,0) WITH NOWAIT;

    SET STATISTICS IO,TIME ON;

    SELECT

    @INT_BUCKET = ST.SD_RID

    ,@CHAR_BUCKET_01 = X.OUTPUT_STR

    ,@CHAR_BUCKET_02 = ST.INPUT_STR

    FROM dbo.TBL_SURROUND_TEST ST

    CROSS APPLY dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE(ST.INPUT_STR,'ABC',3,',') AS X

    WHERE X.OUTPUT_STR IS NOT NULL;

    SET STATISTICS TIME,IO OFF;

    Results:blush:

    -------------------------------------------------------------

    DRY RUN

    -------------------------------------------------------------

    Table 'TBL_SURROUND_TEST'. Scan count 1, logical reads 4483, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 125 ms, elapsed time = 131 ms.

    -------------------------------------------------------------

    iTVF_SURROUNDING_WORDS_DELIMITEDSPLIT8K

    -------------------------------------------------------------

    Table 'Worktable'. Scan count 396, logical reads 12605, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'TBL_SURROUND_TEST'. Scan count 1, logical reads 4483, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 5086 ms, elapsed time = 5088 ms.

    -------------------------------------------------------------

    iTVF_SURROUNDING_WORDS_ALTERNATIVE

    -------------------------------------------------------------

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'TBL_SURROUND_TEST'. Scan count 1, logical reads 4483, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 66612 ms, elapsed time = 66738 ms.

    Post Mortem: Looks like the dbo.iTVF_SURROUNDING_WORDS_ALTERNATIVE got hit by an SORT Operator, passed away very slowly.

    Edit: Just remembered that on this test system the DelimitedSplit8K is an alias for DelimitedSplit8K_Lead, might produce slightly different results.

  • markpringle (4/19/2015)


    It uses a Temp table, but temp tables can't be used in functions, right? I am just not SQL savvy enough to know how to repurpose it.

    I see nothing of a temp table in the code. It's just a standard practice for folks like Eirikur and many others to add "USE TempDB" to code to ensure that we cause no damage to a real database whether we use any DDL or not. It's just a habit on forums.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply