Find a String in a String in SQL Server

  • quagmired

    Old Hand

    Points: 324

    Alan.B (3/20/2016)


    quagmired (3/20/2016)


    This is exactly what I'm talking about when I mention "cumbersome TSQL solution". For arbitrary 0s, I'd say a tally table would have to have over 2 billion records, but we can stick with 1 million.

    Nothing cumbersome about it. A tally table has many, many uses but I digress. If we're talking about anything other than varchar(max) all we need is a tally table with 8000 records. If we're talking about varchar(max) then we could get the job done using a tally table function such as the one below (written by Jeff Moden coincidentally):

    IF OBJECT_ID('dbo.fnTally') IS NOT NULL DROP FUNCTION dbo.fnTally

    GO

    CREATE FUNCTION dbo.fnTally

    /**********************************************************************************************************************

    Purpose:

    Return a column of BIGINTs from @ZeroOrOne up to and including @MaxN with a max value of 1 Billion.

    As a performance note, it takes about 00:02:10 (hh:mm:ss) to generate 1 Billion numbers to a throw-away variable.

    Usage:

    --===== Syntax example (Returns BIGINT)

    SELECT t.N

    FROM dbo.fnTally(@ZeroOrOne,@MaxN) t

    ;

    Notes:

    1. Based on Itzik Ben-Gan's cascading CTE (cCTE) method for creating a "readless" Tally Table source of BIGINTs.

    Refer to the following URLs for how it works and introduction for how it replaces certain loops.

    http://www.sqlservercentral.com/articles/T-SQL/62867/

    http://sqlmag.com/sql-server/virtual-auxiliary-table-numbers

    2. To start a sequence at 0, @ZeroOrOne must be 0 or NULL. Any other value that's convertable to the BIT data-type

    will cause the sequence to start at 1.

    3. If @ZeroOrOne = 1 and @MaxN = 0, no rows will be returned.

    5. If @MaxN is negative or NULL, a "TOP" error will be returned.

    6. @MaxN must be a positive number from >= the value of @ZeroOrOne up to and including 1 Billion. If a larger

    number is used, the function will silently truncate after 1 Billion. If you actually need a sequence with

    that many values, you should consider using a different tool. 😉

    7. There will be a substantial reduct

    ion in performance if "N" is sorted in descending order. If a descending

    sort is required, use code similar to the following. Performance will decrease by about 27% but it's still

    very fast especially compared with just doing a simple descending sort on "N", which is about 20 times slower.

    If @ZeroOrOne is a 0, in this case, remove the "+1" from the code.

    DECLARE @MaxN BIGINT;

    SELECT @MaxN = 1000;

    SELECT DescendingN = @MaxN-N+1

    FROM dbo.fnTally(1,@MaxN);

    8. There is no performance penalty for sorting "N" in ascending order because the output is explicity sorted by

    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    Revision History:

    Rev 00 - Unknown - Jeff Moden

    - Initial creation with error handling for @MaxN.

    Rev 01 - 09 Feb 2013 - Jeff Moden

    - Modified to start at 0 or 1.

    Rev 02 - 16 May 2013 - Jeff Moden

    - Removed error handling for @MaxN because of exceptional cases.

    **********************************************************************************************************************/

    (@ZeroOrOne BIT, @MaxN INT)

    RETURNS TABLE WITH SCHEMABINDING AS RETURN

    WITH

    E1(N) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1) --10E1 or 10 rows

    , E3(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c) --10E3 or 1 Thousand rows

    , E9(N) AS (SELECT 1 FROM E3 a, E3 b, E3 c) --10E9 or 1 Billion rows

    SELECT N = 0 WHERE ISNULL(@ZeroOrOne,0)= 0 --Conditionally start at 0.

    UNION ALL

    SELECT TOP(@MaxN) N = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E9; -- Values from 1 to @MaxN

    GO

    ... Now we don't need a table at all.

    I don't want to stray from the point I was trying to make so indulge me for moment and let's pretend we're dealing with a varchar(8000), nvarchar(4000) column or smaller. Now let's take my solution a little further using the tally table and sample data and apply that logic to an indexed view:

    IF OBJECT_ID('tempdb.dbo.vw_StringPat1') IS NOT NULL DROP VIEW dbo.vw_StringPat1;

    GO

    -- Use logic from solution #1 for an indexed view

    -- 1st the index

    CREATE VIEW dbo.vw_StringPat1 WITH SCHEMABINDING AS

    SELECT StrID, StrTxt

    FROM dbo.strings

    CROSS JOIN dbo.tally t

    WHERE t.N < LEN(s.StrTxt) AND StrTxt LIKE '%1'+REPLICATE('0',t.N)+'1%';

    GO

    CREATE UNIQUE CLUSTERED INDEX pk_vw_StringPat1 ON dbo.vw_StringPat1(StrID, StrTxt);

    Now when we run this query:

    SELECT StrID, StrTxt

    FROM dbo.strings s

    CROSS JOIN dbo.tally t

    WHERE t.N < LEN(s.StrTxt) AND StrTxt LIKE '%1'+REPLICATE('0',t.N)+'1%';

    Or this query:

    SELECT * FROM dbo.vw_StringPat1;

    We get this index plan:

    Again, for my example we're talking about varchar(8000), nvarchar(4000) or less. And this would not always be the best solution there. That said, for resolving your 1<1 or more 0's>1 problem this solution blows the pants off a regex-based solution in performance and number of reads. I'm not trying to trash regex or belittle your point. I'm just saying that you may be underestimating the power of the relational model, SQL Server and T-SQL.

    UPDATE: Here's another solution that works great for varchar(<anything>). Using the Pattern Based Splitter referenced in my signature (PatternSplitCM), we can do this:

    SELECT s.*

    FROM dbo.strings s

    CROSS APPLY dbo.PatternSplitCM(StrTxt,'%[01]%')

    WHERE item LIKE '%1%0%1%';

    You would have to modify the function however to support varchar(max) (which is not that hard to do).

    Edit: Added new solution, fixed a couple typos, added comment about modifying PatternSplitCM to varchar(max)

    Perhaps you will consider the rest of my response to your prior post later on?

    I am not attempting to discredit tally tables. I'm not sure what you mean by the "relational model" in this context but I whole-heartedly endorse a set-based approach anywhere it can be had. But what we are talking about has nothing to do with that.

    That said, for resolving your 1<1 or more 0's>>1 problem this solution blows the pants off a regex-based solution in performance and number of reads.

    Name that regex solution (and given the barrier of entry for true regex to SQL, PROVE IT). It certainly does not exist in SQL Server to date, and if it did, it would prevail for these sorts of problems.

    Seriously, for a string of length N, creating N strings of 0s of lengths 1 through N per value to compare against on top of %LIKE%? That's ridiculous to do against one string. Run that against a million values, one or two of which are VERY long. You just wasted multiple kilowatt-hours.

    And here we are talking about ONE of an infinitude of simple problems that must be handled outside of SQL Server.

  • Jeff Moden

    SSC Guru

    Points: 995152

    quagmired (3/20/2016)


    This is exactly what I'm talking about when I mention "cumbersome TSQL solution". For arbitrary 0s, I'd say a tally table would have to have over 2 billion records

    That would be what I'm talking about. 2 Billion zeros is insane whether you're using T-SQL, Regex, or a hammer. 😉 You've never done that before and likely never will nor will anyone reading this. Can we stick with some practical examples, please.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

Viewing 2 posts - 46 through 47 (of 47 total)

You must be logged in to reply to this topic. Login to reply