Uncommon Table Expressions

  • Comments posted to this topic are about the item Uncommon Table Expressions



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

  • really liked the article...

    "Keep Trying"

  • I was surprised to see the implementation of CTE recursion (in 2005 at least) performed similarly too inbuilt functions for small string lengths (read napkin test of up to 400) for the last example compared to

    SELECT @str2 = SUBSTRING(@Str1,1,@offset-1) + @char + Substring(@Str1,@offset+1,len(@str1))

    The elapsed time was comparable

    The recursive reported measurable cpu time and 2 worktable scans

    The inbuilt did not report anything

    Forgive if I am new to this but does that mean that inbuilt functions don't add to reported CPU ?

    The inbuilt in a scalar select didn't generate a query plan on execution; I guess it wasn't considered worth caching.

  • Nice article 🙂

  • Nice Article. I would definitely like to go with this technique as there are very limited scenarios where you need huge documents to get parsed character by character. I have checked this routine for about 20,000 characters and it gave results in approx. 1 sec. Quite good.

    Nice and informative...

    Atif Sheikh

    ----------------------------------------------------------------------------------------------------------------------------------------------------------------------
    Sometimes, winning is not an issue but trying.
    You can check my BLOG
    [font="Arial Black"]here[/font][/url][/right]

  • Recursive CTE's are great for answering hierarchical questions, but usages such as this strike me as a bit lame. Despite your disclaimer, it looks as though less reasoning types will use this way beyond the narrow scope intended by the article.

    I'd much rather see the Database Vendor support regular expressions in a more comprehensive manner (e.g perl-compatible etc).

    select replace(PhoneNumber, '\D', '', 'g') as "Cleaned Phone Number"

    from .......

    ......

  • Good article!

  • Those who might use 'The REPLACE_Nth_CHAR Function' might also be interested in STUFF (deletes a specified length of characters and inserts another set of characters at a specified starting point).

  • The "replace Nth character" doesn't actually catch the Nth instance of a particular character, it just replaces the Nth character.

    Try it on "Susy sells sea shells by the sea shore.", and try using it to replace the third "s". As written, it doesn't even have an input parameter (variable) for which character to replace, nor does adding one to the case statement accomplish this.

    What it really does is simply replace the third character (or whichever character you use in the case statement). Since that can be done much more easily with Stuff, I don't see a purpose for that particular piece of code. If you really want to replace the third "s" in that string, here's how I'd go about it:

    declare @String varchar(1000), @String2 varchar(1000),

    @Letter char(1), @Occurance int, @Replace char(1)

    select @String = 'Susy sells sea shells by the sea shore.',

    @String2 = '',

    @Letter = 's',

    @Occurance = 3,

    @Replace = 'X'

    ;with Chars ([Char], [Pos], Ord) as

    (select substring(@String, number, 1),

    row_number() over (partition by substring(@String, number, 1) order by number),

    number

    from common.dbo.smallnumbers

    where number between 1 and len(@String))

    select @String2 =

    @String2 +

    case

    when [Char] = @Letter and [Pos] = @Occurance then @Replace

    else [Char]

    end

    from Chars

    order by ord

    select @String2

    As far as parsing out a string character-by-character, a Numbers table can do that just as easily and just as fast, and can deal with arbitrarily large strings. That's mentioned in the article, but I have to ask: Why have two different functions for the same thing? If you're going to have one based on a Numbers table, why also have one based on a CTE?

    For the phone number and other regex type functions, speed tests have been done by members of this site, and we found that a CLR regex out-performs this CTE method. Testing and discussion at:

    http://www.sqlservercentral.com/Forums/Topic470379-338-1.aspx

    The article is okay, but there are better solutions for each of these things.

    (In the Replace Nth CTE I built above, I reference my Common database and SmallNumbers table. I keep a database, called "Common" with a number of tables and functions that other databases reference. Much like a code library. The SmallNumbers table is 0 - 1,000.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Forgive if I am new to this but does that mean that inbuilt functions don't add to reported CPU ?

    CPU and IO for a UDF are NOT reported by setting statistics on.

    The solution that I use is to save the values from the sysprocesses table for the @@spid and then after the SQL runs, calculate the difference from sysprocesses.

    You will notice that there is a significant usage of CPU when a UDF is used.

    SQL = Scarcely Qualifies as a Language

  • Does anyone consider the use of a CLR function for this type of processing?

    For example:

    [Microsoft.SqlServer.Server.SqlFunction]

    public static int clr_occurs2(string target, char lookFor)

    {

    int occurs = 0;

    for (int i = 0; i < target.Length; i++)

    {

    if (target == lookFor)

    {

    occurs++;

    }

    }

    return occurs;

    }

    seems to be about 5 times faster than the CTE version.

    Greg

  • On CLR, yes, it will be faster than the CTE on that.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (8/11/2008)


    On CLR, yes, it will be faster than the CTE on that.

    Does anyone want to race? 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good article so far as the format, readability, and revelation go. The thing I'm having a problem with is the use of recursion of any kind to create a pseudo Tally table in the form of a CTE...

    Let's strip everything away except creating the count and have a race between the recursive method and just a flat out lookup... the @Bitbucket variable is used to have a place to "dump" the count to without having the IO system or the display system get involved...

    SET NOCOUNT ON

    DECLARE @Bitbucket INT

    DECLARE @Top INT

    SET @Top = 100000

    --=============================================================================

    -- Recursive CTE does the count

    --=============================================================================

    PRINT '========== Recursive CTE =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;WITH cteTally

    AS (

    SELECT 1 AS N

    UNION ALL

    SELECT N+1 FROM cteTally WHERE N < @Top

    )

    SELECT @Bitbucket = N

    FROM cteTally

    OPTION (MAXRECURSION 0)

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',100)

    --=============================================================================

    -- ROW_NUMBER CTE does the count

    --=============================================================================

    PRINT '========== ROW_NUMBER CTE =========='

    SET STATISTICS IO ON

    SET STATISTICS TIME ON

    ;WITH cteTally

    AS (

    SELECT TOP(@Top) ROW_NUMBER() OVER(ORDER BY sc1.Type) AS N

    FROM Master.dbo.spt_Values sc1,

    Master.dbo.spt_Values sc2

    )

    SELECT @Bitbucket = N

    FROM cteTally

    SET STATISTICS TIME OFF

    SET STATISTICS IO OFF

    PRINT REPLICATE('=',100)

    On my humble 6 year old 1.8Ghz P5, that code returns the following results...

    [font="Courier New"]========== Recursive CTE ==========

    Table 'Worktable'. Scan count 2, logical reads 600001, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 3375 ms, elapsed time = 4132 ms.

    ====================================================================================================

    ========== ROW_NUMBER CTE ==========

    Table 'spt_values'. Scan count 2, logical reads 18, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 63 ms, elapsed time = 67 ms.

    ====================================================================================================

    [/font]

    Based on that, I'd recommend not using the RBAR found in recursion for anything except the occasional hierarchy problem. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Wow, some of these threads live a longer life than the merits of their content would seem to dictate. I posted his article to show how a new feature could be used in an unconventional way, not to present a new paradigm for TSQL programming:Wow:

    I agree with Jeff. Every technique should be tested before implementation, and tested again before the next implementation...

    If there is an art to programming then this forum is our canvas!



    PeteK
    I have CDO. It's like OCD but all the letters are in alphabetical order... as they should be.

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply