Subqueries, Temporary tables and CTEs

  • A reference to UDFs in Jeff Moden's recent article on triangular joins and RBAR effects prompted me to revisit some tests I did a little while ago using UDFs. I've simplified the real-world problem to investigate things, but now have a complete test sequence (including generating test data). I just don't fully understand what SQL Server is doing.

    I started with

    select q.q_id, q.closed, q.dept
    from query q
    where q.closed > dbo.fnMonthEnd(200704,q.dept)
    and q.closed <= dbo.fnMonthEnd(200705,q.dept)
    order by q.q_id
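
    For anyone following along without the attachment, fnMonthEnd is a scalar UDF that, given a yyyymm period and a department, returns that department's month-end cut-off date. A simplified sketch of its shape (not the exact body from the attached script; parameter types are illustrative) would be:

    create function dbo.fnMonthEnd (@period int, @dept int)
    returns datetime
    as
    begin
        -- Sketch only: turn yyyymm into the first of the month, then step to
        -- the last day. The real function also applies a department-specific
        -- adjustment, which is omitted here.
        declare @firstOfMonth datetime;
        set @firstOfMonth = cast(cast(@period as varchar(6)) + '01' as datetime);
        return dateadd(day, -1, dateadd(month, 1, @firstOfMonth));
    end;

    Whatever the exact body, the point for what follows is that it's a scalar T-SQL UDF, so it gets invoked once per row it is applied to.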

    First attempt: move the filter to a subquery

    select q.q_id, q.closed, q.dept
    from query q
    join (
        select dept,
            dbo.fnMonthEnd(200704,dept) as 'MonthStart',
            dbo.fnMonthEnd(200705,dept) as 'MonthEnd'
        from dept
    ) p on q.dept=p.dept
    where q.closed > p.MonthStart
    and q.closed <= p.MonthEnd
    order by q.q_id

    For a test [query] table with 10,000 rows, this actually increased time from 27.5 seconds to 40 seconds.

    I then tried moving the subquery out into a separate query that loads a temporary table:

    select dept,
        dbo.fnMonthEnd(200704,dept) as 'MonthStart',
        dbo.fnMonthEnd(200705,dept) as 'MonthEnd'
    into #ME
    from dept

    select q.q_id, q.closed, q.dept
    from query q
    join #ME p on q.dept=p.dept
    where q.closed > p.MonthStart
    and q.closed <= p.MonthEnd
    order by q.q_id

    drop table #ME

    This dramatically dropped the execution time to 60 milliseconds.

    My next thought was to try a CTE

    ;with me (dept,monthstart,monthend) as (
        select dept,
            dbo.fnMonthEnd(200704,dept) as 'MonthStart',
            dbo.fnMonthEnd(200705,dept) as 'MonthEnd'
        from dept)
    select q.q_id, q.closed, q.dept
    from query q
    join me p on q.dept=p.dept
    where q.closed > p.MonthStart
    and q.closed <= p.MonthEnd
    order by q.q_id

    Surprisingly, this pushed the time back up to 40 seconds.

    Can anyone explain why the CTE works so differently from the temporary table and why SQL Server doesn't optimise both the same way? I assume it's because the CTE actually gets treated like the second form (the subquery), although I'd always assumed it was similar to the third form, with the temporary table just being implicit.

    Regards

    Derek.

    P.S. The attached script includes two further tests, the fastest of which was to insert the expanded UDF call as a subquery, roughly as sketched below.
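
    For anyone without the attachment, the idea in that fastest variant (a sketch only; the date expressions below are stand-ins for whatever fnMonthEnd actually computes) is to write the expression inline in the derived table so that no scalar UDF is called at all:

    select q.q_id, q.closed, q.dept
    from query q
    join (
        select dept,
            dateadd(day, -1, cast('20070501' as datetime)) as 'MonthStart',  -- stand-in for fnMonthEnd(200704,dept)
            dateadd(day, -1, cast('20070601' as datetime)) as 'MonthEnd'     -- stand-in for fnMonthEnd(200705,dept)
        from dept
    ) p on q.dept=p.dept
    where q.closed > p.MonthStart
    and q.closed <= p.MonthEnd
    order by q.q_id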

    Derek

  • The optimizer treats a CTE almost exactly the same way it does a subquery, which is why you would get the same performance from a CTE and the subquery version.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • I'd say it's because it can, and probably did, put statistics on the columns in the temp table, whereas in the other queries those values came from calculations done on the fly. That would make a huge difference. Then eliminating the UDFs and using joins instead turned it into a true set-based operation. I suspect Jeff will drill down on all that in a lot more detail.
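
    To make that concrete, here's a small variation on the temp-table version (a sketch, not one of the attached tests; the index name is made up) that gives the optimiser both column statistics and an index on the join column:

    select dept,
        dbo.fnMonthEnd(200704,dept) as 'MonthStart',
        dbo.fnMonthEnd(200705,dept) as 'MonthEnd'
    into #ME
    from dept

    create clustered index IX_ME_dept on #ME (dept)

    -- ... run the join against #ME exactly as before, then:
    drop table #ME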

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Derek Dongray (12/7/2007)


    Can anyone explain why the CTE works so differently from the temporary table and why SQL server doesn't optimise both the same way?

    A CTE is essentially a temporary view, not a temporary table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • As Gail stated, CTEs are nothing more than "in-line views".

    I also had a discussion about those not too long ago:-)

    http://www.sqlservercentral.com/Forums/Topic411895-338-1.aspx

    Best Regards,

    Chris Büttner

  • My question really relates to why the optimiser is so poor at identifying temporary results which could be calculated in advance.

    I'd always been told that with "set-based programming" one should indicate to the database engine what was needed and let the optimiser work out the best way to get the information. My example shows that this is definitely not the case, since the optimiser cannot seem to work out that an uncorrelated subquery can be extracted to a temporary table to achieve such an impressive performance improvement! If the optimiser really could work out the best way to evaluate a selection then surely it should have achieved the 60-millisecond response time on all four queries, since they all return the same result!

    The conclusion seems to be that I can't rely on the optimiser to identify any but the simplest optimisations and should look to promote subqueries, etc., manually wherever possible. Or am I missing something in the way it works?

    Is what the optimiser can actually do documented anywhere?

    Basically, I am currently totally unimpressed with what the 'so-called' optimiser does for T-SQL and would hope it improves in future versions. Optimising compilers have been identifying and promoting invariant subexpressions from loops for decades, so why can't SQL Server's "optimiser" manage the equivalent of this simple operation?

    I haven't yet had a chance to try this test with Katmai.

    Regards

    Derek.

    Derek

  • A large part of how the optimizer gets the data out of the tables faster is statistics. When you're creating artificial tables, such as table valued functions, there are no statistics for the optimizer to use. Then it starts making guesses based on an estimate of 1 row. Yeah, lots of times the result sets are quite a bit bigger than that, so that guess doesn't work. It could guess 10,000 rows, but then the opposite problem would arise. The simple fact is, the best way to work with the optimizer is to try to provide it as much clean information as you can. That means referring to tables that have statistics, best of all on good indexes.
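
    You can see that guess for yourself with a throwaway multi-statement table valued function (this one is made up purely for illustration):

    create function dbo.fnDemoRows ()
    returns @t table (n int)
    as
    begin
        -- Stuff a few thousand rows into the table variable.
        insert into @t (n)
        select top (5000) c1.column_id
        from sys.columns c1 cross join sys.columns c2;
        return;
    end;
    go
    -- With the actual execution plan on, the scan of the function's result shows
    -- an estimate of 1 row against roughly 5,000 actual rows, so anything joined
    -- to it downstream is costed from that bad guess.
    select count(*) from dbo.fnDemoRows();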

    There's loads of documentation on this, but not very much in any one place. I'd recommend all four of the "Inside SQL Server 2005" books. There are also some excellent white papers at MS and some great MS blogs. Here are a few links where I go to get good information (in addition to people around here like Jeff, Gail and all the rest):

    http://blogs.msdn.com/craigfr/default.aspx

    http://www.microsoft.com/technet/prodtechnol/sql/2005/qrystats.mspx

    http://blogs.msdn.com/sqlqueryprocessing/default.aspx

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I skimmed those sites and there is a lot of interesting information. Unfortunately, although there are discussions about how the optimiser uses statistics to choose between execution plans and how CTEs are processed, there doesn't seem to be anything which explains how the execution plans are generated and how (or if) the optimiser would ever produce a plan which effectively converts

    select a.col, d.val from a join (
        select b.col, dbo.udf(b.col) as 'val' from b
    ) d on a.col=d.col

    into

    select b.col, dbo.udf(b.col) as 'val' into #d from b;
    select a.col, #d.val from a join #d on a.col=#d.col

    which (as in my example) may be several hundred times faster!

    Basically, the question is "can the optimiser recognise an uncorrelated subquery (or CTE) and separate it out as a query into an internal temporary table, or is this one of the cases where it's down to the SQL developer to lead the optimiser and give explicit instructions as to how to evaluate the query, because it will never recognise that the subquery (or CTE) is independent of the rest of the query"?

    Since UDFs must be deterministic, their presence should not affect this, unless there's some backdoor way to make a UDF non-deterministic, in which case the above conversion would be unsafe and the optimiser would be correct not to move the subquery.
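
    (As an aside, SQL Server will tell you whether it regards a particular function as deterministic; a quick check against the month-end function from the earlier posts:)

    -- Returns 1 if the function is marked deterministic, 0 if not,
    -- NULL if the object doesn't exist.
    select objectproperty(object_id('dbo.fnMonthEnd'), 'IsDeterministic')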

    Derek

  • Derek Dongray (12/10/2007)


    Basically, the question is "can the optimiser recognise an uncorrelated subquery (or CTE) and separate it out as query into an internal temporary table

    My guess is not. I've never seen the optimiser convert one SQL statement into multiple and I don't believe it's possible for it to do that.

    Also, from personal experience, sometimes separating out a subquery is better, sometimes it's worse.

    Theoretically, the optimiser could be enabled to do that, but in terms of the complexity of the optimiser code, the number of possible resultant plans and the time taken to optimise, this might be one of those areas where it's better that the developer lends a hand.

    As one of the SQL gurus mentioned at one conference: "The optimiser's job isn't to find the best plan. It's to find a good plan fast."

    The optimiser has limits imposed on it as to how long it may take to pick a plan. The more expensive the query, the longer it is allowed. If you look at the xml showplan, there's an entry that says whether all plans were considered, or if the optimiser bailed out early.
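
    If you want to look for that entry, here's a rough sketch against the plan cache (SQL 2005; the attribute, when present, is StatementOptmEarlyAbortReason, and the query below is illustrative only):

    -- NULL in early_abort_reason means the attribute isn't there, i.e. the
    -- optimiser ran to completion for that statement.
    select top (20)
        qs.total_elapsed_time,
        qp.query_plan.value(
            'declare namespace p="http://schemas.microsoft.com/sqlserver/2004/07/showplan";
             (//p:StmtSimple/@StatementOptmEarlyAbortReason)[1]',
            'varchar(100)') as early_abort_reason,
        qp.query_plan
    from sys.dm_exec_query_stats qs
    cross apply sys.dm_exec_query_plan(qs.plan_handle) qp
    order by qs.total_elapsed_time desc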

    Regarding UDFs, I noticed last week that getdate() is permitted in scalar UDFs in SQL 2005 SP2, so I'm not sure about the deterministic requirement anymore. The following works, without complaint, on 2005 SP2

    create function TestingDeterministic ()
    RETURNS DATETIME
    AS
    BEGIN
        RETURN GETDATE()
    END
    GO

    select dbo.TestingDeterministic()

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Check out the actual execution plans for #2 and #3a together... they both come in at 50% relative cost (50% vs. (42+8)%).

    The difference is in the compute scalar operation. When using the temp table, this operation is processed against 4 rows. When using the subquery, it is processed after the hash match against 10000 rows. The estimated cost for compute scalar is very small in both cases. The optimiser apparently doesn't take the content of the function into account when generating the plan, which probably makes sense.

  • GilaMonster (12/10/2007)


    My guess is not. I've never seen the optimiser convert one sql statement into multiple and I don't believe it's possible for it to do that.

    Actually, I wouldn't expect it to actually split the query, just generate the plan as if it had. You're probably right though, that the cost of doing this on every query would be excessive, especially in light of what you pointed out below.

    Regarding UDFs, I noticed last week that getdate() is permitted in scalar UDFs in SQL 2005 SP2, so I'm not sure about the deterministic requirement anymore.

    I upgraded from SQL 2000 to SQL 2005 earlier this year (about April) and the code this comes from has been in development for over a year. Once you pointed out that the determinism requirements have been relaxed, I went looking in BOL and found:

    User-defined functions have been expanded in Microsoft SQL Server 2005 to include functions written in any Microsoft .NET Framework supported programming language. In addition, some restrictions on using nondeterministic functions have been removed. These changes impact your decision on the type of user-defined function to design and implement.

    http://msdn2.microsoft.com/en-us/library/ms187440.aspx

    If UDFs can be non-deterministic then the optimiser has to assume that if you wrote a query which implicitly calls a UDF 10,000 times, then you actually want it to do this!

    In fact, back in July I wrote some test code in VB which actually managed to perform INSERTs into another table, although if you try to put one in T-SQL directly you still get

    Msg 443, Level 16, State 15, Procedure TestingDeterministic, Line 5
    Invalid use of side-effecting or time-dependent operator in 'INSERT' within a function.

    Obviously, the mention of 'time-dependent operator' here is wrong, as that seems to be allowed now.
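
    For reference, a minimal sketch of the kind of function that still triggers that message (the table and function names here are made up):

    create table dbo.DemoAudit (logged datetime)
    go
    -- Creating this fails with Msg 443: INSERT is a side-effecting operator and
    -- is still rejected inside a T-SQL function, even though GETDATE() isn't.
    create function dbo.TestingSideEffects ()
    returns int
    as
    begin
        insert into dbo.DemoAudit (logged) values (getdate());
        return 1;
    end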

    Regards

    Derek.

    Derek

  • If UDFs can be non-deterministic then the optimiser has to assume that if you wrote a query which implicitly calls a UDF 10,000 times, then you actually want it to do this!

    That's exactly right... in essence, you've made a correlated subquery. In fact, if you do an execution plan on test query 1, 2, or 3a, you'll see two execution plans... one for the UDF and one for the query. Not so with test 4... one nice execution plan with lots of seeks, and the only hash match in it is combining 2 sets of 4 rows. Compare that with the hash match of 10,000 rows in the others.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Great post, Derek... good example tests you attached.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
