Which query method is best to use and when?

  • Hi All,

    I am attempting to provide some guidelines to the report writers for one of my clients. They do have their own Sandbox db area but as they have read only access to the DW they like to make heavy use of TempDB wth use of very large temp tables. Something which is annoying the production DBAs.

    These report writers are pretty much self taught and I want to try and clarify when to (and not to use temp tables) along with alternatives and their advantages/disadvantages.

    I would appreciate it if you could give me some constructive feedback on what you think of the following?

    These are pretty much guidlines for them to use and not a one size fits all approach!

    Apologies for the very large jpg! Note that the bottom right should stay "Global Temp Table".

    Thanks

  • Take a look at this article[/url]. I think you will find all the information you need.

    There are two assumptions that I would revise:

    1) 100 rows is the edge value for choosing between temp table and table variable: this is clearly explained in the article I linked, so I won't go into details. Be sure to take the time to read it.

    2) Temporary table / table variable is not a valid solution when table is used more than once in the query: it is, indeed. CTEs and subqueries could be not enough efficient especially when you have to access data more than once, so that using table variables and temp tables becomes legitimate.

    Hope this helps

    Gianluca

    -- Gianluca Sartori

  • It's simply not a given that Table Variables are faster than Temp Tables and Table Variables are a whole lot more difficult to troubleshoot because their content don't persist even in SSMS like Temp Tables do. It's also not true that Table Variables are "memory only" or that Temp Tables are "disk only". Both structures start out in memory and both "spill over" onto TempDB disk space if they're too big for memory.

    It IS, however, true that Table Variables do not and cannot be made to use any form of statistics nor anything other than a Primary Key. AND, it is true that requirements of a report can easily change with data.

    So.... my recommendation is that you generally avoid Table Variables... to me, they're just not worth the occasional, sporatic slight speed advantage that a lot of folks talk about. The only place I use Table Variables in any of my code is in User Defined Functions. If I could use Temp Tables in UDF's, it's very likely that I'd never use a Table Variable.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Gahh.... I'd also recommend that if you need to reference data more than once from a subquery that you NOT use CTE's... Like Views, if you reference a CTE more than once in the query, the CTE will execute more than once and it can make for some very slow code.

    My recommendation is that if you need to reference data more than once in a query, don't use a CTE for it. Use a Temp Table instead.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Holy moly... Guess I needed to read the entire graphic and the underlying text, huh?

    Again, IMHO, using Global Temporary Tables is a form of "Death by SQL" because if two instances of code that try to build a GTT try to run at the same time and don't have a pot wad of checking code and data separation code, BOOM!

    I guess I pretty much disagree with the entire graphic you've displayed especially where performance and concurrent runs are concerned.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Number of rows isn't the only issue that can kill you with Table Variables. The actual VALUES can cause issue too, since you can't have statistics on TVs. Thus the engine cannot know that your table variable with 6 rows in it is going to hit 80% of the 5M row table you are about to join to due to data skew...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Ok great. I thought that might promote some discussion!

    So lets start with Table Variables. Basicially never use these, regardless of the row size or application (unless it's a UDF). Lack of stats/indexes (apart from the one created for the PK or course!)

    Next CTEs. Again these can be forgotten about and never used....unless you need to use a recursive CTE??

    Ok so Temp Tables seem to be the tool of choice. But what about using derived tables? I thought that the performance generally would be quicker than using a temp table as long as the indexes you require already exist. One downfall is that in a complex query it can quickly become unreadable.

    And Global Temp Tables - NEVER EVER USE. So how about the data needs to be shared between connections and you have no write permissions on the database. Is this your only option??

  • aaa-322853 (4/26/2010)


    Ok great. I thought that might promote some discussion!

    So lets start with Table Variables. Basicially never use these, regardless of the row size or application (unless it's a UDF). Lack of stats/indexes (apart from the one created for the PK or course!)

    A lot of good folks on these forums will take exception to the word "never" in this case but I'm not one of them. Even in the instances where Table Variables may cause a bit of extra speed, I avoid them because of the hoops you have to jump through to troubleshoot a proc that uses them.

    Next CTEs. Again these can be forgotten about and never used....unless you need to use a recursive CTE??

    Ah... no. CTE's are absolutely wonderful for building "top down", highly readable, highly effective code. I just wouldn't build a query that calls a CTE within it more than once for the reason previously stated in my previous post.

    I will say that I avoid RBAR based recursive CTE's which are nothing more than "hidden" cursors. There are actually some set based recursive CTE's in the world, but that's normally not the case. Far better to learn to use a Tally table and some of the other "SQL Black Arts" than to use RBAR recursive CTEs/.

    Ok so Temp Tables seem to be the tool of choice. But what about using derived tables? I thought that the performance generally would be quicker than using a temp table as long as the indexes you require already exist. One downfall is that in a complex query it can quickly become unreadable.

    Careful.... I didn't say that Temp Tables are the tool of chice to replace all CTE's or their close cousins, Derived Tables. I said that I'd use a Temp Table to replace a CTE IF the CTE needed to be called more than once.

    And Global Temp Tables - NEVER EVER USE. So how about the data needs to be shared between connections and you have no write permissions on the database. Is this your only option??

    I guess my response would be that you shouldn't write code that relies on multiple disparate connections. For example, if you need a Temp Table in a proc and you also need some dynamic SQL to build the report (a common occurance), there won't be any problems if you create the Temp Table in the proc and reference that Temp Table in the dynamic SQL. You can also call stored procedures that reference the Temp Table although that will certainly make for some troubleshooting headaches.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for this Jeff, I am enjoying this interesting discussion.

    Ah... no. CTE's are absolutely wonderful for building "top down", highly readable, highly effective code. I just wouldn't build a query that calls a CTE within it more than once for the reason previously stated in my previous post.

    I certainly agree with you. One of my points when I cover CTEs will be how much more readable it makes your code over using several derived tables.

    I will say that I avoid RBAR based recursive CTE's which are nothing more than "hidden" cursors.

    Now, that's interesting. I knew they didn't scale very well but didn't know they working in the same(similar) way as a cursor. Valid point!

    So in essence, I am looking to 'help' these guys out by providing guidelines. I don't want to blind them with science, but now you mention it. It may well be worth mentioning the benefits of how a tally table would help them.

  • aaa-322853 (4/26/2010)


    Thanks for this Jeff, I am enjoying this interesting discussion.

    Ah... no. CTE's are absolutely wonderful for building "top down", highly readable, highly effective code. I just wouldn't build a query that calls a CTE within it more than once for the reason previously stated in my previous post.

    I certainly agree with you. One of my points when I cover CTEs will be how much more readable it makes your code over using several derived tables.

    I will say that I avoid RBAR based recursive CTE's which are nothing more than "hidden" cursors.

    Now, that's interesting. I knew they didn't scale very well but didn't know they working in the same(similar) way as a cursor. Valid point!

    So in essence, I am looking to 'help' these guys out by providing guidelines. I don't want to blind them with science, but now you mention it. It may well be worth mentioning the benefits of how a tally table would help them.

    It seems that a couple of articles on what a Tally Table is and how to do cross tabs would fit the bill. A couple of short lessons on seemingly silly things like "Hot to count in T-SQL" would also be a benefit. But, let me ask... do these folks really need such guidance as a decision tree to help them write good reporting code?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No, I've dumped that idea after our discussion. My approach is more to give them the pros and cons of each method allowing them to make informed decisions. If they have a good idea of what query method to use and why then the objective for this session is complete.

    Hot to count in T-SQL

    Can you elaborate plz?

  • aaa-322853 (4/26/2010)


    No, I've dumped that idea after our discussion. My approach is more to give them the pros and cons of each method allowing them to make informed decisions. If they have a good idea of what query method to use and why then the objective for this session is complete.

    Hot to count in T-SQL

    Can you elaborate plz?

    Apologies for the phat phingers.... I meant "How to count in T-SQL". It's one of the basic principles of all programming but it's quite a bit different in a declarative language like T-SQL than in procedural languages like C# or Java. I frequently use the simple act of counting to show folks what a "pseudo cursor" is and how to "think in columns instead of rows".

    For example, how would you or folks in your group count from 1 to, say, 8,000. If someone else on this thread answers, try not to look at their answers... figure it out for yourself first and then come back with your findings.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think I get it. Some people will use a loop and iterate through each number ending up with 8,000 result sets.

    If you use a tally table you can return the 1 to 8,000 using one select, in a set based way. Something like;

    SELECT n

    FROM Nums

    WHERE n <= 8000

    ORDER BY n

  • aaa-322853 (4/25/2010)


    I am attempting to provide some guidelines to the report writers for one of my clients. They do have their own Sandbox db area but as they have read only access to the DW they like to make heavy use of TempDB wth use of very large temp tables. Something which is annoying the production DBAs.

    There are a couple of things here which interest me.

    1. Why do production DBAs care about tempdb usage in the Data Warehouse? Aren't these two physically separate environments? I'm sure there is a good reason for the concern - I just wonder what it is.

    2, I am a great believer in making life as easy as possible for report writers. A good environment makes extensive use of pre-aggregations. The fact that huge intermediate result sets are being generated in tempdb suggests that a layer of abstraction is missing here.

    Indexed views and Analysis Services are powerful tools to make report writing more efficient for everyone. One possible data warehouse design might use SSIS, SSRS, and SSAS to provide a consistent view of aggregated data in an easily consumable way.

    Alternatively, there is no shortage of third-party products.

  • THeSqlGuru(4/26/2010)


    Number of rows isn't the only issue that can kill you with Table Variables. The actual VALUES can cause issue too, since you can't have statistics on TVs. Thus the engine cannot know that your table variable with 6 rows in it is going to hit 80% of the 5M row table you are about to join to due to data skew...

    thats 100% true , both use memory and may shift to tempdb whenver have heavy data .but one thing i dont understand is, WHY TABLE variable doesnt have statistics ?

    i dont see any difference between temp table and table variable except their "creation syntax" ?

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

Viewing 15 posts - 1 through 15 (of 20 total)

You must be logged in to reply to this topic. Login to reply