Is a Temporary Table Really Necessary?

  • JohnG (5/30/2008)


    Well, somebody decided to put that theory to the test and found that it can improve performance. See http://msdn.microsoft.com/en-us/library/aa175774(SQL.80).aspx

    And before you discount the article, you may wish to take a serious look at the author's résumé.

    Yep, happy to eat my words then. One thing I've definitely learned today is that you do have the ability to create a clustered index on multiple columns on a table variable. I'm not sure if you can do it without the unique constraint, though.

    Basically, I just tested inserting a 1,024-row temp table into another temp table. When ordering by two columns (which I deliberately inserted such that the entire table would have to be reversed), the query came out at 55 seconds over 10,000 iterations. Using the unique clustered index as Tom has, it comes out at 35 seconds.

    Interestingly enough, doing the same but with one column vs. a clustered primary key took 28 seconds on the ordering version and 32 on the clustered primary key version.

    So it's still back to the same old story: test, test, and test again. To be honest, though, given that you should stick to temp tables for larger data sets, and that you're really not very likely to be inserting the same set of data into a table variable 10,000 times, I'll stick to the nuclear meltdown analogy, but I'll enlarge the size of the bucket 🙂
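    For anyone who wants to try it, here is a minimal sketch of the trick under discussion (table and column names are invented for illustration). You can't run CREATE INDEX against a table variable, but a multi-column UNIQUE or PRIMARY KEY constraint declared inline builds the index for you:

    -- the table-level constraint gives the optimizer a clustered
    -- index keyed on (RequiredDate, OrderID)
    DECLARE @orders TABLE
    (OrderID int NOT NULL,
     RequiredDate datetime NOT NULL,
     UNIQUE CLUSTERED (RequiredDate, OrderID))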

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Matt Whitfield (5/30/2008)


    2) Derived tables avoid using tempdb

    Again, not necessarily true. Derived tables quite often end up as a system-generated temporary table in tempdb; the query processor does this if it decides it needs to (although I've never looked into it enough to find out why).

    I was just going to mention this. Also, even in cases such as IN where you have a list of items such as 1, 2, 3, 4 ... 10000, I have seen the query engine create a temp table and replace the list with an inner join. I couldn't find the exact point at which it does this, but I noted on multiple occasions that adding one more item to my IN criteria resulted in a drastic decrease in the time for the results, and found this to be the reason.
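    To illustrate the kind of rewrite being described, here is a hand-rolled version of what the engine appears to do internally (table and column names invented):

    -- instead of: SELECT * FROM dbo.Orders WHERE OrderID IN (1, 2, ..., 10000)
    -- load the list into a temp table and join against it
    CREATE TABLE #id_list (OrderID int PRIMARY KEY)
    INSERT INTO #id_list (OrderID) VALUES (1)
    INSERT INTO #id_list (OrderID) VALUES (2)
    -- ... one insert per value, or a bulk load for long lists

    SELECT o.*
    FROM dbo.Orders o
    INNER JOIN #id_list l ON l.OrderID = o.OrderID

    DROP TABLE #id_list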

    But to a point mentioned earlier about orphaned temp tables: they should only occur if the connection that created a # (local temp table) is still open, so check to make sure your app is closing the connection. In the case of ## (global temp tables), the table remains until the connection that created it closes and every other connection referencing it has finished, or until it is explicitly dropped. Either way, though, I do this in my code:

    IF OBJECT_ID('tempdb..#x') IS NOT NULL
        DROP TABLE #x

    I will usually perform this at the start and the end of the process, to be sure that if I reuse the name later on with the same connection, I don't accidentally end up with an error or the wrong set of data.

    If multiple copies of #x exist in tempdb, only the one belonging to the current connection will actually cause a non-null value.
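    As a sketch of that start-and-end pattern in a batch (table and column names invented):

    -- defensive drop at the start, in case the name is left over from
    -- earlier work on this same connection
    IF OBJECT_ID('tempdb..#work') IS NOT NULL
        DROP TABLE #work

    CREATE TABLE #work (OrderID int PRIMARY KEY, Status varchar(20))

    -- ... processing against #work ...

    -- and again at the end, so the name is clean for any later reuse
    IF OBJECT_ID('tempdb..#work') IS NOT NULL
        DROP TABLE #work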

  • JohnG (5/30/2008)


    PK - can't see it making much difference at all.

    Again, a typical "off the cuff" remark without thinking. I'm not saying ALL the time. One thing that a PK gives you is duplicate key detection (constraint violation). Yes, even for a table variable! You use it when the logic dictates its use.

    DECLARE @v_temp_table TABLE (foobar int NOT NULL PRIMARY KEY CLUSTERED)

    INSERT INTO @v_temp_table VALUES (1);
    INSERT INTO @v_temp_table VALUES (1);

    Msg 2627, Level 14, State 1, Line 3
    Violation of PRIMARY KEY constraint 'PK__#02FEA01F__03F2C458'. Cannot insert duplicate key in object 'dbo.@v_temp_table'.

    Don't forget that setting a primary key is also the only way you have to put an index (clustered or not) on a table var. I'm not a huge fan of table vars because of their screwy handling by the optimizer (which can't seem to see the true cardinality of a table var, and then "guesses"), but I still use them on really small things....

    Edit: I see that's been mentioned already. Okay, so I'm a little slow this morning 🙂

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • I personally try to avoid temp tables

    I've seen people populate them with tens of thousands of rows, and in many cases most of the source table they are selecting from. If I can, I try to put logic in the query to avoid the temp table. In one case we have a database that is really supposed to be in LDAP, but we use it in SQL for business rules for an app. Users access it through a stored procedure, and in some cases it returns over 100,000 rows and takes too long. I wrote a CTE to cut down on the time to run it. In other cases I like to use table variables to make sure the temp table is gone after it's used.
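    For anyone who hasn't used one, a minimal sketch of the CTE pattern being described (names invented). A non-recursive CTE is essentially an inline view, so there is no table to create or drop (though the optimizer may still spool to tempdb in some plans):

    WITH active_entries AS
    (
        SELECT EntryID, DisplayName
        FROM dbo.DirectoryEntries
        WHERE IsActive = 1
    )
    SELECT a.EntryID, a.DisplayName, r.RuleName
    FROM active_entries a
    INNER JOIN dbo.BusinessRules r ON r.EntryID = a.EntryID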

  • Matt Miller (5/30/2008)


    Don't forget that setting a primary key is also the only way you have to put an index (clustered or not) on a table var. I'm not a huge fan of table vars because of their screwy handling by the optimizer (which can't seem to see the true cardinality of a table var, and then "guesses"), but I still use them on really small things....

    The article that JohnG posted shows that you actually can... have a read, it's quite interesting...

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • You know, looking at it directly you cannot see a table variable because of its scope, but if you look at it this way:

    declare @t table
    (OrderID int,
     RequiredDate datetime not null,
     ShippedDate datetime null)

    -- alternate declaration, to show that constraints land in tempdb too:
    -- declare @t table
    -- (OrderID int primary key,
    --  RequiredDate datetime not null,
    --  ShippedDate datetime null,
    --  unique (RequiredDate, OrderID))

    insert into @t select OrderID, RequiredDate, ShippedDate from Orders

    -- while this batch is running, the table variable's object is visible here
    select * from tempdb..sysobjects where xtype != 's'
    go

    -- run again in a new batch to compare, after the variable goes out of scope
    select * from tempdb..sysobjects where xtype != 's'
    go

    you can see quite clearly that the table variable is still created in tempdb regardless. And if you create a primary key or unique constraint, they too are defined in tempdb. So whether you use a table variable or a temp table, you are still affected by events in tempdb. One would have to research the under-the-hood aspects of table variable management to see what it does, compared to temp tables, that may offer better performance, and where the break-even point between the two lies.

    My big concern with table variables would be large sets of data, given how the cache handles reuse of information. If a table variable is managed more in memory than in its tempdb table, you run the risk of pushing cached information out and having to reread objects that would otherwise stay in memory through frequent use. Just a curiosity I think someone should explore. It is great that table variables can potentially save time and effort, especially when testing in an environment with minimal load, but what impact, if any, do they have on environments under heavy load? That is what intrigues me most.

  • Matt Whitfield (5/30/2008)


    Matt Miller (5/30/2008)


    Don't forget that setting a primary key is also the only way you have to put an index (clustered or not) on a table var. I'm not a huge fan of table vars because of their screwy handling by the optimizer (which can't seem to see the true cardinality of a table var, and then "guesses"), but I still use them on really small things....

    The article that JohnG posted shows that you actually can... have a read, it's quite interesting...

    Yup, I see that. It goes along with what I was saying about back-dooring an index with the primary key, but I hadn't thought of using the unique constraint to back-door in another index... nice!

    It still doesn't fix the cardinality/statistics issue, which may still lead to bad exec plans....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • OK, I decided to take a look at tempdb to see what happens in the log

    (DBCC LOG('tempdb', 3))

    and found that, regardless of table variable or temp table, both are logged to the tempdb log. In fact, I found this to be the case even when I inserted only one row into the table variable. So tempdb logging will be a factor in the speed of the table variable, it seems. Just an FYI.
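    A rough sketch of that experiment, for anyone who wants to repeat it (DBCC LOG is undocumented, so its parameters and output format are not guaranteed; the table shape is invented):

    -- insert a single row into a table variable...
    declare @t table (id int)
    insert into @t values (1)

    -- ...then inspect the tempdb log; the insert shows up there
    -- (the second argument asks DBCC LOG for more detail)
    DBCC LOG('tempdb', 3)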

  • SQL Noob (5/30/2008)


    I personally try to avoid temp tables

    I've seen people populate them with tens of thousands of rows, and in many cases most of the source table they are selecting from. If I can, I try to put logic in the query to avoid the temp table. In one case we have a database that is really supposed to be in LDAP, but we use it in SQL for business rules for an app. Users access it through a stored procedure, and in some cases it returns over 100,000 rows and takes too long. I wrote a CTE to cut down on the time to run it. In other cases I like to use table variables to make sure the temp table is gone after it's used.

    CTEs are great if you have SQL 2005 or later. And table variables are great if you don't have to pass the output to another SP 🙂 However, a table variable CAN be used in a UDF to pass output to another SP. Thus, essentially, a table variable CAN be used as input to an SP.

    But correct me if I'm wrong, doesn't a table variable get stored in RAM rather than in tempdb? You will want to consult your DBA before writing a lot of these. You could just be transferring workload from one area to another without the hardware to support it.

    Again, this gets back to all things in their proper place and time. Temp tables have their uses, for sure.
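    For reference, a minimal sketch of the UDF approach mentioned above (function and table names invented). In a multi-statement table-valued function, the return value is itself a table variable, and any query or procedure can then select from the function:

    CREATE FUNCTION dbo.fn_recent_orders (@days int)
    RETURNS @result TABLE (OrderID int, OrderDate datetime)
    AS
    BEGIN
        -- fill the table variable that forms the function's output
        INSERT INTO @result (OrderID, OrderDate)
        SELECT OrderID, OrderDate
        FROM dbo.Orders
        WHERE OrderDate >= DATEADD(day, -@days, GETDATE())
        RETURN
    END
    GO

    -- a stored procedure (or any query) can consume the result directly
    SELECT * FROM dbo.fn_recent_orders(30)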

  • Another way I've found to further increase performance when joining an indexed temp table is to run UPDATE STATISTICS on the temp table. I wouldn't think this would be necessary but it helps when the temp table being joined has hundreds of thousands or millions of rows.

    One downside to this approach was that errors were thrown (on SQL 2000) if the UPDATE STATISTICS statement was in a procedure nested several levels deep. It worked fine when running the stored procedure directly.

    I've also found that temp tables are not always automatically dropped after the connection ends or the stored procedure goes out of scope. See http://www.sqlservercentral.com/Forums/Topic488347-149-2.aspx
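    For reference, the pattern looks roughly like this (a sketch with invented names; UPDATE STATISTICS with no options resamples the table so the optimizer sees realistic row counts):

    SELECT OrderID, CustomerID
    INTO #big_temp
    FROM dbo.Orders

    CREATE CLUSTERED INDEX ix_big_temp ON #big_temp (OrderID)

    -- refresh statistics before the temp table participates in a large join
    UPDATE STATISTICS #big_temp

    SELECT t.OrderID, c.CompanyName
    FROM #big_temp t
    INNER JOIN dbo.Customers c ON c.CustomerID = t.CustomerID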

    Erik

  • You are absolutely right! I found myself in a situation a few weeks ago where a stored procedure took over 20 minutes to run. I modified the stored procedure to use temporary tables for the where clauses, and the performance improved enormously!

    Thanks for sharing!

    Daniel Hernández.

  • One thing I love temp tables for is when I need to do extensive processing on a large collection of rows. I want transactional consistency on the original data, but I don't want to hold locks and block writers while my whole query executes. If I just select the data into a temp table first, I know I have a consistent set of data to start with, and I can process away without fear of blocking writers to the original table.
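    A sketch of that pattern (table names invented): take the snapshot in one short statement, then run the heavy work against the private copy:

    -- copy the rows of interest in one quick read; locks on dbo.Orders
    -- are held only for the duration of this statement
    SELECT OrderID, Status, Amount
    INTO #snapshot
    FROM dbo.Orders
    WHERE OrderDate >= '20080101'

    -- heavy processing touches only the copy, so writers to
    -- dbo.Orders are never blocked by it
    UPDATE #snapshot SET Status = 'REVIEWED' WHERE Amount > 1000

    -- ... further processing against #snapshot ...

    DROP TABLE #snapshot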

  • Experimenting a little more, I found this.

    Looking at syscacheobjects in master, I discovered that for whichever item is run first (temp table, table variable, or table variable with primary key), an ad hoc execution plan is stored for the code. 99% of the time, when I then ran it the other way, it would not create a new execution plan, nor would it reuse the stored one. However, if the code matched, it would reuse the stored plan. If this is accurate, then you have to make sure you run DBCC FREEPROCCACHE to clear the cache before measuring the other variant's performance correctly.
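    For anyone repeating the test, the inspection and reset steps look roughly like this (syscacheobjects is the SQL 2000-era view; on 2005 and later the sys.dm_exec_cached_plans DMV supersedes it):

    -- see which plans are cached and how often each has been reused
    SELECT cacheobjtype, objtype, usecounts, sql
    FROM master..syscacheobjects
    ORDER BY usecounts DESC

    -- flush the plan cache between test runs (never on a production box!)
    DBCC FREEPROCCACHE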

    I also found that in every scenario a "Prepared" execution plan is created and stored for

    (@1 varchar(30))SELECT * FROM [tempdb]..[sysobjects] WHERE [xtype]<>@1

    and all scenarios will reuse this plan, firing it twice during the run of the code I was testing with (see previous post).

    Will post any other findings I come up with as I continue testing.

  • Sorry, you also have to run DBCC DROPCLEANBUFFERS for the execution plan to actually get regenerated.

  • I wonder if "Common Table Expressions" might also be an option here.

    You could create two CTEs that match your temp table creation and then join them together as you indicated.

    I have not tried it to see what the performance hit would be, but it would be a good thing to try out.

    Regards, all

    Matteo
