Comparing Table Variables with Temporary Tables

  • Great article: it's very thorough, with meticulously clear descriptions and examples. I wish we could have more articles like this.

    It's also provoked some fine discussion.

    edit: originally pasted the wrong text into here!

    Tom

  • kajalchatterjee (4/9/2010)


    Comparing Table Variables with Temporary Tables

    I ran the following piece of code in SQL Server 2005, and I cannot see any entry for the table variable in the tempdb database; I can only see the temp table's entry.

    So it seems the claim that table variables are stored in memory is not a myth, but actually true. Please let me know if I am missing anything.

    You (or your code) appear to be missing a couple of spaces.

    I looked around in tempdb a bit in the dim and distant past to confirm for myself that table variables did show up there (using SQLS 2000), but I hadn't actually checked it for SQLS 2008, so I ran your code (with the necessary spaces added). The results I got were:

    name

    --------------------------------------------------------------------------------------------------------------------------------

    #tempTables_________________________________________________________________________________________________________000000000011

    (1 row(s) affected)

    name

    --------------------------------------------------------------------------------------------------------------------------------

    #2E1BDC42

    (1 row(s) affected)

    No surprises. There's very clearly an extra temp table (called #2E1BDC42 on that particular run) in tempdb (and this is the table variable).

    If it doesn't work on your system, there's something really weird going on. I can't see any reason why it shouldn't. I even tried wrapping it in a transaction and trying all the different isolation levels in case that could have any effect (although I was very sure it couldn't), and it made no difference.
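
    For anyone who wants to reproduce the check: the original code never got quoted properly in this thread, but a minimal sketch along these lines (the object names here are assumptions, not the original poster's) shows the same behaviour:

    -- Minimal sketch with assumed names: list the temp table's entry in
    -- tempdb, then the table variable's hidden #-prefixed entry.
    -- (Run on a quiet instance so other sessions' objects don't show up.)
    CREATE TABLE #tempTables (i int);
    SELECT name FROM tempdb.sys.tables WHERE name LIKE '#tempTables%';

    DECLARE @tableVariable TABLE (i int);
    SELECT name FROM tempdb.sys.tables
    WHERE name LIKE '#%' AND name NOT LIKE '#tempTables%';

    DROP TABLE #tempTables;

    The second SELECT is where the extra entry (like #2E1BDC42 above) appears.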

    Tom

  • Tom.Thomson (4/14/2010)


    RBarryYoung (6/15/2009)


    GilaMonster (6/15/2009)...No one I've spoken with knows exactly where the estimate is coming from. It's not the 30% of the table that's used with inequalities when an estimate can't be made; that would be 30,000 rows. Someone did point out that the estimates (which change as the total number of rows in the table variable changes) are approximately (total rows)^0.75, though I haven't done the calculations myself to verify that.

    In this case the exponent is closer to 0.886.

    No, 100000^0.886 is 26915, and 100000^0.75 is 5623. Since the total rows were 100000 and the estimate was 5623, the exponent in this case is much closer to 0.75 than to 0.886.
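
    (A quick way to check: the exponent implied by an estimate is just log(estimate)/log(total rows), which a one-liner can confirm:)

    -- The exponent implied by the optimizer's 5623-row estimate
    -- against 100000 total rows. Returns roughly 0.75.
    SELECT LOG(5623.0) / LOG(100000.0);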

    Arrrggh! Yep, you are correct Tom (and Gail), I used the actual result rowcount (17134) as the base instead of the total rowcount for the source table (100000).

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • Tom.Thomson (4/14/2010)


    Great article: it's very thorough, with meticulously clear descriptions and examples. I wish we could have more articles like this.

    Thanks Tom! Nothing like a little pressure for any future articles... I sure hope I can live up to it!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Toby White (4/12/2010)


    Excellent article...

    Thanks Toby.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • While you may not be able to use SELECT <columnlist> INTO <table> with table variables, it's possible to use:

    insert @table
    select col1, col2...
    from someTable
    where someColumn = someValue;

    to circumvent the issue.
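
    A hedged illustration of the difference (table and column names are assumed); the commented-out line is the one that fails to parse:

    -- Sketch with assumed names: SELECT ... INTO cannot target a table
    -- variable, but INSERT ... SELECT into one works fine.
    declare @t table (col1 int, col2 int);
    -- select col1, col2 into @t from someTable;  -- syntax error
    insert @t (col1, col2)
    select col1, col2 from someTable where someColumn = someValue;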

  • Great article. There is one thing I see, though, that is incorrect: the statement "This means that the execution plan for table variables will always treat the table as if it has 1 row." is not true. I realize that SQL Server does not keep stats on indexes on table variables, but I have a counter-example to this claim.

    If you create a table and load it with, let's say, a million rows, then create a table variable (it doesn't matter whether it has a primary key or unique constraint; the result is the same) and load the million rows into the table variable. Let's say the permanent table is TableA and the table variable is @TableA. Then run the following query:

    SELECT COUNT(*) FROM (
    SELECT tv.* FROM @TableA tv INNER JOIN TableA ON TableA.SomeCol = tv.SomeCol) allrows

    You will see in the actual execution plan that SQL Server knows the exact number of rows in the table variable. I found this to be true on SQL 2005 and 2008, regardless of whether auto_create_statistics and auto_update_statistics are on or off. In some magical way SQL Server knows the number of rows, at least in this case, for the table variable. It doesn't have stats, I'm assuming, but it does know the # of rows. The estimated execution plan estimates 1 row for the table variable, but the actual plan has the exact number of rows.

  • Michael H Lutz (4/17/2010)


    Great article. There is one thing I see, though, that is incorrect: the statement "This means that the execution plan for table variables will always treat the table as if it has 1 row." is not true. I realize that SQL Server does not keep stats on indexes on table variables, but I have a counter-example to this claim.

    If you create a table and load it with, let's say, a million rows, then create a table variable (it doesn't matter whether it has a primary key or unique constraint; the result is the same) and load the million rows into the table variable. Let's say the permanent table is TableA and the table variable is @TableA. Then run the following query:

    SELECT COUNT(*) FROM (
    SELECT tv.* FROM @TableA tv INNER JOIN TableA ON TableA.SomeCol = tv.SomeCol) allrows

    You will see in the actual execution plan that SQL Server knows the exact number of rows in the table variable. I found this to be true on SQL 2005 and 2008, regardless of whether auto_create_statistics and auto_update_statistics are on or off. In some magical way SQL Server knows the number of rows, at least in this case, for the table variable. It doesn't have stats, I'm assuming, but it does know the # of rows. The estimated execution plan estimates 1 row for the table variable, but the actual plan has the exact number of rows.

    Hi Michael,

    No, that just means that when running the query, it is able to get the proper count of records. The optimizer still treats it as only having one record.

    Let me demonstrate that. This code builds a one-million row tally table into a table variable:

    declare @test table (N int PRIMARY KEY CLUSTERED);

    ;WITH
    TENS (N) AS (SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL
                 SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0 UNION ALL SELECT 0),
    THOUSANDS (N) AS (SELECT 1 FROM TENS t1 CROSS JOIN TENS t2 CROSS JOIN TENS t3),
    MILLIONS (N) AS (SELECT 1 FROM THOUSANDS t1 CROSS JOIN THOUSANDS t2),
    TALLY (N) AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) FROM MILLIONS)
    INSERT INTO @test (N)
    SELECT N FROM TALLY ORDER BY N;

    select count(*) from @test;

    This produces the following execution plan (screenshot not reproduced here):

    I've circled the two parts of interest... see how the "Estimated Number of Rows" is one? That's the optimizer treating the table variable as having one row. Since the optimizer thinks that the table variable only has one row, the plan might not be the best one for the query.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Michael H Lutz (4/17/2010)


    ..There is one thing I see though that is incorrect, the statement " This means that the execution plan for table variables will always treat the table as if it has 1 row. " is not true.

    I agree, and mentioned this earlier - though in different circumstances. Re-using Wayne's code:

    select count(*) from @test option (recompile);

    Produces the following actual execution plan (note the estimated row count and parallel plan; screenshot not reproduced here):
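
    In short, OPTION (RECOMPILE) compiles the statement after the table variable has been populated, so the optimizer can use its actual row count (though it still has no column statistics). A hedged side-by-side, reusing the @test variable from Wayne's example:

    -- Reusing @test (assumed to hold 1,000,000 rows at this point).
    select count(*) from @test;                     -- table scan estimate: 1 row
    select count(*) from @test option (recompile);  -- table scan estimate: 1,000,000 rows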

  • Quote

    "When you start joining the table variable to other tables, then the cardinality error frequently results in a bad execution plan, which gets worse the more complex the query is, and the larger the row count is."

    My comment on performance with table variables and temporary tables:

    SQL Server's optimizer probably treats a table variable as a tiny table, and therefore chooses a nested loop join in the execution plan. This is the primary factor that makes a table variable solution at least dozens of times slower than a temporary table solution under certain conditions, assuming that both solutions have the same data schema and indexes. In other words, if we force the optimizer to choose the appropriate join, we get similar performance results.

    The trick is the performance difference between loop, hash, and merge joins.
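
    As a hedged sketch of forcing the join type (TableA and SomeCol are assumed names, and SomeCol is assumed unique):

    -- Force a hash join so the optimizer's one-row guess for the table
    -- variable doesn't lock the plan into nested loops.
    declare @t table (SomeCol int PRIMARY KEY);
    insert @t (SomeCol) select SomeCol from TableA;

    select count(*)
    from @t tv
    inner hash join TableA a on a.SomeCol = tv.SomeCol;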

    Feng Zheng

    zhengfengchina@yahoo.com

  • Hi Wayne!

    Yes, I see that now. My apologies for the bad information; I was thinking about it afterward too, and yes, I see the difference. I was looking at actual rows instead of the optimizer's estimate of the number of rows. Thanks for clarifying this and finding a counter-example!

    Best Regards,

    Mike

  • Yup, makes sense. This is so interesting... you really need to be aware of and careful with this.

  • Excellent article. Very comprehensive and informative.

    It will be very useful in the future to have comprehensive comparison data side by side in one place.

  • I had been using temp tables in a large import process and made the change to table variables, and saw a big improvement in performance. However, the server was later moved to a virtual server (VMware), and the processes ran much slower (40x slower). I reverted to temp tables and the process improved again (though not to the original speed achieved on the physical server with table variables).

    I think we need more discussion of how different methods interact with disk and memory, and how they compare across physical and virtual environments. In my case, what was good for physical was bad for virtual, and vice versa. Or perhaps there are things the VM administrators need to do to overcome this problem.
