Ok, as myths go, its a pretty weak one. In fact, it is true, this whitepaper explicitly states that. But hand in hand with that statement goes another one, “Table variables will always estimate to one row”. This is most definitely false, if there are no statistics then sql server can, at times, default to its ‘guessing’ of distribution of data based upon row counts. This behaviour can even further muddy the water of the old “Which is better, table variables or temp tables” argument.
To demonstrate this, firstly we need to populate a numbers table
create table numbers
(
Num integer primary key
)
go
insert into numbers
Select top(1000) ROW_NUMBER() over (order by (select null))
from sys.columns a cross join sys.columns b
Now we execute the following code
Declare @TableVar Table
(
ID integer not null primary key,
Mod10 integer not null
)
insert into @TableVar(ID,Mod10)
Select top(20) num,num%10
from numbers
order by num
Select tv.Id,num
from @TableVar tv
join numbers
on tv.ID = num
and looking at the execution plan, we see :
drop table IDs
go
create table IDs
(
Id integer primary key,padding char(255)
)
go
insert into IDs(Id,padding)
Select top(1) num,'xxx'
from numbers
order by num
go
drop procedure TableVarTest
go
create procedure TableVarTest
as
declare @TableVar Table
(
ID integer not null,
Mod10 integer not null
)
insert into @TableVar(ID,Mod10)
Select top(20) num,num%10
from numbers
order by num
select COUNT(*)
from AdventureWorks2008r2.dbo.Customer C
join AdventureWorks2008r2.dbo.CustomerOrders CO
on C.CustomerId = CO.CustomerId
Select tv.Id,IDs.id
from @TableVar tv
join IDs
on tv.ID = IDs.Id
where mod10 =0
go
insert into IDs(Id,padding)
Select top(1000) num,'xxx'
from numbers
where not exists(select id from IDs where id = num )
order by num
go
exec sp_recompile ids



Subscribe to this blog
Briefcase
Print
Posted by Jason Brimhall on 2 December 2011
Good info Dave.
Posted by Brent Ozar on 3 December 2011
"Well, in a way its nothing to do with table variables per se , but if you are joining to another table, then if (and when) that table has its stats updated then that will cause the statement to recompile and , surprise , surprise , you have a table variable with an estimate > 1."
If I understand you right, you're saying that table variables will have statistics when the plan is forced to recompile - either by hinting or by having tables with stats that need to be updated. However, once the stats update fires as this query runs, you won't get updated stats on the table variable each time that query runs afterwards - the update stats fired once, and that's done, and you're back to having the table variable estimate at 1 row, right?
Posted by Dave Ballantyne on 4 December 2011
Hi Brent,
No not quite, table variables dont have statistics, right ?:)
But, if a recompile happens to a statement (either forced or due to outdated stats etc on another table in the query), then the optimizer is able to 'see' the rowcount on sys.partition in tempdb for the table variable at that point. This does lead to the interesting question of why, that being the case, DOES the optimizer estimate at one row intially ? Another time for that one :) With this rowcount it will then 'guess' at the estimated rowcount if a predicate is used.
"once the stats update fires as this query runs". The advantage of a table var over a temp table is that as they have no stats there is no stats update to fire and therefore the plans are kept no matter how many updates are done to the table var. So, the subsequent calls of the proc will still find the plan in the cache, ie with the estimation of > 1 row until , there is another stats update on the 'real' table , or DDL etc..
Posted by Brent Ozar on 4 December 2011
Right, table variables *don't* have stats. In your example, say that the first query against the table variable gets accurate stats because one of the other tables happens to need updated stats. Now, run a subsequent query to double the number of rows in the table variable, and auto update stats won't fire against the table variable.
The only way (at least that I've ever seen, including your example) to get stats on a table variable is to force the query to be recompiled for some reason. Maybe it's by hinting, maybe it's by a freak occurrence with the other table needing a stats update, but that's outside of the table variable itself.
"The advantage of a table var over a temp table is that as they have no stats there is no stats update to fire and therefore the plans are kept no matter how many updates are done to the table var." - How on earth is that an advantage, getting inaccurate query plans every time?
Posted by Dave Ballantyne on 4 December 2011
"Now, run a subsequent query to double the number of rows" , yup , that will be right , the estimate of rows will not have changed over those two queries, but thats true of any cached plan , table var or no table var.
"maybe it's by a freak occurrence" , quite possibly freak :) but i have seen plans on production systems with an estimate > 1. If i were better with xquery i could probably find a few more , but other than satisfying curiosity not sure what it would prove :)
"How on earth is that an advantage.." , you have to remember why we have table vars in the first place : To avoid recompile issues when a temporary table has data changes which cause stats updates, which then invalidate plans. IMO , mostly these issue are avoided by using setbased rather than a procedural techniques, but there's always the other guy , right ;). As an aside, this 1 row estimate also explains the standard advice of only using table vars for a small amount of rows, because any joins involved with them will generate the plan based upon that 1 row estimate. So, the downside is poor estimation , but the upside is we dont have to recompile continually.