MythBusting–“Table variables have no statistics”

OK, as myths go, it's a pretty weak one. In fact, it is true: this whitepaper explicitly states that. But hand in hand with that statement goes another one, "Table variables will always estimate to one row". That is most definitely false. If there are no statistics, SQL Server can, at times, fall back to 'guessing' the distribution of data based upon row counts. This behaviour can further muddy the water of the old "which is better, table variables or temp tables?" argument.

To demonstrate this, first we need to populate a numbers table:

create table numbers
(
Num integer primary key
)
go
-- Fill the table with the numbers 1 to 1,000
insert into numbers(Num)
Select top(1000) ROW_NUMBER() over (order by (select null))
from sys.columns a cross join sys.columns b

Now we execute the following code

Declare @TableVar Table
(
ID integer not null primary key,
Mod10 integer not null
)

-- Load the first 20 numbers along with each value modulo 10
insert into @TableVar(ID,Mod10)
Select top(20) num,num%10
from numbers
order by num

-- Join the table variable back to the numbers table
Select tv.ID,num
from @TableVar tv
join numbers
on tv.ID = num


and looking at the execution plan, we see:

[Execution plan image: estimated rows = 1, actual rows = 20]
 
One row estimated and 20 rows actual, as you may well expect. Now add 'OPTION(RECOMPILE)' and the plan is different.
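The whole batch has to be re-run (the table variable only lives for the duration of the batch); only the final select gains the hint:

Select tv.ID,num
from @TableVar tv
join numbers
on tv.ID = num
option(recompile)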
 
[Execution plan image: estimated rows = 20, actual rows = 20]
 
Look at that: an accurate row estimation. How about if we add a filter to the statement, say 'Mod10 = 0'?
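Again the whole batch is re-executed, this time with the predicate added to the hinted select:

Select tv.ID,num
from @TableVar tv
join numbers
on tv.ID = num
where tv.Mod10 = 0
option(recompile)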
 

[Execution plan image: a different, but still wrong, estimate]
 
Another different, but wrong, estimation. This is because table variables don't have statistics, but we do have row counts. It is worth pointing out at this point that these are the same numbers you will get if you performed these operations on a normal 'permanent' table but had turned off auto statistics.
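A rough sketch of that comparison, assuming a scratch database (the name SandBox is illustrative, and you wouldn't disable auto statistics anywhere that matters):

-- Illustrative only: with automatic statistics disabled, a permanent
-- table gives the optimizer a row count but no column statistics on Mod10
alter database SandBox set AUTO_CREATE_STATISTICS OFF
alter database SandBox set AUTO_UPDATE_STATISTICS OFF
go
create table PermTable
(
ID integer not null primary key,
Mod10 integer not null
)
go
insert into PermTable(ID,Mod10)
Select top(20) num,num%10
from numbers
order by num

Select pt.ID,num
from PermTable pt
join numbers
on pt.ID = num
where pt.Mod10 = 0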
 
Obviously, in a production environment you would only be using RECOMPILE in 'special' circumstances, right? So this isn't an issue: all your table variables will be estimating as one row. Wrong. I would be willing to bet that a surprisingly high number are estimating as something else. If you are so inclined, you can probably find quite a few via the DMV sys.dm_exec_query_plan. So, how does this happen? Well, in a way it's nothing to do with table variables per se, but if you are joining to another table, then if (and when) that table has its stats updated, that will cause the statement to recompile and, surprise surprise, you have a table variable with an estimate > 1.
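If you fancy hunting for them, something along these lines will search the plan cache. This is a sketch only, and the XQuery path may need tweaking for your plans:

-- Look for plan operators that scan a table variable (the Table
-- attribute of Object contains an '@') yet estimate more than one row
with XMLNAMESPACES (default 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
select cp.plan_handle, qp.query_plan
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
where qp.query_plan.exist('//RelOp[number(@EstimateRows) > 1]/*/Object[contains(@Table, "@")]') = 1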
 
OK… so let's step through that. Ignore the extra select statement that counts from AdventureWorks; it's just there to create a more 'complicated' stored procedure, so that we get multiple statements cached in the plan.
 
drop table IDs
go
create table IDs
(
Id integer primary key,padding char(255)
)
go
-- Seed IDs with just a single row for the first compilation
insert into IDs(Id,padding)
Select top(1) num,'xxx'
from numbers
order by num
go
drop procedure TableVarTest
go
create procedure TableVarTest
as
declare @TableVar Table
(
ID integer not null,
Mod10 integer not null
)

insert into @TableVar(ID,Mod10)
Select top(20) num,num%10
from numbers
order by num

select COUNT(*)
from AdventureWorks2008r2.dbo.Customer C
join AdventureWorks2008r2.dbo.CustomerOrders CO
on C.CustomerId = CO.CustomerId

Select tv.Id,IDs.id
from @TableVar tv
join IDs
on tv.ID = IDs.Id
where mod10 =0
go

On first execution the join of the table variable to IDs produces…
 
[Execution plan image: first execution of the join to IDs]
 
Now, let's add some more data to IDs and force a recompile just for good measure:
 
insert into IDs(Id,padding)
Select top(1000) num,'xxx'
from numbers
where not exists(select id from IDs where id = num )
order by num
go
exec sp_recompile ids

and then re-execute the stored procedure
 
[Execution plan image: the table variable now estimates more than one row]
 
So, one myth busted and one proved; not bad for one blog post.

Comments

Posted by Jason Brimhall on 2 December 2011

Good info Dave.

Posted by Brent Ozar on 3 December 2011

"Well,  in a way its nothing to do with table variables per se , but if you are joining to another table, then if (and when) that table has its stats updated then that will cause the statement to recompile and , surprise , surprise , you have a table variable with an estimate > 1."

If I understand you right, you're saying that table variables will have statistics when the plan is forced to recompile - either by hinting or by having tables with stats that need to be updated.  However, once the stats update fires as this query runs, you won't get updated stats on the table variable each time that query runs afterwards - the update stats fired once, and that's done, and you're back to having the table variable estimate at 1 row, right?

Posted by Dave Ballantyne on 4 December 2011

Hi Brent,

No, not quite. Table variables don't have statistics, right? :)

But, if a recompile happens to a statement (either forced, or due to outdated stats etc. on another table in the query), then the optimizer is able to 'see' the row count in sys.partitions in tempdb for the table variable at that point. This does lead to the interesting question of why, that being the case, the optimizer DOES estimate one row initially. Another time for that one :) With this row count it will then 'guess' at the estimated row count if a predicate is used.
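For example (a quick illustration only; run it as a single batch while the variable is in scope):

declare @TableVar table (ID integer not null)
insert into @TableVar(ID)
Select num from numbers where num <= 20

-- Table variables are backed by objects in tempdb with system-generated
-- names starting with '#'; this is the row count the optimizer can 'see'
select o.name, p.rows
from tempdb.sys.partitions p
join tempdb.sys.objects o
on o.object_id = p.object_id
where o.name like '#%'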

"once the stats update fires as this query runs".  The advantage of a table var over a temp table is that as they have no stats there is no stats update to fire and therefore the plans are kept no matter how many updates are done to the table var.  So, the subsequent calls of the proc will still find the plan in the cache, ie with the estimation of > 1 row until , there is another stats update on the 'real' table , or DDL etc..

Posted by Brent Ozar on 4 December 2011

Right, table variables *don't* have stats.  In your example, say that the first query against the table variable gets accurate stats because one of the other tables happens to need updated stats.  Now, run a subsequent query to double the number of rows in the table variable, and auto update stats won't fire against the table variable.

The only way (at least that I've ever seen, including your example) to get stats on a table variable is to force the query to be recompiled for some reason.  Maybe it's by hinting, maybe it's by a freak occurrence with the other table needing a stats update, but that's outside of the table variable itself.

"The advantage of a table var over a temp table is that as they have no stats there is no stats update to fire and therefore the plans are kept no matter how many updates are done to the table var."  - How on earth is that an advantage, getting inaccurate query plans every time?

Posted by Dave Ballantyne on 4 December 2011

"Now, run a subsequent query to double the number of rows" , yup , that will be right , the estimate of rows will not have changed over those two queries, but thats true of any cached plan , table var or no table var.

"maybe it's by a freak occurrence" , quite possibly freak :)  but i have seen plans on production systems with an estimate > 1.  If i were better with xquery i could probably find a few more , but other than satisfying curiosity not sure what it would prove :)

"How on earth is that an advantage.." , you have to remember why we have table vars in the first place :  To avoid recompile issues when a temporary table has data changes which cause stats updates,  which then invalidate plans.  IMO , mostly these issue are avoided by using setbased rather than a procedural techniques, but there's always the other guy , right ;). As an aside, this 1 row estimate also explains the standard advice of only using table vars for a small amount of rows, because any joins involved with them will generate the plan based upon that 1 row estimate.  So, the downside is poor estimation , but the upside is we dont have to recompile continually.  
