
MythBusting – “Table variables have no statistics”


OK, as myths go, it’s a pretty weak one.  In fact, it is true; this whitepaper explicitly states it.  But hand in hand with that statement goes another one: “Table variables will always estimate to one row”.  This is most definitely false.  If there are no statistics, SQL Server can, at times, fall back on ‘guessing’ the distribution of data based upon row counts.  This behaviour can muddy the waters of the old “which is better, table variables or temp tables?” argument even further.

To demonstrate this, we first need to populate a numbers table:

create table numbers
(
Num integer primary key
)
go
insert into numbers
Select top(1000) ROW_NUMBER() over (order by (select null))
from sys.columns a cross join sys.columns b

Now we execute the following code:

Declare @TableVar Table
(
ID integer not null primary key,
Mod10 integer not null
)

insert into @TableVar(ID,Mod10)
Select top(20) num,num%10
from numbers
order by num

Select tv.Id,num
from @TableVar tv
join numbers
on tv.ID = num
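
As an aside, if you want to see the estimates for yourself while following along, one way (just a suggestion; the ‘Include Actual Execution Plan’ button in SSMS does the same job) is to wrap the batch in SET STATISTICS XML:

set statistics xml on
go
-- ... run the Declare / insert / Select batch above here ...
go
set statistics xml off
go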

Looking at the actual execution plan, we see:
[Execution plan screenshot: estimated rows = 1, actual rows = 20]
 
1 row estimated and 20 rows actual, as you might well expect.  Now add ‘OPTION(RECOMPILE)’ to the statement and the plan is different.
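
For clarity (the full statement isn’t repeated in the original), the select with the hint added looks something like this:

Select tv.Id,num
from @TableVar tv
join numbers
on tv.ID = num
option(recompile)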
 
[Execution plan screenshot: with OPTION(RECOMPILE), the estimate now matches the 20 actual rows]
 
Look at that, an accurate row estimate.  How about if we add a filter to the statement, say ‘Mod10 = 0’?
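
That is, roughly, the following, with the recompile hint still in place:

Select tv.Id,num
from @TableVar tv
join numbers
on tv.ID = num
where tv.Mod10 = 0
option(recompile)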
 

[Execution plan screenshot: with the Mod10 filter, a different, but still wrong, estimate]
 
Another different, but still wrong, estimate.  This is because table variables don’t have statistics, but we do have row counts.  It is worth pointing out that these are the same numbers you would get if you ran these operations against a normal ‘permanent’ table but had turned the automatic statistics options off.
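
If you want to reproduce that comparison yourself, a minimal sketch (assuming a throwaway database called Scratch here, which is my name for it, not the post’s) is to switch the automatic statistics options off and repeat the exercise against a permanent table:

-- Scratch is a hypothetical sandbox database; adjust the name as needed.
alter database Scratch set auto_create_statistics off
alter database Scratch set auto_update_statistics off
go
-- Now rebuild the 20-row table as a permanent table instead of @TableVar
-- and rerun the joins above to compare the estimates.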
 
Obviously, in a production environment you would only be using RECOMPILE in ‘special’ circumstances, right?  So this isn’t an issue; all your table variables will be estimating as one row.  Wrong.  I would be willing to bet that a surprisingly high number are estimating as something else.  If you are so inclined, you can probably find quite a few via the DMV sys.dm_exec_query_plan.  So, how does this happen?  Well, in a way it’s nothing to do with table variables per se, but if you are joining to another table, then if (and when) that table has its statistics updated, that will cause the statement to recompile and, surprise, surprise, you have a table variable with an estimate > 1.
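
As a rough starting point for that hunt (my own quick-and-dirty query, not part of the original post, and the LIKE filter will happily return false positives), you can trawl the plan cache like this:

-- Table variables appear in showplan XML as Table="@SomeName",
-- so a crude string search over the cached plans will surface them.
select st.text, qp.query_plan
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_query_plan(cp.plan_handle) qp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where cast(qp.query_plan as nvarchar(max)) like N'%Table="@%'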
 
OK… so let’s step through that.  Ignore the extra Select statement that counts from AdventureWorks; it’s just there to create a more ‘complicated’ stored procedure so that we get multiple statements cached in the plan.
 
drop table IDs
go
create table IDs
(
Id integer primary key,padding char(255)
)
go
insert into IDs(Id,padding)
Select top(1) num,'xxx'
from numbers
order by num
go
drop procedure TableVarTest
go
create procedure TableVarTest
as
declare @TableVar Table
(
ID integer not null,
Mod10 integer not null
)

insert into @TableVar(ID,Mod10)
Select top(20) num,num%10
from numbers
order by num

select COUNT(*)
from AdventureWorks2008r2.dbo.Customer C
join AdventureWorks2008r2.dbo.CustomerOrders CO
on C.CustomerId = CO.CustomerId

Select tv.Id,IDs.id
from @TableVar tv
join IDs
on tv.ID = IDs.Id
where mod10 =0
go
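
The procedure then gets executed (the call itself isn’t shown above; it is simply):

exec TableVarTest
go
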
On first execution, the join of the table variable to IDs produces…
 
[Execution plan screenshot: the table variable join to IDs on first execution, estimating one row]
 
Now, let’s add some more data to IDs and force a recompile just for good measure:
 
insert into IDs(Id,padding)
Select top(1000) num,'xxx'
from numbers
where not exists(select id from IDs where id = num )
order by num
go
exec sp_recompile ids

and then re-execute the stored procedure

 
[Execution plan screenshot: after the recompile, the table variable now estimates more than one row]
 
So, one myth busted and one proved; not bad for one blog post.
