Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

xml in subselect and performance => alternatives? Expand / Collapse
Author
Message
Posted Monday, December 3, 2012 6:12 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 13,872, Visits: 9,596
Against small datasets, Nested Loop Joins are often the most efficient. SQL Server pretty much assumes table variables will have 1 row in them. Since it doesn't maintain stats on the contents of table variables, it's pretty much stuck with that. So, table variables often cause a Nested Loops Join when they interact with other tables. That's usually okay if they only have a few rows in them, but can seriously tank performance if they have lots of rows.

That's why I asked about converting to a temp table. Those maintain stats, which gives the query optimizer a better chance of picking a more efficient join method (Hash or Merge) on larger datasets.

Table variables are a notorious performance-killer. They can be very useful if you need to persist data changes past a rollback (since they aren't subject to most rollback operations), or if you know you're going to be dealing only with very small datasets, or if you need to use one as an input parameter (SQL 2008 and beyond). If you go past those limited uses, you're usually much better off using a temp table.

Since you can't use one, you may not be able to do much about tuning this query. You may be stuck with leaving it as-is.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1391897
Posted Monday, December 3, 2012 7:03 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, October 3, 2014 1:55 PM
Points: 199, Visits: 735
The reason for the loop join is not the table variable. It is the subselect for the XML-output. It doesn't matter which kind of tables you take, when using the XML-output like shown in the first example there will always be a loop join like with nearly any other subselect I know.
The table variable in my case is only part of the where-condition of the outer select.
Post #1391923
Posted Monday, December 3, 2012 7:25 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, October 3, 2014 1:55 PM
Points: 199, Visits: 735
Here is the same example using large, permanent tables.
create table dbo.Parcel (
ParcelId int,
Shipper varchar(50),
Consignee varchar(50)
);
create clustered index cix_dboParcel_ParcelId on dbo.Parcel ( ParcelId asc );

create table dbo.ParcelDetail(
ParcelId int,
ScannedAtStation varchar(50),
ScannedAtTime datetime
);
create clustered index cix_dboParcelDetail_ParcelId on dbo.ParcelDetail ( ParcelId asc );

declare @counter int = 0;
set nocount on;
while @counter < 50000
begin
insert into dbo.Parcel values
(1*@counter, 'Romeo', 'Julia' ),
(2*@counter, 'Donald', 'Daisy' ),
(3*@counter, 'someone', 'nobody' )

insert into dbo.ParcelDetail values
( 1*@counter, 'Austria', '20120101' ),
( 1*@counter, 'Germany', '20120103' ),
( 1*@counter, 'UK', '20120110' ),
( 2*@counter, 'Spain', '20120101' ),
( 2*@counter, 'Portugal', '20120201' ),
( 3*@counter, 'Italy', '20120301' ),
( 3*@counter, 'Switzerland', '20120310' )

set @counter += 1;
end



select
p.ParcelId,
p.Shipper,
p.Consignee,
TransportRoute =
(
select *
from dbo.ParcelDetail detail
where detail.ParcelId = p.ParcelId
for xml auto
)
from dbo.Parcel p
where p.ParcelId between 10 and 20

select
p.ParcelId,
p.Shipper,
p.Consignee,
TransportRoute =
(
select *
from dbo.ParcelDetail detail
where detail.ParcelId = p.ParcelId
for xml auto
)
from dbo.Parcel p
where p.ParcelId between 1 and 100000


drop table dbo.Parcel;
drop table dbo.ParcelDetail;


When looking at the execution plan of the two selects we see that the index seek is done on both tables, but the subselect still causes the loop join which I want to eliminate.


  Post Attachments 
execution plan.jpg (0 views, 116.50 KB)
Post #1391938
Posted Monday, December 3, 2012 9:39 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 13,872, Visits: 9,596
With that same structure, try this:

SELECT *
FROM dbo.Parcel AS P
INNER JOIN dbo.ParcelDetail AS PD
ON P.ParcelId = PD.ParcelId
WHERE P.ParcelId BETWEEN 10 AND 20;

You'll still get a Nested Loops Join.

Has nothing to do with it being a sub-select. Has to do with data distributions.

The second table, in this sample, has between 2 and 7 rows per ParcelID value. The most efficient way to handle that, in this case, is a Nested Loops Join.

You might be able to force this into a Merge Join if the indexes are right and the data volume is high enough, but odds are SQL Server would pick a Merge Join if it would actually be more efficient.

Your test script also produces multiple identical ParcelID values in the Parcel table. Does that parallel your actual environment, or is ParcelID unique in the table that is meant to parallel? If it isn't unique, that will also complicate the join. When I deduped that table and set the clustered index to Unique, the execution plan remains the same, but the execution time on a 10k-row query went from 452 ms CPU time and 1.217 seconds total time, down to 200 ms CPU and 700 ms total. Almost twice as fast. Of course, it returns less rows (10k vs 18k) once the primary table is deduped, which accounts for some of the speed-up, but some of it is also the stats on the table are treated differently with a unique index than with a non-unique one.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1392036
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse