slow index scan - unable to improve performance

Question

slow index scan - unable to improve performance

rathimittha.mb

SSChasing Mays

Points: 639
More actions
July 18, 2016 at 2:21 am

#309209

Hi,
I have the following query
select dp.FarmerCropId,
am.ActivityId,
am.ActivityName,
cfm.customformid,
getutcdate() talendate,
aadt.AttributeDataTypeId,
aadt.DataTypeId,
aadt.SequenaceNo,
aadt.groupid,
datatypename,
AttributeId,
AttributeName,
fcaa.reading
from ActivityMaster am
join (select activityid,
isactive,
AttributID,
AttributeDataTypeId,
DataTypeId,
SequenaceNo,
groupid from dbo.activityattributedatatype) aadt
on aadt.activityid = am.activityid
join CustomFormMapping cfm
on cfm.activityid=am.activityid
AND 1=(Case when 1 = 0 then 1 When 1 = 1 and cfm.isactive = 1 then 1 END )
join (select FarmerCropId,varietyid from etl.plot ) dp
on cfm.CropTypeId = dp.VarietyId
left join (select FarmerCropId,
ActivityID,
FarmerCropActivityID,
CompletionDate,
LastModifiedDate,isactive from pactivity) fca
on dp.farmercropid = fca.farmercropid
and fca.activityid = am.activityid
AND 1=(Case when 1 = 0 then 1 When 1 = 1 and fca.isactive = 1 then 1 END )
join vw_datatype dt
on aadt.datatypeid = dt.datatypeid
join activityattributesmaster aam
on aadt.AttributID = aam.AttributeID
AND 1=(Case when 1 = 0 then 1 When 1 = 1 and aam.isactive = 1 then 1 END )
left join (select attributedatatypeid,farmercropactivityid from pactivityattributes) fcaa
on
fcaa.attributedatatypeid = aadt.attributedatatypeid
and fcaa.farmercropactivityid = fca.farmercropactivityid
where aadt.isactive = 1 and
am.iscustomform = 1
The last left join with farmercropactivityattributes table iss the problem. There is a composite non clustered index on attributedatatypeid,farmercropactivityid of the farmercropactivityattributes table. On viewing the execution plan it shows there is 66% cost on the same.
The table has a clustered index as well.
There is a decent performance until joined with farmercropactivityattributes table. But, once I join this table, the performance goes down. I cant use a covering index because the Reading column in farmercropactivityattributes table is a nvarchar(max) column.
the query takes 3 minutes to run.
I am testing this solo, no other users are accessing these tables.
Any help in improvising this would be of super help
thanks
Rathi

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

Table definitions, index definitions and execution plan please.

What exactly is AND 1=(Case when 1 = 0 then 1 When 1 = 1 and fca.isactive = 1 then 1 END ) supposed to do? It can be simplified as just AND fca.isactive = 1, which is likely to be a lot easier for the optimiser to handle.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 2

Rathi

It may not help performance, but it would simplify your query if you removed those subqueries and joined directly to the tables, and if you get rid of the CASE expressions and change the whole line to [font="Courier New"]AND xxx.isactive = 1[/font].

If performance is indeed the same after you've done that, please post the actual execution plan and the definition of any views that are used in the query.

John

rathimittha.mb SSChasing Mays Points: 639 More actions · Answer 3

Hi,

The query is

select dp.FarmerCropId,

am.ActivityId,

am.ActivityName,

cfm.customformid,

getutcdate() talendate,

aadt.AttributeDataTypeId,

aadt.DataTypeId,

aadt.SequenaceNo,

aadt.groupid,

datatypename,

AttributeId,

AttributeName,

fcaa.reading

from dbo.ActivityMaster am

join (select activityid,

isactive,

AttributID,

AttributeDataTypeId,

DataTypeId,

SequenaceNo,

groupid from dbo.activityattributedatatype) aadt

on aadt.activityid = am.activityid

join dbo.CustomFormMapping cfm

on cfm.activityid=am.activityid

AND 1=(Case when @parm_Active = 0 then 1 When @parm_Active = 1 and cfm.isactive = 1 then 1 END )

join (select FarmerCropId,varietyid from etl.plot ) dp

on cfm.CropTypeId = dp.VarietyId

left join (select FarmerCropId,

ActivityID,

FarmerCropActivityID,

CompletionDate,

LastModifiedDate,isactive from dbo.farmercropactivity) fca

on dp.farmercropid = fca.farmercropid

and fca.activityid = am.activityid

AND 1=(Case when @parm_Active = 0 then 1 When @parm_Active = 1 and fca.isactive = 1 then 1 END )

join dbo.vw_datatype dt

on aadt.datatypeid = dt.datatypeid

join dbo.activityattributesmaster aam

on aadt.AttributID = aam.AttributeID

AND 1=(Case when @parm_Active = 0 then 1 When @parm_Active = 1 and aam.isactive = 1 then 1 END )

left join (select attributedatatypeid,farmercropactivityid,reading from dbo.farmercropactivityattributes) fcaa

on

fcaa.attributedatatypeid = aadt.attributedatatypeid

and fcaa.farmercropactivityid = fca.farmercropactivityid

where aadt.isactive = 1 and

am.iscustomform = 1

Sorry the case expressions are for this purpose

case when @parm_active = 1 and isactive = 1 then 1 when @parm_active = 0 and isactive = 0 then 1 end

So when @parm_active = 1 it picks data that has isactive = 1 and when 0 picks isactive = 0 rows.

The business requirement is to fetch isactive = 0 or isactive = 1 rows. So, I used the case statement to accomodate both of them

The subqueries like 'select attributedatatypeid,farmercropactivityid from pactivityattributes',

'select FarmerCropId,varietyid from etl.plot ) dp

on cfm.CropTypeId = dp.VarietyId

left join (select FarmerCropId,

ActivityID,

FarmerCropActivityID,

CompletionDate,

LastModifiedDate,isactive from dbo.farmercropactivity'

are used, so that the index that contains attributedatatypeid,farmercropactivityid is used and table need not be accessed

The index definitions are as follows

create nonclustered INDEX IDX_ActivityMasteriscustom on ActivityMaster(iscustomform);

primary key on activitymaster(activityid)

create nonclustered INDEX IDX_plotvariety on etl.plot(VarietyId);

create index idx_varietyfarmercropid on etl.plot(varietyid,FarmerCropId)

create clustered index idx_farmercropid on etl.plot(FarmerCropId)

create nonclustered index idx_comp activityattributedatatype(ActivityID, IsActive, AttributID, AttributeDataTypeID, DataTypeID, SequenaceNo, GroupID)

primary key on activityattributedatatype(AttributeDataTypeID)

create nonclustered index idx_activityid on activityattributedatatype(ActivityID)

create nonclustered index idx_fcaa_attrid on farmercropactivityattributes(attributedatatypeid)

create nonclustered index idx_fcaa_activityid on farmercropactivityattributes(farmercropactivityid)

create nonclustered index idx_comp_attrreading on farmercropactivityattributes(AttributeDataTypeID, FarmerCropActivityID)

primary key farmercropactivityattributes(FarmerCropActivityAttributeID)

create clustered INDEX IDX_etlplotuser on etl.plot_user(farmercropid);

The view vw_datatype is a simple select on a table that contains only 4 rows---

DataTypeID

DataTypeName

DataTypeDesc

LastModifiedDate

IsActive

The actual execution plan is attached on this reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

It does help if you post the actual query...

Add OPTION(RECOMPILE) to the end if the query and test again. The case statements as they are ensure that the optimiser cannot generate a good plan for all cases, so it will have to generate a sub-optimal plan

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 5

Would [font="Courier New"]AND xxx.isactive = @parm_active[/font] not do it?

Are you saying that if you put the table names in instead of the subqueries, you get a different, less efficient execution plan?

You've posted in the SQL Server 2005 forum. Is that the actual version you're using? If not, which version, and what compatibility level is your database?

John

rathimittha.mb SSChasing Mays Points: 639 More actions · Answer 6

Ahhh.. apologies, posted in a hurry and didn't see the forum version. I am using sql server 2012

I have posted the query in previous reply.

The removal of case statement and just putting in isactive = 1 in the where clause gives the result in less than 45 seconds.

But, if I have to accomodate both @parm_active = 1 and @parm_active = 0,how do I accomplish this??

My intention is to use a single query for both the cases. And that's why the usage of 'case'

The OPTION (RECOMPILE) is very good. It is indeed making my query run very fast

Also, I am creating the etl.plot table ,populating and creating indexes on it just before I run this query. So, would it help if I ran UPDATE STATISTICS etl.plot table???

PFA the new plan (plan without case statement) for reference. This runs in 45 seconds and the previous with case ran for 3 minutes

Thanks

Rathi

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 7

rathimittha.mb (7/18/2016)
My intention is to use a single query for both the cases. And that's why the usage of 'case'

'Generic' queries such as that are typically slow. I think at this point your choices are to split it into two queries or use the recompile hint (adds compilation overhead on each execution)

The problem with generic queries is that usually the optimiser has to generate sub-optimal query plans to accommodate the multiple alternative ways the query will run.

Also, I am creating the etl.plot table ,populating and creating indexes on it just before I run this query. So, would it help if I ran UPDATE STATISTICS etl.plot table???

Not in the slightest.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

rathimittha.mb SSChasing Mays Points: 639 More actions · Answer 8

The OPTION RECOMPILE with CASE still generates a sub optimal plan. So I guess I need to consider writing into 2 separate queries

Also, under what conditions would I use UPDATE STATISTICS ?? I assumed it was right to use in the current situation of mine

Also,

lets say I have a query, select c1,c2,c3 from table. And I have a index on c1,c2,c3.

So when I use this in the join , should I join as

tab join (select c1,c2,c3 from table) on .....

OR

tab join table on....

My understanding is that the former ensures usage of index and picks only the columns that is required hence reducing the work of the optimizer

Thanks

Rathi

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 9

rathimittha.mb (7/18/2016)
Also, under what conditions would I use UPDATE STATISTICS ?? I assumed it was right to use in the current situation of mine

When the statistics are out of date due to data changes, which definitely won't be the case when the indexes were created (and hence the index statistics were created) in the previous step

lets say I have a query, select c1,c2,c3 from table. And I have a index on c1,c2,c3.
So when I use this in the join , should I join as
tab join (select c1,c2,c3 from table) on .....
OR
tab join table on....

The latter.

The first is just unnecessary complexity, more typing, harder to read, no advantage

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

rathimittha.mb SSChasing Mays Points: 639 More actions · Answer 10

Okay.

But, many a places on the internet suggest that only the required columns from a table must be retrieved and not all of them. If the select requires only 5 columns and there are 50 columns in the table. Wouldn't it be better to just select the required 5 and then join with any other table??

Thanks

Rathi

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 11

rathimittha.mb (7/18/2016)
Wouldn't it be better to just select the required 5 and then join with any other table??

No. It's better to write your query as simply as you can.

SELECT t1.col1, t1.col2, t2.col1, t2.col2, t2.col3, t2.col4, t2.col5

FROM t1 inner join t2 on ...

or

SELECT t1.col1, t1.col2, t2.col1, t2.col2, t2.col3, t2.col4, t2.col5

FROM (SELECT col1, col2 from t1) t1

INNER JOIN (SELECT col1, col2, col3, col4, col5 FROM t2) t2 ON ...

They do exactly the same thing, in exactly the same way, and both only retrieve the required columns from the tables. Which is easier to read and understand?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

rathimittha.mb SSChasing Mays Points: 639 More actions · Answer 12

Okay

But some places on the internet suggest that only the required columns must be taken and then joined to other tables. Say I require 5 columns from a 50 column table, wouldn't just selecting those 5 columns be more efficient??

Thanks

Rathi

rathimittha.mb SSChasing Mays Points: 639 More actions · Answer 13

Sorry about the same question being posted twice, the page did not refresh and I accidentally posted again.

I now get it.

Thanks a lot for the great help GilaMonster 🙂

Regards

Rathi