slow index scan - unable to improve performance

  • Hi,

    I have the following query

    select dp.FarmerCropId,

    am.ActivityId,

    am.ActivityName,

    cfm.customformid,

    getutcdate() talendate,

    aadt.AttributeDataTypeId,

    aadt.DataTypeId,

    aadt.SequenaceNo,

    aadt.groupid,

    datatypename,

    AttributeId,

    AttributeName,

    fcaa.reading

    from ActivityMaster am

    join (select activityid,

    isactive,

    AttributID,

    AttributeDataTypeId,

    DataTypeId,

    SequenaceNo,

    groupid from dbo.activityattributedatatype) aadt

    on aadt.activityid = am.activityid

    join CustomFormMapping cfm

    on cfm.activityid=am.activityid

    AND 1=(Case when 1 = 0 then 1 When 1 = 1 and cfm.isactive = 1 then 1 END )

    join (select FarmerCropId,varietyid from etl.plot ) dp

    on cfm.CropTypeId = dp.VarietyId

    left join (select FarmerCropId,

    ActivityID,

    FarmerCropActivityID,

    CompletionDate,

    LastModifiedDate,isactive from pactivity) fca

    on dp.farmercropid = fca.farmercropid

    and fca.activityid = am.activityid

    AND 1=(Case when 1 = 0 then 1 When 1 = 1 and fca.isactive = 1 then 1 END )

    join vw_datatype dt

    on aadt.datatypeid = dt.datatypeid

    join activityattributesmaster aam

    on aadt.AttributID = aam.AttributeID

    AND 1=(Case when 1 = 0 then 1 When 1 = 1 and aam.isactive = 1 then 1 END )

    left join (select attributedatatypeid,farmercropactivityid from pactivityattributes) fcaa

    on

    fcaa.attributedatatypeid = aadt.attributedatatypeid

    and fcaa.farmercropactivityid = fca.farmercropactivityid

    where aadt.isactive = 1 and

    am.iscustomform = 1

    The last left join with farmercropactivityattributes table iss the problem. There is a composite non clustered index on attributedatatypeid,farmercropactivityid of the farmercropactivityattributes table. On viewing the execution plan it shows there is 66% cost on the same.

    The table has a clustered index as well.

    There is a decent performance until joined with farmercropactivityattributes table. But, once I join this table, the performance goes down. I cant use a covering index because the Reading column in farmercropactivityattributes table is a nvarchar(max) column.

    the query takes 3 minutes to run.

    I am testing this solo, no other users are accessing these tables.

    Any help in improvising this would be of super help

    thanks

    Rathi

  • Table definitions, index definitions and execution plan please.

    What exactly is AND 1=(Case when 1 = 0 then 1 When 1 = 1 and fca.isactive = 1 then 1 END ) supposed to do? It can be simplified as just AND fca.isactive = 1, which is likely to be a lot easier for the optimiser to handle.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Rathi

    It may not help performance, but it would simplify your query if you removed those subqueries and joined directly to the tables, and if you get rid of the CASE expressions and change the whole line to [font="Courier New"]AND xxx.isactive = 1[/font].

    If performance is indeed the same after you've done that, please post the actual execution plan and the definition of any views that are used in the query.

    John

  • Hi,

    The query is

    select dp.FarmerCropId,

    am.ActivityId,

    am.ActivityName,

    cfm.customformid,

    getutcdate() talendate,

    aadt.AttributeDataTypeId,

    aadt.DataTypeId,

    aadt.SequenaceNo,

    aadt.groupid,

    datatypename,

    AttributeId,

    AttributeName,

    fcaa.reading

    from dbo.ActivityMaster am

    join (select activityid,

    isactive,

    AttributID,

    AttributeDataTypeId,

    DataTypeId,

    SequenaceNo,

    groupid from dbo.activityattributedatatype) aadt

    on aadt.activityid = am.activityid

    join dbo.CustomFormMapping cfm

    on cfm.activityid=am.activityid

    AND 1=(Case when @parm_Active = 0 then 1 When @parm_Active = 1 and cfm.isactive = 1 then 1 END )

    join (select FarmerCropId,varietyid from etl.plot ) dp

    on cfm.CropTypeId = dp.VarietyId

    left join (select FarmerCropId,

    ActivityID,

    FarmerCropActivityID,

    CompletionDate,

    LastModifiedDate,isactive from dbo.farmercropactivity) fca

    on dp.farmercropid = fca.farmercropid

    and fca.activityid = am.activityid

    AND 1=(Case when @parm_Active = 0 then 1 When @parm_Active = 1 and fca.isactive = 1 then 1 END )

    join dbo.vw_datatype dt

    on aadt.datatypeid = dt.datatypeid

    join dbo.activityattributesmaster aam

    on aadt.AttributID = aam.AttributeID

    AND 1=(Case when @parm_Active = 0 then 1 When @parm_Active = 1 and aam.isactive = 1 then 1 END )

    left join (select attributedatatypeid,farmercropactivityid,reading from dbo.farmercropactivityattributes) fcaa

    on

    fcaa.attributedatatypeid = aadt.attributedatatypeid

    and fcaa.farmercropactivityid = fca.farmercropactivityid

    where aadt.isactive = 1 and

    am.iscustomform = 1

    Sorry the case expressions are for this purpose

    case when @parm_active = 1 and isactive = 1 then 1 when @parm_active = 0 and isactive = 0 then 1 end

    So when @parm_active = 1 it picks data that has isactive = 1 and when 0 picks isactive = 0 rows.

    The business requirement is to fetch isactive = 0 or isactive = 1 rows. So, I used the case statement to accomodate both of them

    The subqueries like 'select attributedatatypeid,farmercropactivityid from pactivityattributes',

    'select FarmerCropId,varietyid from etl.plot ) dp

    on cfm.CropTypeId = dp.VarietyId

    left join (select FarmerCropId,

    ActivityID,

    FarmerCropActivityID,

    CompletionDate,

    LastModifiedDate,isactive from dbo.farmercropactivity'

    are used, so that the index that contains attributedatatypeid,farmercropactivityid is used and table need not be accessed

    The index definitions are as follows

    create nonclustered INDEX IDX_ActivityMasteriscustom on ActivityMaster(iscustomform);

    primary key on activitymaster(activityid)

    create nonclustered INDEX IDX_plotvariety on etl.plot(VarietyId);

    create index idx_varietyfarmercropid on etl.plot(varietyid,FarmerCropId)

    create clustered index idx_farmercropid on etl.plot(FarmerCropId)

    create nonclustered index idx_comp activityattributedatatype(ActivityID, IsActive, AttributID, AttributeDataTypeID, DataTypeID, SequenaceNo, GroupID)

    primary key on activityattributedatatype(AttributeDataTypeID)

    create nonclustered index idx_activityid on activityattributedatatype(ActivityID)

    create nonclustered index idx_fcaa_attrid on farmercropactivityattributes(attributedatatypeid)

    create nonclustered index idx_fcaa_activityid on farmercropactivityattributes(farmercropactivityid)

    create nonclustered index idx_comp_attrreading on farmercropactivityattributes(AttributeDataTypeID, FarmerCropActivityID)

    primary key farmercropactivityattributes(FarmerCropActivityAttributeID)

    create clustered INDEX IDX_etlplotuser on etl.plot_user(farmercropid);

    The view vw_datatype is a simple select on a table that contains only 4 rows---

    DataTypeID

    DataTypeName

    DataTypeDesc

    LastModifiedDate

    IsActive

    The actual execution plan is attached on this reply

  • It does help if you post the actual query...

    Add OPTION(RECOMPILE) to the end if the query and test again. The case statements as they are ensure that the optimiser cannot generate a good plan for all cases, so it will have to generate a sub-optimal plan

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Would [font="Courier New"]AND xxx.isactive = @parm_active[/font] not do it?

    Are you saying that if you put the table names in instead of the subqueries, you get a different, less efficient execution plan?

    You've posted in the SQL Server 2005 forum. Is that the actual version you're using? If not, which version, and what compatibility level is your database?

    John

  • Ahhh.. apologies, posted in a hurry and didn't see the forum version. I am using sql server 2012

    I have posted the query in previous reply.

    The removal of case statement and just putting in isactive = 1 in the where clause gives the result in less than 45 seconds.

    But, if I have to accomodate both @parm_active = 1 and @parm_active = 0,how do I accomplish this??

    My intention is to use a single query for both the cases. And that's why the usage of 'case'

    The OPTION (RECOMPILE) is very good. It is indeed making my query run very fast

    Also, I am creating the etl.plot table ,populating and creating indexes on it just before I run this query. So, would it help if I ran UPDATE STATISTICS etl.plot table???

    PFA the new plan (plan without case statement) for reference. This runs in 45 seconds and the previous with case ran for 3 minutes

    Thanks

    Rathi

  • rathimittha.mb (7/18/2016)


    My intention is to use a single query for both the cases. And that's why the usage of 'case'

    'Generic' queries such as that are typically slow. I think at this point your choices are to split it into two queries or use the recompile hint (adds compilation overhead on each execution)

    The problem with generic queries is that usually the optimiser has to generate sub-optimal query plans to accommodate the multiple alternative ways the query will run.

    Also, I am creating the etl.plot table ,populating and creating indexes on it just before I run this query. So, would it help if I ran UPDATE STATISTICS etl.plot table???

    Not in the slightest.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The OPTION RECOMPILE with CASE still generates a sub optimal plan. So I guess I need to consider writing into 2 separate queries

    Also, under what conditions would I use UPDATE STATISTICS ?? I assumed it was right to use in the current situation of mine

    Also,

    lets say I have a query, select c1,c2,c3 from table. And I have a index on c1,c2,c3.

    So when I use this in the join , should I join as

    tab join (select c1,c2,c3 from table) on .....

    OR

    tab join table on....

    My understanding is that the former ensures usage of index and picks only the columns that is required hence reducing the work of the optimizer

    Thanks

    Rathi

  • rathimittha.mb (7/18/2016)


    Also, under what conditions would I use UPDATE STATISTICS ?? I assumed it was right to use in the current situation of mine

    When the statistics are out of date due to data changes, which definitely won't be the case when the indexes were created (and hence the index statistics were created) in the previous step

    lets say I have a query, select c1,c2,c3 from table. And I have a index on c1,c2,c3.

    So when I use this in the join , should I join as

    tab join (select c1,c2,c3 from table) on .....

    OR

    tab join table on....

    The latter.

    The first is just unnecessary complexity, more typing, harder to read, no advantage

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Okay.

    But, many a places on the internet suggest that only the required columns from a table must be retrieved and not all of them. If the select requires only 5 columns and there are 50 columns in the table. Wouldn't it be better to just select the required 5 and then join with any other table??

    Thanks

    Rathi

  • rathimittha.mb (7/18/2016)


    Wouldn't it be better to just select the required 5 and then join with any other table??

    No. It's better to write your query as simply as you can.

    SELECT t1.col1, t1.col2, t2.col1, t2.col2, t2.col3, t2.col4, t2.col5

    FROM t1 inner join t2 on ...

    or

    SELECT t1.col1, t1.col2, t2.col1, t2.col2, t2.col3, t2.col4, t2.col5

    FROM (SELECT col1, col2 from t1) t1

    INNER JOIN (SELECT col1, col2, col3, col4, col5 FROM t2) t2 ON ...

    They do exactly the same thing, in exactly the same way, and both only retrieve the required columns from the tables. Which is easier to read and understand?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Okay

    But some places on the internet suggest that only the required columns must be taken and then joined to other tables. Say I require 5 columns from a 50 column table, wouldn't just selecting those 5 columns be more efficient??

    Thanks

    Rathi

  • Sorry about the same question being posted twice, the page did not refresh and I accidentally posted again.

    I now get it.

    Thanks a lot for the great help GilaMonster 🙂

    Regards

    Rathi

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply