Improve reporting stored procedure execution time - tuning temporary tables?

  • Thanks.

    Firstly, the estimated row count coming out of the table join is on the low side: as a result, the sort operator supporting the stream aggregate costs about twice its estimate and spills to tempdb because the memory grant is too small. Cluster both temp tables on the join columns: this creates statistics which may well be better than the auto-created ones, and it gives the optimiser a wider choice of join options.
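
    Something like this (a minimal sketch - the table and column names here are placeholders for your actual temp tables and join columns):

    -- Hypothetical names: substitute your actual temp tables and join keys.
    -- Building the clustered index also builds a statistics object on the key,
    -- which the optimiser can use in place of auto-created column statistics.
    create clustered index cix_small on #SmallTable (JoinKey1, JoinKey2);
    create clustered index cix_large on #LargeTable (JoinKey1, JoinKey2);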

    Next, you have a huge GROUP BY list. Find out which of those columns are actually needed as partitioning columns (hint: if removing a column from the GROUP BY changes the output row count, it's a partitioning column) and wrap the rest in MAX() or MIN(). If you're lucky, you will reduce the partitioning columns down to the same column pair as the join, which will then support a fast stream aggregate without the expensive sort.
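
    To illustrate with made-up table and column names (a sketch, not your actual query):

    -- Before: every display column sits in the GROUP BY, forcing a wide sort.
    select StoreID, ProgramID, Region, ProgramName, sum(budgetValue) as Budget
    from #Budgets
    group by StoreID, ProgramID, Region, ProgramName;

    -- After: if Region and ProgramName are functionally dependent on the
    -- (StoreID, ProgramID) pair, grouping on the pair alone returns the same
    -- number of rows, and MAX() simply picks the one value present per group.
    select StoreID, ProgramID, max(Region) as Region, max(ProgramName) as ProgramName, sum(budgetValue) as Budget
    from #Budgets
    group by StoreID, ProgramID;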

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I have tested adding the clustered indexes to the temporary tables and the overall run time of the stored procedure increased (unfortunately).

    To be fair, the increase is small, about 5-10%, but it's there. I'm still trying to find a way to reduce the number of GROUP BY columns, but without much success so far.

    ~ Just some guy trying to tune queries ~

  • radu.gheorghiu (8/8/2016)

    I have tested adding the clustered indexes to the temporary tables and the overall run time of the stored procedure increased (unfortunately).

    To be fair, the increase is small, about 5-10%, but it's there. I'm still trying to find a way to reduce the number of GROUP BY columns, but without much success so far.

    Can you post the script you used to create the indexes, please?

    As far as I can see, the partitions in the output should be something like this:

    ProgramID, StoreID, and whether or not b.budgetMonth between @start_date and @end_date

    There may be others - can you tell us?
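
    Expressed as a derived flag, the grouping might look like this (a sketch only; the table name and alias are assumptions based on what you've posted so far):

    select b.ProgramID,
           b.StoreID,
           -- The in-period test becomes a grouping column rather than a filter.
           case when b.budgetMonth between @start_date and @end_date then 1 else 0 end as InPeriod,
           sum(b.budgetValue) as Budget
    from #StoresBudgets b
    group by b.ProgramID,
             b.StoreID,
             case when b.budgetMonth between @start_date and @end_date then 1 else 0 end;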

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Here's the code for the clustered indexes:

    create clustered index cls_ix_stores_for_budgets on #StoresBudgets (storeID, ProgramID);

    create clustered index cls_ix_tmp_report_final on #temp_report_data (store_ID, newSourceID);

    The other columns in the SELECT and GROUP BY are similar to the ones already posted in the query, except there are about five times as many ISNULL(SUM()) expressions. (I have a feeling this might be important information I should have shared earlier.)

    The SELECT part of the actual query is more like:

    SELECT
        case when b.StoreDivision is null and b.ProgramDescription = 'New' then 'Unassigned' else b.StoreDivision end as Division
        , b.ProgramName
        , b.Region
        , case when b.AM is null and b.ProgramName is not null
               then 'Unassigned'
               else b.AM
          end as AM
        , rtrim(ltrim(b.Store)) as Store
        , ltrim(trf.Affiliate) as Affiliate
        , trd.Store_ID
        , b.appliesToPeriod
        , trd.AffiliateID
        , isnull(trd.countLeadActual, 0) as Actual
        , isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date then b.budgetValue else 0 end), 0) as Budget
        , isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date and (trd.considerMe = -1 or b.StoreID < 0) then b.budgetValue else 0 end), 0) as CleanBudget
        , isnull(...)
        , isnull(sum(case when x and y between @start and @end then end))
        , isnull(sum(case when x and y between @start and @end and (trd.considerMe ... ) then end))
        , isnull(...)
        , isnull(sum(case when x and y between @start and @end then end))

    for a total of 10 SUM expressions, all following the same pattern as the two fully written-out ones above.

    ~ Just some guy trying to tune queries ~

  • radu.gheorghiu (8/8/2016)

    Here's the code for the clustered indexes:

    create clustered index cls_ix_stores_for_budgets on #StoresBudgets (storeID, ProgramID);

    create clustered index cls_ix_tmp_report_final on #temp_report_data (store_ID, newSourceID);

    The other columns in the SELECT and GROUP BY are similar to the ones already posted in the query, except there are about five times as many ISNULL(SUM()) expressions. (I have a feeling this might be important information I should have shared earlier.)

    The SELECT part of the actual query is more like:

    SELECT
        case when b.StoreDivision is null and b.ProgramDescription = 'New' then 'Unassigned' else b.StoreDivision end as Division
        , b.ProgramName
        , b.Region
        , case when b.AM is null and b.ProgramName is not null
               then 'Unassigned'
               else b.AM
          end as AM
        , rtrim(ltrim(b.Store)) as Store
        , ltrim(trf.Affiliate) as Affiliate
        , trd.Store_ID
        , b.appliesToPeriod
        , trd.AffiliateID
        , isnull(trd.countLeadActual, 0) as Actual
        , isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date then b.budgetValue else 0 end), 0) as Budget
        , isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date and (trd.considerMe = -1 or b.StoreID < 0) then b.budgetValue else 0 end), 0) as CleanBudget
        , isnull(...)
        , isnull(sum(case when x and y between @start and @end then end))
        , isnull(sum(case when x and y between @start and @end and (trd.considerMe ... ) then end))
        , isnull(...)
        , isnull(sum(case when x and y between @start and @end then end))

    for a total of 10 SUM expressions, all following the same pattern as the two fully written-out ones above.

    Yes, and they're mostly the same too.

    The indexes look OK.

    Are ProgramName and ProgramDescription attributes of ProgramID? If they are, remove them from the GROUP BY and wrap any reference to them in the SELECT list in MAX().

    Do the same with the attributes of StoreID.

    You've got a small table and a large table. Are you getting counts from the larger table for each row in the smaller one? I'm wondering if you could get away with an OUTER APPLY to flatten the bigger table - something like the sketch below.
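
    Roughly this shape (a sketch only - which table is the small side, and the join key, are guesses based on what you've posted):

    select trd.Store_ID,
           trd.AffiliateID,
           isnull(trd.countLeadActual, 0) as Actual,
           isnull(x.Budget, 0) as Budget,
           isnull(x.CleanBudget, 0) as CleanBudget
    from #temp_report_data trd
    outer apply
    (
        -- The aggregate is computed once per outer row over the correlated
        -- subset, so the outer query needs no GROUP BY and no wide sort.
        select sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date then b.budgetValue else 0 end) as Budget,
               sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date and (trd.considerMe = -1 or b.StoreID < 0) then b.budgetValue else 0 end) as CleanBudget
        from #StoresBudgets b
        where b.StoreID = trd.Store_ID
    ) x;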

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
