Tuning a query that takes 60 minutes to run

  • rajsin7786 (8/4/2014)


    Sure, just finished running exec sp_updatestats; all stats are updated on this DB now. Running the SQL now, will keep you posted.

    No, not sp_updatestats (which does a sampled stats update of every single table). Run UPDATE STATISTICS with the FULLSCAN option on the sales fact table (and it's probably worth doing on all the other tables in the query too, to fix any damage that sp_updatestats could have done).
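    A minimal sketch of what that looks like, assuming the tables sit in the dbo schema (the table names are taken from the query later in this thread):

    -- Full-scan statistics update on the fact table (not a sampled update)
    UPDATE STATISTICS dbo.sales_fact WITH FULLSCAN;

    -- And, to repair any sampled stats left behind by sp_updatestats,
    -- the other tables in the query:
    UPDATE STATISTICS dbo.product_dim WITH FULLSCAN;
    UPDATE STATISTICS dbo.multibuy_dim WITH FULLSCAN;
    UPDATE STATISTICS dbo.date_dim WITH FULLSCAN;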

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • -- The plan doesn't seem to work at all well with the actual rowcounts, probably due to out-of-date stats as others have suggested.
    -- You can probably eliminate the 500-million-row range scan by including the columns it's picking up in an adjusted version of index idx_sales_prod_2,
    -- and also eliminate the key lookup along the way, which is almost certainly a lot more expensive than it looks.
    -- You MAY get a performance lift from making the date arithmetic a little more obvious too.

    CREATE INDEX idx_sales_prod_2 ON sales_fact (product_id, date_id)
        INCLUDE (trans_ref, LoyaltyNo, qty, totalPriceExVat, company_margin, wastageval, multibuy_id);

    SELECT
        sales_fact.trans_ref,
        product_dim.pd_key,
        product_dim.pd_desc,
        multibuy_dim.mb_deal_number,
        sales_fact.LoyaltyNo,
        SUM(sales_fact.qty),
        SUM(sales_fact.totalPriceExVat),
        SUM(sales_fact.company_margin) - ISNULL(SUM(sales_fact.wastageval), 0)
    FROM multibuy_dim
    INNER JOIN sales_fact ON sales_fact.multibuy_id = multibuy_dim.mb_id
    INNER JOIN product_dim ON sales_fact.product_id = product_dim.pd_id
    CROSS APPLY (
        SELECT
            StartDate = DATEADD(WEEK, -3, MIN(date)),
            EndDate   = MAX(date)
        FROM date_dim
        WHERE last_week_prior_flag = 'Y'
    ) x
    INNER JOIN date_dim ON date_dim.date_id = sales_fact.date_id
        AND date_dim.date BETWEEN x.StartDate AND x.EndDate
    WHERE product_dim.pd_key IN (
            '481005', '850972', '860091', '860101', '860110', '860130', '860159', '860161',
            '860211', '860224', '860225', '860230', '860265', '860319', '860320', '860344',
            '860360', '860407', '860414', '860415', '860418', '860469', '860478', '871005',
            '874115', '874206', '880100', '881153', '890061', '890299', '890360', '890648',
            '890650', '890651', '891903')
        AND (multibuy_dim.mb_deal_number = '0000' OR multibuy_dim.mb_deal_number IS NULL)
    GROUP BY
        sales_fact.trans_ref,
        product_dim.pd_key,
        product_dim.pd_desc,
        multibuy_dim.mb_deal_number,
        sales_fact.LoyaltyNo;

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi guys,

    Thanks for all your help. I have updated statistics on all the columns. I ran the query suggested by ChrisM and also the original query; both now run in 25 minutes, so there is definitely a performance improvement. I have attached the actual execution plan.

    Hi ChrisM,

    Could you please tell me how to go about the suggestions below?

    -- You can probably eliminate the 500-million-row range scan by including the columns it's picking up in an adjusted version of index idx_sales_prod_2,

    -- and also eliminate the key lookup along the way, which is almost certainly a lot more expensive than it looks.

  • rajsin7786 (8/11/2014)


    Hi guys,

    Thanks for all your help. I have updated statistics on all the columns. I ran the query suggested by ChrisM and also the original query; both now run in 25 minutes, so there is definitely a performance improvement. I have attached the actual execution plan.

    Hi ChrisM,

    Could you please tell me how to go about the suggestions below?

    -- You can probably eliminate the 500-million-row range scan by including the columns it's picking up in an adjusted version of index idx_sales_prod_2,

    -- and also eliminate the key lookup along the way, which is almost certainly a lot more expensive than it looks.

    The definition of the index I suggest you create is in my last post. The huge difference between actual and estimated rows suggests that statistics were not updated. Note also that SQL Server timed out before picking a good plan, and it does look horrible since it reads the sales table three times, including 500,000,000 index seeks. Try creating the index, updating statistics for all the tables involved except sales (the new index will do), and running the two queries again.
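    Put together, the steps might look something like this (a sketch only: the dbo schema is an assumption, the index definition is the one from the earlier post, and DROP_EXISTING is used because this is an adjusted version of an index that already exists):

    -- 1. Replace the existing idx_sales_prod_2 with the adjusted, covering version
    CREATE INDEX idx_sales_prod_2 ON dbo.sales_fact (product_id, date_id)
        INCLUDE (trans_ref, LoyaltyNo, qty, totalPriceExVat, company_margin, wastageval, multibuy_id)
        WITH (DROP_EXISTING = ON);

    -- 2. Refresh statistics on the other tables in the query; the index
    --    build already leaves fresh full-scan stats on the new index keys,
    --    so sales_fact can be skipped
    UPDATE STATISTICS dbo.product_dim WITH FULLSCAN;
    UPDATE STATISTICS dbo.multibuy_dim WITH FULLSCAN;
    UPDATE STATISTICS dbo.date_dim WITH FULLSCAN;

    -- 3. If in doubt, check when each statistic was last updated
    SELECT s.name, STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID(N'dbo.sales_fact');

    Then run both versions of the query again and compare the new actual plans; the optimizer timeout shows up in the plan XML as StatementOptmEarlyAbortReason="TimeOut" on the statement element, so it is easy to check whether that has gone away too.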

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
