Query Performance

  • Hey guys

    Just looking to see if anyone can advise on any improvements (if any can be made) to a query I'm running.

    Background: The original query (which took between 20-25 seconds) was using a scalar function in the main query that was slowing things down. I created a temp table as a lookup, populated with the return values the function would have produced for each record, and I now join this temp table to the main select.
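
    A rough sketch of the change described above, with made-up names (dbo.SomeScalarFn and #Lookup are purely illustrative; col2, col4 and table2 are the placeholder names from the query below):

    -- Hypothetical sketch: precompute the scalar function once per distinct key,
    -- then join #Lookup into the main select instead of calling the function per row.
    SELECT k.col2, k.col4, dbo.SomeScalarFn(k.col2, k.col4) AS Ret
    INTO #Lookup
    FROM (SELECT DISTINCT col2, col4 FROM table2) AS k;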

    Although this has improved performance (the whole query is now down to 7-10 seconds), I still think it can be improved.

    The main slowness occurs in the initial population of this lookup table.

    One of the selects I'm using to populate the lookup table was using inner joins to two tables. The latter inner join was causing a huge difference between the Actual and Estimated number of rows (the estimate was 3k, the actual was nearly 700k). The query didn't need any fields from this join in its return values, so I replaced the inner join with an EXISTS in the WHERE criteria.

    This has improved performance, but now the exec plan shows 8k rows for Actual and 1 row for Estimated. I know this is mainly down to the use of <= in the WHERE criteria of the subquery, but I'm wondering if anyone has any other ideas on how to get the estimate closer to the actual.

    Query:

    SELECT a.col1
         , a.col2
         , a.col3
         , a.col4
         , MAX(d.col6) AS col5
    FROM #Temp1 a
    INNER JOIN table2 d WITH (NOLOCK)
        ON a.col2 = d.col2
        AND a.col4 = d.col4
    WHERE a.col3 = -1
        AND a.col5 IS NULL
        AND d.col3 IN ('A', 'P')
        AND d.col6 < @now
        AND EXISTS
        (
            SELECT 1 FROM table3 p
            WHERE p.col1 = d.col3
                AND p.col2 <= d.col2 -- this is where I think the issue is
                AND p.col3 = a.col1
        )
    GROUP BY a.col1, a.col2, a.col3, a.col4

  • Perhaps the statistics on your tables are out of date. Try updating them with FULLSCAN.
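
    If it helps, something along these lines, with your real table names substituted in (table2 and table3 below are just the placeholder names from the posted query):

    -- Rebuild the statistics by scanning every row rather than sampling.
    UPDATE STATISTICS table2 WITH FULLSCAN;
    UPDATE STATISTICS table3 WITH FULLSCAN;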

    John

  • Yeah, tried that along with rebuilding and reorganizing the indexes.

    The index used does cover the query and performs a seek, not a scan.

  • Can you post the actual execution plan for the query?

    😎

  • Exec plan attached

    I'm wondering if this might be a combination of using the <= operator in the inner join along with using a temp table, which might be throwing the stats a bit.

    This query is the last of 4 update statements that run against the temp table.

    If I run the query without this particular update, it takes 2-3 seconds.

  • mitzyturbo (10/13/2016)


    Exec plan attached

    I'm wondering if this might be a combination of using the <= operator in the inner join along with using a temp table, which might be throwing the stats a bit.

    This query is the last of 4 update statements that run against the temp table.

    If I run the query without this particular update, it takes 2-3 seconds.

    First things to fix

    😎

    <Warnings>
        <PlanAffectingConvert ConvertIssue="Cardinality Estimate" Expression="CONVERT_IMPLICIT(date,[d].[NAVdate],0)" />
        <PlanAffectingConvert ConvertIssue="Seek Plan" Expression="[a].[NAVDate]=CONVERT_IMPLICIT(date,[d].[NAVdate],0)" />
        <PlanAffectingConvert ConvertIssue="Seek Plan" Expression="CONVERT_IMPLICIT(date,[d].[NAVdate],0)=[t].[NAVDate]" />
    </Warnings>

  • So the original DB owners, in their infinite wisdom, are using varchar(10)s to store DATE values.

    For testing purposes, I just changed the date field on my temp table to a varchar(10) as well; the warnings have gone but the performance is still the same. The actual and estimated no. of rows for the seek are the same.

    However, the cost relative to the batch for the index seek has now reduced to 40%.

    I spotted a Hash Match with a cost of 10% relative to the batch, and its actual vs estimated no. of rows is way off: Actual 412497, Estimate 2034.45.
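
    For reference, the test change was nothing more than matching the temp table column's type to the base table's, roughly like this (only NAVDate comes from the plan warnings; the other column names are taken from the later queries in this thread and the types are guesses):

    -- Declare the temp table's date column as varchar(10) to match table2.NAVdate,
    -- so the join compares like types and the CONVERT_IMPLICIT warnings disappear.
    CREATE TABLE #Temp1
    (
        ClientID     int         NOT NULL,
        InvestmentID int         NOT NULL,
        qid          int         NOT NULL,
        NAVDate      varchar(10) NOT NULL,  -- was date
        Ret          datetime    NULL
    );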

  • -- Your query matches portfolio.clientID to deadline.clientID
    -- where does #Temp1.clientID fit into this?
    -- It might be possible to do something like this
    UPDATE a
    SET Ret = x.Ret
    FROM #Temp1 a
    CROSS APPLY (
        SELECT MAX(deadline) AS Ret
        FROM deadline d
        WHERE a.NAVDate = d.NavDate
            AND a.qid = d.qid
            AND d.statusCode IN ('A', 'P')
            AND d.deadline < @now
            AND EXISTS
            (
                SELECT 1
                FROM portfolio p
                WHERE p.clientID = d.clientID
                    AND p.portfolioDate <= a.navDate
                    AND p.investmentID = a.InvestmentID
            )
    ) x
    WHERE a.ClientID = -1
    AND a.Ret IS NULL

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • The temp table is a lookup holding, for each record I'm applying the update to, the Ret value that the function call would have returned, so I don't have to call the function within the main select. In the main select I join back to the temp table to retrieve this value.

    The cross apply won't make any difference as it's using the same inner join that's causing the slow performance.

  • You've got two sort operations (including one with a spill to tempdb) in support of a merge join. Despite the estimated costs, I think this is where more of the slow behavior of the query comes from. The optimizer is timing out on this query; I suspect there just isn't enough to go on for it to arrive at a good plan. You might try putting an index on the temp table after you build it, preferably a unique index if the data supports it. NavDate & QID seem like good candidates. Also, just to help the optimizer do its work while you're tuning this, lose the NOLOCK hint.
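
    For example, something like this once the temp table is populated (the index name is made up and the key columns are just a guess from the join predicates; make it unique only if the data genuinely is):

    -- Give the optimizer an access path and statistics on the temp table's join keys.
    CREATE INDEX IX_Temp1_NAVDate_qid ON #Temp1 (NAVDate, qid);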

    Since the only aggregation you're performing is a MAX, you might try a TOP 1 with an ORDER BY instead. For most queries that's a much better approach than an aggregate.
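
    A tiny, self-contained illustration of that rewrite (the table and data below are made up purely for the demo):

    -- Aggregate form
    CREATE TABLE #demo (qid int, deadline datetime);
    INSERT #demo VALUES (1, '20161001'), (1, '20161005'), (2, '20161003');

    SELECT MAX(deadline) AS Ret FROM #demo WHERE qid = 1;

    -- TOP (1) form; with an index on (qid, deadline DESC) this becomes a single seek
    SELECT TOP (1) deadline AS Ret FROM #demo WHERE qid = 1 ORDER BY deadline DESC;

    DROP TABLE #demo;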

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • That sounds about right to me Grant, the join on this subquery is on some very non-specific values - NAVDate and QID. This subquery is basically a catch-all for the records on the table that haven't already been updated and don't have a clientid. The other subqueries run mostly the same syntax but use the clientid in the join, which speeds things up.

    At this stage I've spent over a day reducing a query from 20-25 seconds to 10; not perfect, but I think it will do.

    On a separate note, pardon my ignorance here and maybe I've just been looking at the screen too long :w00t: but how would you join on a subquery with a TOP 1 for each record? The other subqueries use MINs, so I would just swap the ORDER BY around to get that, but won't TOP 1 always return only 1 record? Are you using ROW_NUMBER and filtering where the row_number = 1?

  • Big assumption on my part, but aren't you going for something like the situation in this article? There are examples there of exactly what I mean.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I just looked at your very first query in that article, saw the first inner join, and shouted a four-letter word in my head 🙂 Sometimes you can just drink too much coffee.

    On a much more successful note, I finally copped that the WHERE clause on the subquery was using the clientid on the wrong side (on the #Temp table). A simple change of the a. to a d. removed that heavy load on the Hash Match. The query is running in under 2 seconds now.

    I had assumed from the original code in the function, which set NULL clientids to -1, that -1 wasn't their actual value on the table, but it is!!

    Thanks for your help on this one

  • -- Most of the cost of the query is reading the
    -- portfolio table - with the aggregate performed AFTER this expensive bunch of seeks.
    -- This version of the query might well aggregate BEFORE the join to portfolio.
    -- If it doesn't, then I'd explore other means of achieving this.
    -- Note that reading the portfolio table is estimated at 40%,
    -- is estimated to be 2034 executions (seeks) but is actually 412419 executions.
    -- That's the number of rows coming off the product of #temp and deadline, hence the urge to
    -- aggregate first.
    UPDATE a
    SET Ret = x.Ret
    FROM #Temp1 a
    CROSS APPLY (
        SELECT TOP (1)
            d.deadline AS Ret,
            d.clientID
        FROM deadline d
        WHERE a.NAVDate = d.NavDate
            AND a.qid = d.qid
            AND d.statusCode IN ('A', 'P')
            AND d.deadline < @now
        ORDER BY d.deadline DESC
    ) x
    WHERE a.ClientID = -1
    AND a.Ret IS NULL
    AND EXISTS
    (
        SELECT 1
        FROM portfolio p
        WHERE p.clientID = x.clientID
            AND p.portfolioDate <= a.NAVDate
            AND p.investmentID = a.InvestmentID
    )

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks for that Chris, CROSS APPLY is a lot neater in this case alright
