Help with Optimizing intermittent long-running query

  • Hello,

    Situation: I've got a situation with one of our apps that approximately 50% of the time a certain query runs very long (5-10 min.). However, when I run the query on my machine it runs in under 30 seconds. When I run the query thru the app on my machine it runs in under 30 seconds. I have optimized the query as best as I can...but the main issue is why it runs long sometimes and not others. I have traced the DB on several occasions and have concrete examples of it running long and examples of it running fast.

    Question: What is the best strategy for debugging this kind of intermittent issue. How do I figure in/measure other factors in my testing (i.e., db load (other queries running at same time), server load, etc.)

    I mainly have experience optimizing queries that always run slow rather than this intermittent kind.

    Any strategical hints or tips would be greatly appreciated.

    George H.

  • I understand that you are running the query with the same parameters, as I remeber having performance problems with different values being passed for the query, and having different execution times. The problem was solved by manually updating statiscitcs for particular table.

    Is the memory utilization on you server high?

    Any chance of posting your query?

    Cheers, Max

    Check-out free open-source utility Sql Server Replication Explorer[/url]
  • George,

    it does sound like there could be execution plan issues, as I guess on your machine you have a fairly clean procedure cache, whilst the 'live' server has a relatively busy proc cache.

    Having said that I don't know what your query is doing.....

    A similar tale here?http://www.sqlservercentral.com/articles/Performance+Tuning/2863/

    Kev

  • kevriley,

    Interesting. So I can test this using the examples from the traces I have? If I inspect the execution plan of the fast running query and compare it to the slow running version, it might tell me something. Then try to figure out what about the slow execution plan is making it slow?

    I'll give that a try and post the results...

    Thanks,

    George H.

  • Max,

    Unfortunately we're a split shop here so I don't have access to the server and OS. I even have to request traces to be run for me. It definitely makes debugging harder and I have a request into the Server guys for some insight.

    The query does take different parameters and values...I'll start in that area (I don't know why I didn't think of that myself)...thanks for the idea!

    I'm not sure if posting the query would really help...I'm looking more for a hi-level plan to follow or debug strategy rather than specific fixes for this one query. Especially since I have other queries with the same issue (teach a man to fish...)

    Thanks again,

    George H.

  • George,

    sometimes a good indication of having a bad plan cached is that the query miraculously works fine after a reboot (or a DBCC FREEPROCCACHE), but I'm guessing from your earlier posts that you have neither the access nor the inclination to do these to a live server!!

    Kev

  • Sounds like it could be parameter sniffing.

    http://sqlinthewild.co.za/index.php/2007/11/27/parameter-sniffing/

    http://sqlinthewild.co.za/index.php/2008/02/25/parameter-sniffing-pt-2/

    http://sqlinthewild.co.za/index.php/2008/05/22/parameter-sniffing-pt-3/

    If you post the query, we could point out potential problems with it, give you an idea what to look for, and why with the other procs.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • OK, I'll post the query...but it's not pretty. I've made a little progress adding one index, but it's still running great and then it runs long. I've been running it at around 1 second this morning, but I just kicked it off again and it took 2 min. 21 seconds...

    select

    tab.TreatmenRegimenNumber,

    tab.RevisionNumber,

    tab.Protocol,

    '8/29/2008' as 'Treatment Date',

    tab.PIName 'PI Name',

    tab.Species,

    tab.ABSL,

    tab.CageCardid 'Cage Card',

    case tab.usdacovered

    when 0 then tab.secondarycagecardID

    when 1 then convert(varchar,tab.AnimalID)

    end AS 'ID',

    'No Action' as 'status',

    tab.Description AS Treatment

    from

    (

    select

    sp.usdacovered,

    a.secondarycagecardID,

    a.AnimalID,

    a.TreatmenRegimenNumber,

    a.RevisionNumber,

    a.ProtocolNumber AS Protocol ,

    C.LastName + ', ' + C.FirstName AS PIName,

    Sp.SpeciesDescription AS Species,

    a.CageCardID ,

    a.CreatedDate,

    a.VivariumCode,

    a.RoomCardID,

    a.TreatmentSheetType,

    a.ClinicalChargeCode,

    a.NumberOfTimes,

    a.Instructions,

    a.NoCharge,

    a.ModifiedBy,

    a.ModifiedDate,

    isnull( A.NextInstanceDate ,T.FutureDate) as 'TreatMentDate' ,

    A.startdate ,

    A.enddate,

    V.VivariumName,

    S.ULARCodeDescription ,

    R.RoomCode,

    CH.DESCRIPTION ,

    a.ProtocolNumber,

    P.ABSLCode 'ABSL',

    a.holidayrule,

    S.ULARCode,

    SpeciesOrderCode

    from TreatmentRegimen A

    Left Join CLINICALCHARGECODES CH

    on A.CLINICALCHARGECODE=CH.CLINICALCHARGECODE

    left join TreatmentRegimenFutureDates T

    on A.TreatmenRegimenNumber = T.TreatmenRegimenNumber

    And A.RevisionNumber = T.RevisionNumber

    and T.futuredate =

    (

    select max(futuredate)

    from TreatmentRegimenFutureDates

    where futuredate ='2008-08-29 00:00:000'

    )

    and A.endDate > ='2008-08-29 00:00:000'

    Left Join Protocols P

    on a.ProtocolNumber=P.ProtocolNumber

    Left Join Contacts C

    on P.PIPennID=C.PennID

    Left Join Species Sp

    on a.SpeciesCode=SP.SpeciesCode

    left join vivarium v

    on a.VivariumCode=V.VivariumCode

    join

    (

    select ULARCode,ULARCodeDescription

    from ularcodes

    where ULARCodeType=7

    ) as s

    on A.TreatmentSheetType = s.ULARCode

    left join rooms R

    on A.RoomCardID =R.RoomCardID

    where

    a.RevisionNumber=

    (

    select max(b.RevisionNumber)

    from TreatmentRegimen b

    group by B.TreatmenRegimenNumber

    having b.TreatmenRegimenNumber=a.TreatmenRegimenNumber

    )

    and A.regimenStatus is null

    and

    isnull( A.NextInstanceDate ,T.FutureDate) >= a.startdate

    ) as tab

    where

    (Tab.TreatMentDate ='2008-08-29 00:00:000')

    and

    convert(varchar,tab.TreatmenRegimenNumber)+ ','+convert(varchar,tab.RevisionNumber)+',' + convert(varchar,Tab.TreatMentDate)

    not in /* REG/REV/DATE COMBO CAN'T ALREADY BE IN TREATMENT INSTANCES */

    (

    select convert(varchar,TreatmenRegimenNumber)+ ','+convert(varchar,RevisionNumber)+','+convert(varchar,TreatMentDate)

    from Treatmentinstances

    where treatmentdate = '2008-08-29 00:00:000'

    )

    and convert(varchar,tab.TreatmenRegimenNumber)+ ','+convert(varchar,tab.RevisionNumber)

    not in

    (

    select convert(varchar,TreatmenRegimenNumber)+ ','+ convert(varchar,RevisionNumber)

    from TreatmentRegimen TR

    join holidays h

    on TR.scheduleinstancedate = h.[date]

    and TR.holidayrule = 6

    )

    and convert(varchar,tab.TreatmenRegimenNumber)+ ','+convert(varchar,tab.RevisionNumber)

    not in

    (

    select convert(varchar,TreatmenRegimenNumber)+ ','+ convert(varchar,RevisionNumber)

    from TreatmentRegimenfuturedates

    where futuredate in

    (

    select date from holidays

    where futuredate = '2008-08-29 00:00:000'

    )

    and tab.holidayrule=6

    )

    and tab.enddate>='2008-08-29 00:00:000'

    /*NOTE: after this point additional restrictions can be added based on the search criteria from the app */

    And (tab.TreatmentDate < = '2008-08-29 00:00:000')

    And tab.ULARCode = '1'

    And tab.VivariumName = 'Richards'

    You'll notice several joins on the inner clause that don't have corresponding select columns on the outer clause - these inner joins are used by the concatenating WHERE clause based on the search criteria from the app. For example, the vivarium (v) table doesn't have any outer clause columns in the select clause...but a vivarium where clause can be added if someone limits their search by a vivarium name or code.

  • One thing that may help.

    convert(varchar,tab.TreatmenRegimenNumber)+ ','+convert(varchar,tab.RevisionNumber)

    not in

    (

    select convert(varchar,TreatmenRegimenNumber)+ ','+ convert(varchar,RevisionNumber)

    from TreatmentRegimen TR

    join holidays h

    on TR.scheduleinstancedate = h.[date]

    and TR.holidayrule = 6

    )

    The convert and concatenation means no index seeks. You may get better performance changing that to an if exists. I say may, because the exists requires a correlated subquery... Test carefully.

    EXISTS (

    select 1

    from TreatmentRegimen TR

    join holidays h

    on TR.scheduleinstancedate = h.[date]

    and TR.holidayrule = 6

    WHERE TR.TreatmenRegimenNumber = a.TreatmenRegimenNumber

    AND TR.RevisionNumber = a.RevisionNumber

    )

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I assume this is in a stored proc. Can you post the rest of the proc please?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gila,

    Ha ha!!! If only it were in an SP. This is a system I was hired to "fix" just 10 months ago. There are numerous issues with the system including limited usage of SPs. No...it resides in a VB.NET form and gets passed to the DB via ExecuteNonQuery.

    The entire process actually consists of over 10 large SQL statements. I have traced the whole process numerous times and this query is the longest, which is why I'm attacking it first.

    I, personally, would move all of these into SPs but I haven't been allocated the time.

    However, my primary issue is finding out how to debug a query that runs fast sometimes and slow other times....I'm looking more for the theoretical steps one would take rather than anything specifically related to this query.

  • Ah. Fun.

    How is the query constructed on the front end? Parmaterised (I somehow doubt) or concatenated strings?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The worst way possible! They didn't even take the time to use parameters, which isn't hard at all to do in .Net.

    I could go on a rant for hours...but hey! this is why they're paying me money, right? I just wish they'd let me rewrite a little of it.

    FYI...I did find part of the query listed above that was a duplicate and removed it...sped it up from 2 minutes to less than 1 second.

    I also added an index that converted 2 clustered index scans into index seeks.

  • Right now I'm thinking it's poossibly got something to do with compile time. Perhaps try getting the admins to run a trace on Cache hit, cache miss, cache insert, RPC complete and TSQL: Batch complete. Get, at the minimum, the start time and the duration (not all the events have duration), and then run the query from the front end app (or get someone to). Filter the trace on the machine name that you're testing from

    Do you get the sometimes fast, sometimes slow behaviour from query analyser with a single query? Is the sometimes fast, sometimes slow with the same query?

    The thing with the proc cache, around adhoc code, is that two queries must match identically, down to the white space, for SQL to consider them the same and reuse the plan. SQL probably won't auto-parameterise something that complex, so there's no chance of a benefit there.

    I'll dig up a query for the 2000 plan cache for you over the weekend, so you can see what's in the cache.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail,

    Thanks for the help.

    Whenever I run query...whether in QA or from the front-end app itself...it runs fast. I haven't been able to get the query to run slow...but I have traces that show it running slow.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply