Performance Issue

  • We have a procedure which performs _NO_ update or delete operation but only read. This procedure runs daily without any issue and gets completed within 1-2 minutes but one bad day, it ran continously for over 10 hours and we had to kill it. We tried rebuilding indexes and updating statistics but of no use. We finally gave up for the day. The very next day it ran again and strangely got completed within 1-2 minutes.

    I took the backup from production environment for the bad day and restored it in test environment. Again I ran that procedure and somehow found the query which was causing the issue. The query contains one CTE, one physical table and one temporary table.

    While running the query, the spid was showing CXPACKET as waitresourcetype which suggests high degree of parallelism and the query worked fine with MAXDOP=0. However I didn't want to do that change as the procedure ran wonderully for all other days.

    The activity montior was showing following:

    exchangeEvent id=Pipe 17758cf50 WaitType=e_waitPipeGetRow nodeId=4

    I removed back that MAXDOP option and rather created clustered index on the temporary table (I know it should have been there in the very first place). Now it worked perfect. But my quesiton is why the issue arouse for that one day only ? There is no 'WITH RECOMPILE' option for that procedure either. It worked perfectly fine before that day and after that day. There was no drastic change in any of the involved table's volume.

  • sqlnaive (11/19/2012)


    We have a procedure which performs _NO_ update or delete operation but only read. This procedure runs daily without any issue and gets completed within 1-2 minutes but one bad day, it ran continously for over 10 hours and we had to kill it. We tried rebuilding indexes and updating statistics but of no use. We finally gave up for the day. The very next day it ran again and strangely got completed within 1-2 minutes.

    I took the backup from production environment for the bad day and restored it in test environment. Again I ran that procedure and somehow found the query which was causing the issue. The query contains one CTE, one physical table and one temporary table.

    While running the query, the spid was showing CXPACKET as waitresourcetype which suggests high degree of parallelism and the query worked fine with MAXDOP=0. However I didn't want to do that change as the procedure ran wonderully for all other days.

    The activity montior was showing following:

    exchangeEvent id=Pipe 17758cf50 WaitType=e_waitPipeGetRow nodeId=4

    I removed back that MAXDOP option and rather created clustered index on the temporary table (I know it should have been there in the very first place). Now it worked perfect. But my quesiton is why the issue arouse for that one day only ? There is no 'WITH RECOMPILE' option for that procedure either. It worked perfectly fine before that day and after that day. There was no drastic change in any of the involved table's volume.

    Hi,

    At that moment there had been a high degree of parallelism or memory pressure or cpu pressure. If your stats had been updated and indexes rebuilt then you should have decreased the MAXDOP and changed the cost threshold for parallelism.

    Regards

    IgorMi

    Igor Micev,My blog: www.igormicev.com

  • SQL Server has frankly always had some internal issues with parallelizing queries. Parallelization is great for (re)building indexes, but for queries it's always had its quirks.

    Unless you are absolutely sure the tables are properly indexed, use MAXDOP to drop the parallelization to a small number, 2-3, to help keep SQL from getting "lost". Of course for very large tables you'll need to bump it up some from that, but I still wouldn't let MAXDOP = 0 be specified or implied.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • """""""I removed back that MAXDOP option and rather created clustered index on the temporary table (I know it should have been there in the very first place). Now it worked perfect. But my quesiton is why the issue arouse for that one day only ? There is no 'WITH RECOMPILE' option for that procedure either. It worked perfectly fine before that day and after that day. There was no drastic change in any of the involved table's volume."""""""

    As I said in the above quotes, the issue got cleared after creating clustered index on temporary table. so that might be it but I have one doubt here:

    "Why it was running perfectly without any issues before and after that one bad day ?"

    Secondly, when once again I removed the clustered index and ran the query, it started working fine even without clustered index. So other doubt is:

    "What got changed after I created the clustered index on temporary table for once (even after trying dropping clustered index and/or dropping and recreating that temporary table) ?"

  • sqlnaive (11/20/2012)


    "Why it was running perfectly without any issues before and after that one bad day ?"

    Secondly, when once again I removed the clustered index and ran the query, it started working fine even without clustered index. So other doubt is:

    "What got changed after I created the clustered index on temporary table for once (even after trying dropping clustered index and/or dropping and recreating that temporary table) ?"

    i dont think the restored back could make the issue reproducible , restoration often help to trace or catch the issue with the data or functionality (only available with that back up) it nothing to do with resource contention (like IO CPU etc ) which could be player/culprit in this issue .

    Test , are you getting same delaey .slow response from same backed up data EVERYTIME ????????

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Test , are you getting same delaey .slow response from same backed up data EVERYTIME ????????

    Thats what I said. I got that DB restored to my test environment and found the issue as it was.

  • Might have been a bad plan in cache on the day it was running bad and that plan got flushed out and a correct plan was created the next day.

    you can use this to check the cached plans for the proc and how often that plan has been used.

    SELECT UseCounts,RefCounts, Cacheobjtype, Objtype,

    dbid AS DatabaseName, text AS SQL

    FROM sys.dm_exec_cached_plans

    CROSS APPLY sys.dm_exec_sql_text(plan_handle)

    where text like '%<proc name>%'

    ORDER BY usecounts DESC;

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply