November 19, 2012 at 7:48 am
We have a procedure which performs _NO_ update or delete operation but only read. This procedure runs daily without any issue and gets completed within 1-2 minutes but one bad day, it ran continously for over 10 hours and we had to kill it. We tried rebuilding indexes and updating statistics but of no use. We finally gave up for the day. The very next day it ran again and strangely got completed within 1-2 minutes.
I took the backup from production environment for the bad day and restored it in test environment. Again I ran that procedure and somehow found the query which was causing the issue. The query contains one CTE, one physical table and one temporary table.
While running the query, the spid was showing CXPACKET as waitresourcetype which suggests high degree of parallelism and the query worked fine with MAXDOP=0. However I didn't want to do that change as the procedure ran wonderully for all other days.
The activity montior was showing following:
exchangeEvent id=Pipe 17758cf50 WaitType=e_waitPipeGetRow nodeId=4
I removed back that MAXDOP option and rather created clustered index on the temporary table (I know it should have been there in the very first place). Now it worked perfect. But my quesiton is why the issue arouse for that one day only ? There is no 'WITH RECOMPILE' option for that procedure either. It worked perfectly fine before that day and after that day. There was no drastic change in any of the involved table's volume.
November 19, 2012 at 3:04 pm
sqlnaive (11/19/2012)
We have a procedure which performs _NO_ update or delete operation but only read. This procedure runs daily without any issue and gets completed within 1-2 minutes but one bad day, it ran continously for over 10 hours and we had to kill it. We tried rebuilding indexes and updating statistics but of no use. We finally gave up for the day. The very next day it ran again and strangely got completed within 1-2 minutes.I took the backup from production environment for the bad day and restored it in test environment. Again I ran that procedure and somehow found the query which was causing the issue. The query contains one CTE, one physical table and one temporary table.
While running the query, the spid was showing CXPACKET as waitresourcetype which suggests high degree of parallelism and the query worked fine with MAXDOP=0. However I didn't want to do that change as the procedure ran wonderully for all other days.
The activity montior was showing following:
exchangeEvent id=Pipe 17758cf50 WaitType=e_waitPipeGetRow nodeId=4
I removed back that MAXDOP option and rather created clustered index on the temporary table (I know it should have been there in the very first place). Now it worked perfect. But my quesiton is why the issue arouse for that one day only ? There is no 'WITH RECOMPILE' option for that procedure either. It worked perfectly fine before that day and after that day. There was no drastic change in any of the involved table's volume.
Hi,
At that moment there had been a high degree of parallelism or memory pressure or cpu pressure. If your stats had been updated and indexes rebuilt then you should have decreased the MAXDOP and changed the cost threshold for parallelism.
Regards
IgorMi
Igor Micev,My blog: www.igormicev.com
November 19, 2012 at 5:39 pm
SQL Server has frankly always had some internal issues with parallelizing queries. Parallelization is great for (re)building indexes, but for queries it's always had its quirks.
Unless you are absolutely sure the tables are properly indexed, use MAXDOP to drop the parallelization to a small number, 2-3, to help keep SQL from getting "lost". Of course for very large tables you'll need to bump it up some from that, but I still wouldn't let MAXDOP = 0 be specified or implied.
SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
November 20, 2012 at 12:05 am
"""""""I removed back that MAXDOP option and rather created clustered index on the temporary table (I know it should have been there in the very first place). Now it worked perfect. But my quesiton is why the issue arouse for that one day only ? There is no 'WITH RECOMPILE' option for that procedure either. It worked perfectly fine before that day and after that day. There was no drastic change in any of the involved table's volume."""""""
As I said in the above quotes, the issue got cleared after creating clustered index on temporary table. so that might be it but I have one doubt here:
"Why it was running perfectly without any issues before and after that one bad day ?"
Secondly, when once again I removed the clustered index and ran the query, it started working fine even without clustered index. So other doubt is:
"What got changed after I created the clustered index on temporary table for once (even after trying dropping clustered index and/or dropping and recreating that temporary table) ?"
November 20, 2012 at 12:56 am
sqlnaive (11/20/2012)
"Why it was running perfectly without any issues before and after that one bad day ?"Secondly, when once again I removed the clustered index and ran the query, it started working fine even without clustered index. So other doubt is:
"What got changed after I created the clustered index on temporary table for once (even after trying dropping clustered index and/or dropping and recreating that temporary table) ?"
i dont think the restored back could make the issue reproducible , restoration often help to trace or catch the issue with the data or functionality (only available with that back up) it nothing to do with resource contention (like IO CPU etc ) which could be player/culprit in this issue .
Test , are you getting same delaey .slow response from same backed up data EVERYTIME ????????
-------Bhuvnesh----------
I work only to learn Sql Server...though my company pays me for getting their stuff done;-)
November 20, 2012 at 1:28 am
Test , are you getting same delaey .slow response from same backed up data EVERYTIME ????????
Thats what I said. I got that DB restored to my test environment and found the issue as it was.
November 20, 2012 at 9:11 am
Might have been a bad plan in cache on the day it was running bad and that plan got flushed out and a correct plan was created the next day.
you can use this to check the cached plans for the proc and how often that plan has been used.
SELECT UseCounts,RefCounts, Cacheobjtype, Objtype,
dbid AS DatabaseName, text AS SQL
FROM sys.dm_exec_cached_plans
CROSS APPLY sys.dm_exec_sql_text(plan_handle)
where text like '%<proc name>%'
ORDER BY usecounts DESC;
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply