Very strange hanging process...

  • I apologize in advance for this being vague, but it would take some time to include real code - there is a TON of it and some things would need to be masked out.  So I am going to post this as a very general "throw out some ideas" kind of question.

    We have a nightly process in our Great Plains system that basically takes a snapshot of the current historic aged trial balance and dumps the results into a table.  It usually runs in under a minute, but recently we've had some failures due to the job hanging and running for hours - we typically end up killing it after 24 hours.

    Here is the weird part...  If I run the exact same code again, it often finishes in under a minute.  Which, okay... it could be cached or whatever; but if we tweak the date parameters, we can get it to alternate between running normally (under a minute) and hanging - we generally kill it after 10 minutes.

    Every time it hangs, it's hanging on an update statement against temp tables created by the stored proc, with a wait type of CXCONSUMER.
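
    For anyone hitting something similar: a sketch of how to confirm the wait type while the job is stuck.  The session_id filter is illustrative - substitute the actual spid of the hung job (from sp_who2 or sys.dm_exec_requests):

    ```sql
    -- Show what each task in the suspect session is currently waiting on.
    -- Session 57 is a hypothetical spid; replace it with the hung job's spid.
    SELECT wt.session_id,
           wt.wait_type,            -- e.g. CXCONSUMER / CXPACKET
           wt.wait_duration_ms,
           wt.blocking_session_id,
           r.command,
           r.wait_resource
    FROM   sys.dm_os_waiting_tasks AS wt
    JOIN   sys.dm_exec_requests    AS r
           ON r.session_id = wt.session_id
    WHERE  wt.session_id = 57;      -- hypothetical spid of the hung job
    ```

    A parallel query will show one row per worker task, which makes it easy to see whether most of the time is going to parallelism-exchange waits.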

    Our first guess was some kind of bad data... but that doesn't seem to match because it should always fail.  Our next guess was some other process clashing with it...  but the thing is, I can replicate the problem on our reporting server, which is an hourly log-shipped copy of production.  There are no processes on the reporting server, and the databases are read-only.

    My guess - based on the CXCONSUMER wait - is that it's resources.  But I can't pinpoint anything that would be causing the issue on two different boxes.  Both servers are pretty robust in terms of CPU, RAM, etc.

    The stored procedure has parameters that let us provide a range of customer names - the nightly job leaves these blank, which means process all customers.  We can run "D" through "Z" without an issue.  If we try "0" (zero) through "D", it hangs.  But if we split it into "0" through "B" and "C" through "D", it works.

    Again, I realize this is all very vague...  sorry about that...

  • It sounds like you know what you're doing, so I suspect you'll have considered this already, but could it be down to parameter sniffing?

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.
    See https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help/ for details of how to post T-SQL code-related questions.

  • Another thought (as I ran into a similar issue when working with Great Pains... I mean Great Plains) - try setting your MAXDOP on that query to 1.  We had a similar thing where the query would run in under a minute in MOST cases, but every now and then would run for 7 hours with CXCONSUMER wait types.

    We added a query hint of MAXDOP 1 and it fixed the issue for us.
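
    For illustration, the hint goes on the statement itself.  Table and column names below are made up - the real update runs against temp tables built by the GP stored procedure:

    ```sql
    -- Hypothetical shape of the problem statement, shown only to
    -- illustrate where the MAXDOP hint attaches.
    UPDATE t
    SET    t.Balance = s.Balance
    FROM   #AgedTrialBalance AS t          -- hypothetical temp table
    JOIN   #Snapshot         AS s          -- hypothetical temp table
           ON s.CustomerId = t.CustomerId
    OPTION (MAXDOP 1);                     -- force a serial plan for this statement only
    ```

    The nice part of the statement-level hint is that it only affects the one query, rather than changing the server-wide max degree of parallelism.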

  • CXCONSUMER is waiting on parallelism.

    Have you adjusted the SQL Server configuration for 'cost threshold for parallelism' to be something larger than the default?  The default is 5, IIRC, which is way too low - it makes SQL Server use parallelism too aggressively when it's not really needed, causing unnecessary overhead.  Most people use a setting around 50, although I've sometimes seen values as high as 300 on some systems.
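
    For reference, the change is a standard sp_configure call (50 is just a common starting point - tune from there based on your workload):

    ```sql
    -- Raise 'cost threshold for parallelism' from the default of 5 to 50.
    -- It's an advanced option, so expose those first.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    EXEC sp_configure 'cost threshold for parallelism', 50;
    RECONFIGURE;   -- takes effect immediately; no restart required
    ```

    Note this is an instance-wide setting, so it affects every database on the server, not just the GP one.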

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • Appreciate the responses 🙂

    I had discounted parameter sniffing (perhaps unwisely) because the same job has been running for over ten years on five different GP databases without this issue.  One of them is much larger, and has a greater variety of data, so it seemed like it should be the one affected instead of this one.

    I did think of MAXDOP but was unconsciously avoiding that one because the process in question covers multiple stored procedures that generate dynamic SQL against temp tables - have I mentioned I hate this process?? - however...

    The "cost threshold for parallelism" change seems to have done the trick.  At least, in our test environment I've run the thing several times after bumping it from 5 to 50, and it runs faster than it usually does.  So that could very well be the fix...  I kind of wish I knew why this became an issue all of a sudden, but I'll do some further testing and see how it goes.

    Thanks, guys...  especially given the somewhat lacking information I provided.

  • Just to follow up, that setting is a life saver.

    The process I was having issues with now runs in around 40 seconds consistently; it used to take around a minute.  We haven't had it hang since the change.

  • Yeah.  I'm not sure why MS hasn't changed the SQL Server default value for that setting.  They've made other adjustments to critical starting values, particularly for tempdb, but somehow haven't adjusted that one yet.

