Very strange hanging process...

  • I apologize in advance for this being vague, but it would take some time to include real code - there is a TON of it and some things would need to be masked out.  So I am going to post this as a very general "throw out some ideas" kind of question.

    We have a nightly process in our Great Plains system that basically takes a snapshot of the current historic aged trial balance and dumps the results into a table.  It usually runs in under a minute, but recently we've had some failures due to the job hanging and running for hours - we typically end up killing it after 24 hours.

    Here is the weird part...  If I run the exact same code again, it often finishes in under a minute.  Which, okay... it could be cached or whatever; but if we tweak the date parameters, we can get it to alternate between running normally (under a minute) and hanging - we generally kill it after 10 minutes.

    Every time it hangs, it's hanging on an update statement against temp tables created by the stored proc, with a wait type of CXCONSUMER.
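
    For anyone hitting something similar: a sketch of how to confirm the wait type while the job is stuck.  The session_id filter is illustrative - substitute the actual spid of the hung job (from sp_who2 or sys.dm_exec_requests):

    ```sql
    -- Show what each task in the suspect session is currently waiting on.
    -- Session 57 is a hypothetical spid; replace it with the hung job's spid.
    SELECT wt.session_id,
           wt.wait_type,            -- e.g. CXCONSUMER / CXPACKET
           wt.wait_duration_ms,
           wt.blocking_session_id,
           r.command,
           r.wait_resource
    FROM   sys.dm_os_waiting_tasks AS wt
    JOIN   sys.dm_exec_requests    AS r
           ON r.session_id = wt.session_id
    WHERE  wt.session_id = 57;      -- hypothetical spid of the hung job
    ```

    A parallel query will show one row per worker task, which makes it easy to see whether most of the time is going to parallelism-exchange waits.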

    Our first guess was some kind of bad data... but that doesn't seem to match because it should always fail.  Our next guess was some other process clashing with it...  but the thing is, I can replicate the problem on our reporting server, which is an hourly log-shipped copy of production.  There are no processes on the reporting server, and the databases are read-only.

    My guess - based on the CXCONSUMER wait - is that it's resources.  But I can't pinpoint anything that would be causing the issue on two different boxes.  Both servers are pretty robust in terms of CPU, RAM, etc.

    The stored procedure has parameters that let us provide a range of customer names - the nightly job leaves these blank, which means process all customers.  We can run "D" through "Z" without an issue.  If we try "0" (zero) through "D", it hangs.  But if we split it into "0" through "B" and "C" through "D", it works.

    Again, I realize this is all very vague...  sorry about that...

  • It sounds like you know what you're doing, so I suspect you'll have considered this already, but could it be down to parameter sniffing?

    If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.
    See https://www.sqlservercentral.com/articles/forum-etiquette-how-to-post-datacode-on-a-forum-to-get-the-best-help/ for details of how to post T-SQL code-related questions.

  • Another thought (as I ran into a similar issue when working with Great Pains... I mean Great Plains) - try setting your MAXDOP on that query to 1.  We had a similar thing where the query would run in under a minute in MOST cases, but every now and then would run for 7 hours with CXCONSUMER wait types.

    We added a query hint of MAXDOP 1 and it fixed the issue for us.
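
    For illustration, the hint goes on the statement itself.  Table and column names below are made up - the real update runs against temp tables built by the GP stored procedure:

    ```sql
    -- Hypothetical shape of the problem statement, shown only to
    -- illustrate where the MAXDOP hint attaches.
    UPDATE t
    SET    t.Balance = s.Balance
    FROM   #AgedTrialBalance AS t          -- hypothetical temp table
    JOIN   #Snapshot         AS s          -- hypothetical temp table
           ON s.CustomerId = t.CustomerId
    OPTION (MAXDOP 1);                     -- force a serial plan for this statement only
    ```

    The nice part of the statement-level hint is that it only affects the one query, rather than changing the server-wide max degree of parallelism.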

  • CXCONSUMER is waiting on parallelism.

    Have you adjusted the SQL Server configuration for 'cost threshold for parallelism' to be something larger than the default?  The default is 5, IIRC, which is way too low - it makes SQL Server use parallelism too aggressively when it's not really needed, causing unnecessary overhead.  Most people use a setting around 50, although I've sometimes seen values as high as 300 on some systems.
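
    For reference, the change is a standard sp_configure call (50 is just a common starting point - tune from there based on your workload):

    ```sql
    -- Raise 'cost threshold for parallelism' from the default of 5 to 50.
    -- It's an advanced option, so expose those first.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;

    EXEC sp_configure 'cost threshold for parallelism', 50;
    RECONFIGURE;   -- takes effect immediately; no restart required
    ```

    Note this is an instance-wide setting, so it affects every database on the server, not just the GP one.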

    SQL DBA,SQL Server MVP(07, 08, 09) Prosecutor James Blackburn, in closing argument in the Fatal Vision murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."

  • Appreciate the responses 🙂

    I had discounted parameter sniffing (perhaps unwisely) because the same job has been running for over ten years on five different GP databases without this issue.  One of them is much larger, and has a greater variety of data, so it seemed like it should be the one affected instead of this one.

    I did think of MAXDOP but was unconsciously avoiding that one because the process in question covers multiple stored procedures that generate dynamic SQL against temp tables - have I mentioned I hate this process?? - however...

    The "cost threshold for parallelism" change seems to have done the trick.  At least, in our test environment I've run the thing several times after bumping it from 5 to 50, and it runs faster than it usually does.  So that could very well be the fix...  I kind of wish I knew why this became an issue all of a sudden, but I'll do some further testing and see how it goes.

    Thanks, guys...  especially given the somewhat lacking information I provided.

  • Just to follow up, that setting is a life saver.

    The process I was having issues with now runs in around 40 seconds consistently; it used to take around a minute.  We haven't had it hang since the change.

  • Yeah.  I'm not sure why MS hasn't changed the SQL Server default value for that setting.  They've made other adjustments to critical starting values, particularly for tempdb, but somehow haven't adjusted that one yet.

