SQL Server 2005 Cluster - [sqsrvres] CheckQueryProcessorAlive: sqlexecdirect failed
Posted Friday, October 31, 2008 6:45 AM
Grasshopper

Group: General Forum Members
Last Login: Monday, April 07, 2014 12:59 PM
Points: 12, Visits: 100
Everyone --

We are still working through the issue with Microsoft and EMC. The latest is that we have been able to single out a node in our cluster to be our guinea pig. As of this morning...

- We found out that we had outdated drivers for the HBAs that connect us to our EMC SAN. There is also a patch from Microsoft for the HBA that EMC says is mandatory. It should be in place now too.
- We have also put in some Server service/LanMan registry changes that Microsoft suggested.

One of the new things they found was related to this:
"The errors we’ve been getting indicate that the Server service is unable to keep up with the demand for network work items that are queued by the network layer of the input/output (IO) stream.

There are many causes, some of which only cause brief logging of error conditions (but may not cause failover), and these may be addressed by tuning the server service.

The disk subsystem not being able to keep up is the most common cause of the accumulation of work items in the server service."

He then went on to suggest this registry change:
To increase the capacity of the server service to handle incoming IO requests, please set the following registry settings (hexadecimal values) using regedit.exe:

HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters
"MaxFreeConnections"=dword:00000064
"MinFreeConnections"=dword:00000020
"MaxRawWorkItems"=dword:00000200
"MaxWorkItems"=dword:00002000
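
Since the values above are given in hexadecimal, it can help to see them in decimal before entering them in regedit.exe. The following is a minimal Python sketch, not something Microsoft supplied; the names `REG_LINES`, `DWORD_LINE`, and `parse_dwords` are made up for illustration, and the script only reads the text quoted above. It does not touch the registry.

```python
import re

# The four values exactly as quoted in the post (REG_DWORD, hexadecimal).
REG_LINES = '''
"MaxFreeConnections"=dword:00000064
"MinFreeConnections"=dword:00000020
"MaxRawWorkItems"=dword:00000200
"MaxWorkItems"=dword:00002000
'''

# Matches one .reg-style line: "Name"=dword:XXXXXXXX (eight hex digits).
DWORD_LINE = re.compile(r'^"(?P<name>[^"]+)"=dword:(?P<hex>[0-9A-Fa-f]{8})$')


def parse_dwords(text):
    """Return {value name: decimal integer} for each dword line in text."""
    values = {}
    for line in text.splitlines():
        match = DWORD_LINE.match(line.strip())
        if match:
            values[match.group("name")] = int(match.group("hex"), 16)
    return values


if __name__ == "__main__":
    for name, value in parse_dwords(REG_LINES).items():
        print(f"{name}: 0x{value:X} = {value}")
```

Run as a script, this prints each value name with its hexadecimal and decimal form, so the numbers can be checked against what regedit.exe shows when the base is switched to Decimal.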

--- We'll see where this takes us... I am also collecting additional information for him... will keep you posted.

-- Mike
Post #594918
Posted Friday, October 31, 2008 7:39 AM
Forum Newbie

Group: General Forum Members
Last Login: Friday, October 16, 2009 8:24 PM
Points: 3, Visits: 35
Thanks for your continued efforts on this. The registry fixes I posted appear to prevent the unneeded SQL cluster resource failures and the attendant 19019 errors, but I suspect the underlying issue has not been resolved.

We have systems using HP and 3PAR based SANs that have been affected by this issue (we may have some systems attached to EMC SANs that are affected, but I am unaware of any at this time). With the fix recommended by the Microsoft tech applied, the cluster resource failures have stopped, and therefore the clusters are no longer on the front burner, as it were.

I am continuing to see evidence of disk subsystem issues (VSS and VDS errors, which are interrupting backups).

Systems have been checked against the respective SAN configuration matrices for HBA drivers/firmware, MPIO, etc.

We have been applying http://support.microsoft.com/kb/943295 to some of the affected systems and the jury is still out, but I have the feeling that we are not out of the woods on this issue.

I will be paying close attention to this thread.
Post #594964
Posted Monday, November 03, 2008 11:52 AM
Grasshopper

Group: General Forum Members
Last Login: Monday, April 07, 2014 12:59 PM
Points: 12, Visits: 100
Everyone --

Still not out of the woods yet. This is becoming a long (and painful) process. I've tried everything Microsoft has suggested and still nothing.

As for hotfixes, Thomas mentioned that he applied 943295. EMC told us to put in hotfix 943545... which I'm assuming is a newer fix than the one Thomas put in?

Today I thought that the communication link failures I'm getting *may* be fixed by CU10 (yes, they're up to CU10 for SP2), but I think I disproved that this afternoon.

I re-ran the process on another SQL2005-x64 machine that is SAN connected but not in a cluster... ran clean as a whistle.

I'll keep you posted.
-- Mike
Post #596005
Posted Tuesday, November 18, 2008 9:41 PM
Grasshopper

Group: General Forum Members
Last Login: Monday, April 07, 2014 12:59 PM
Points: 12, Visits: 100
Everyone --

Wanted to check back in... we think we got it. It'll take a few days to pull together everything we did and what Microsoft recommended, but the tweak that seems to have nailed it for us was this: we had made the (in hindsight not so smart) mistake of leaving SQL Server Priority Boost checked on the nodes in our cluster.

If any of you following this have done the same... uncheck it as soon as reasonably possible. Again, in hindsight, if you google around long enough you'll hit the articles that say you shouldn't have this checked in a cluster... but they only say it could cause networking problems, with no details or specific messages...

In our case, we got to the point where we stripped a node down to its bare bones... we uninstalled EVERYTHING on the node that wasn't critical and (with boost on) ran a test in which I ran a procedure that reliably causes the 19019 events, with SQL Profiler tracing. Profiler showed that the cluster service's connection to SQL Server was periodically being dropped. THIS DROP is what was generating the 19019 errors! On a hunch, one of our DBAs thought that *maybe* this priority boost thing might be choking other processes on the node... the cluster service being one of them... to give way to the higher-priority SQL Server.

Sure enough, we switched off this setting... not a single doggone 19019 since.

Like I said, I will try to get back to you folks within a few days, maybe a week, to compile everything we tried and all of Microsoft's recommendations based on our particular environment. In short, and in no particular order, the things Microsoft cited in our environment:
1. Spikes in disk activity on our SAN (got better after applying the latest drivers/hotfixes)
2. They claimed that the NICs in our nodes were teamed, which is a cluster no-no (our cards were not teamed, period.)
3. They tried to reference an obscure match with Quest's SQL LiteSpeed causing the problem when using native command substitution... not buying this one... we've had LiteSpeed for years and have always used the xp_ procs... not command substitution
4. They suggested that we look at / play with our MAXDOP options... currently set to 0 on each of our nodes. (This was suggested after we mentioned to them that we had stumbled upon the priority boost thing.)


Take it easy -- Mike
Post #604835
Posted Wednesday, December 03, 2008 2:13 AM
SSC Journeyman

Group: General Forum Members
Last Login: Wednesday, May 23, 2012 10:27 PM
Points: 87, Visits: 161
Hi

I have a newly built SQL Server cluster and am getting these errors with no application load at all.



Post #612623
Posted Friday, December 12, 2008 1:20 PM
Forum Newbie

Group: General Forum Members
Last Login: Tuesday, March 20, 2012 7:10 PM
Points: 4, Visits: 144
Hi, my name is Fabio Pereira and I'm Brazilian. Did the modification solve this problem for you?

Thanks.
Post #618902
Posted Saturday, December 13, 2008 2:58 PM


SSCrazy

Group: General Forum Members
Last Login: Monday, April 14, 2014 2:01 PM
Points: 2,007, Visits: 6,065
Prakash.Bhojegowda (5/7/2008)
I called Microsoft and was on the phone with them for 6 hours yesterday. They were not able to give me an explanation. They turned around and said that this is the way SQL 2005 is designed to work.

I still do not agree with Microsoft, because a friend of mine, who works for another IT firm and has SQL 2005, says that SQL 2005 should allocate all the available memory. If a maximum of 14 GB of memory is configured on SQL Server, SQL should utilize every bit of it.

If anyone has corrected the issue, please let me know; I am eager to know the solution.


Your friend is correct within certain contexts.

If you are using a 32-bit version of SQL 2005, it will not use all the memory unless PAE and AWE are enabled.
On a 64-bit system it will use all of the memory, but it will not take all of it immediately; it starts off small and ramps up as memory is needed, up to the maximum it has been allocated.




Shameless self-promotion - read my blog http://sirsql.net
Post #619161
Posted Friday, February 20, 2009 2:55 PM
SSC-Enthusiastic

Group: General Forum Members
Last Login: Friday, March 12, 2010 8:40 AM
Points: 190, Visits: 472
Still looking for some answers on this. Has anyone figured it out?


Post #661809
Posted Thursday, August 13, 2009 8:30 AM
Forum Newbie

Group: General Forum Members
Last Login: Sunday, December 08, 2013 2:59 PM
Points: 4, Visits: 152
Does anyone have a solution for this issue?
Please reply.
Thanks in advance
Post #770202
Posted Friday, September 04, 2009 7:37 AM
Ten Centuries

Group: General Forum Members
Last Login: Friday, July 22, 2011 4:13 AM
Points: 1,149, Visits: 603
Hi
We had almost the same problem. We looked at CPU affinity, network cards, etc.

In the end I found it was because Priority Boost was enabled on the installation (it was already enabled, and the server was already failing, before I arrived).

Once I set this to 0, the mysterious reboots ended, along with the Event Viewer errors that used to happen 2 or 3 times a day.

Priority Boost changes require a service restart before they take effect.
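
For anyone who wants to check this on their own nodes, here is a rough Python sketch, not taken from anyone's actual scripts. The two T-SQL strings use the standard `sp_configure` option `'priority boost'`, which reports a `config_value` (what is configured) and a `run_value` (what the running instance is using). The constant names and the `boost_status` helper are hypothetical, and the sketch assumes you run the T-SQL yourself and pass in the two numbers it reports; it does not connect to SQL Server.

```python
# T-SQL to run by hand (for example in Management Studio).
# 'priority boost' is an advanced option, so advanced options must be
# shown before sp_configure will list it.
CHECK_SQL = """
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'priority boost';
"""

# Turns the option off; the running instance keeps its old value
# until the SQL Server service is restarted.
DISABLE_SQL = """
EXEC sp_configure 'priority boost', 0;
RECONFIGURE;
"""


def boost_status(config_value, run_value):
    """Describe the priority boost state from sp_configure's two columns.

    config_value: the configured setting (0 or 1)
    run_value:    the setting the running instance is using (0 or 1)
    """
    if config_value not in (0, 1) or run_value not in (0, 1):
        raise ValueError("priority boost values must be 0 or 1")
    if config_value == run_value:
        return "on" if run_value == 1 else "off"
    # The two differ: a change has been made but the service has not
    # been restarted yet.
    if config_value == 0:
        return "still on, turns off at next service restart"
    return "still off, turns on at next service restart"


if __name__ == "__main__":
    # Example: option was just set to 0 but the service is still running.
    print(boost_status(config_value=0, run_value=1))
```

The point of comparing the two columns is the restart requirement: right after the option is set to 0, `config_value` reads 0 while `run_value` still reads 1, so the node is not actually fixed until the service has been restarted (on a cluster, that means planning for the resource to go offline).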

Hope this helps some of you.

Seth

(actually just noticed someone else has said the same thing a few posts earlier - I'll leave this for those like me who get forum thread blindness)
Post #782893