I've got an intermittent connectivity/DB availability problem.
Here's a quick low-down on the system:
Windows Server 2008 R2 SP1
SQL Server 2005 SP4, 64-bit
Physical machine, 48 GB RAM, 16 Intel cores at 2.9 GHz
SAN-attached storage
Mirroring and log shipping (LS) in use
DB is used to manage application logons.
Application uses classic ASP
Combination of pooled and non-pooled connections
There are no interactive users to the DB
Generally usage occurs at set times of the day.
Intermittently, the DB seems to become unavailable to the application.
Users can't log in and start the application.
The application is hosted across a number of servers, and the connection errors appear in the logs on all of the app servers:
COM exception had occured in InitADODBConnection: ([Microsoft][ODBC SQL Server Driver]Timeout expired) Retrying 1 time...
COM Exception had occured in ProcessAudioNotificationRecords: [Microsoft][ODBC SQL Server Driver]Timeout expired
COM exception had occured: Operation is not allowed when the object is closed.
COM exception had occured in InitADODBConnection: ([Microsoft][ODBC SQL Server Driver][DBNETLIB]General network error. Check your network documentation.)
In the SQL Server log we see:
Error: 18456, Severity: 14, State: 27.
Then a number of these:
Error: 18056, Severity: 20, State: 27.
The client was unable to reuse a session with SPID 446, which had been reset for connection pooling. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message
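For anyone wanting to tally these entries across a longer stretch of the log, here is a minimal Python sketch. It assumes the error log has been exported to plain text with lines in the "Error: N, Severity: N, State: N." form quoted above; `tally_errors` and the sample list are illustrative names, and the repeated 18056 line only mimics "a number of these" rather than reproducing real log data.

```python
import re
from collections import Counter

# Matches lines of the form "Error: 18456, Severity: 14, State: 27."
ERROR_LINE = re.compile(r"Error:\s*(\d+),\s*Severity:\s*(\d+),\s*State:\s*(\d+)\.")


def tally_errors(lines):
    """Count (error, severity, state) triples found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ERROR_LINE.search(line)
        if match:
            counts[tuple(int(group) for group in match.groups())] += 1
    return counts


# Illustrative input: the two error lines from this post, the second repeated.
sample = [
    "Error: 18456, Severity: 14, State: 27.",
    "Error: 18056, Severity: 20, State: 27.",
    "Error: 18056, Severity: 20, State: 27.",
]
counts = tally_errors(sample)
for key, count in counts.items():
    print(key, count)
```

Grouping by the full triple rather than the error number alone keeps different states of the same error apart, which matters if other states turn up elsewhere in the log.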
Shortly before those errors (say 2 mins), I see the following in perfmon on the SQL Server (I have a lot more data gathered, so please ask):
Rapid ramp-up in user connections from ~500 to ~1100 (in 2 mins), plateauing for 1 min or so, then on to some higher value (1800+), plateauing for another few mins before dropping back to ~500, by which time the outage is over.
Rapid decrease in bytes received & sent
Rapid decrease in Batch Requests/sec
No change in Total or Target Server memory
No change in the continual increase in PLE (page life expectancy)
Rapid increase in Reserved Pages
Increase in % Processor Time from 5% to 10%
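To line the connection ramp-ups up against the error timestamps, something like the following could flag them in an exported counter series. This is only a minimal Python sketch: `find_rampups`, the 120-second window, and the 500-connection threshold are illustrative choices, and the sample values are made up to resemble the pattern described above, not taken from the real perfmon capture.

```python
def find_rampups(samples, window, min_increase):
    """Flag ramp-ups in a counter series.

    samples: list of (seconds, value) pairs sorted by time.
    window: look-back span in seconds.
    min_increase: minimum rise that counts as a ramp-up.

    Each sample is compared with the oldest sample still inside the window.
    Returns a list of (start_time, end_time, delta) tuples.
    """
    hits = []
    start = 0
    for end in range(len(samples)):
        # Slide the window start forward until it spans at most `window` seconds.
        while samples[end][0] - samples[start][0] > window:
            start += 1
        delta = samples[end][1] - samples[start][1]
        if delta >= min_increase:
            hits.append((samples[start][0], samples[end][0], delta))
    return hits


# Hypothetical "User Connections" samples at 60-second intervals.
samples = [
    (0, 500), (60, 520), (120, 800), (180, 1100), (240, 1100),
    (300, 1500), (360, 1800), (420, 1800), (480, 500),
]
rampups = find_rampups(samples, window=120, min_increase=500)
for start_time, end_time, delta in rampups:
    print(f"+{delta} connections between t={start_time}s and t={end_time}s")
```

The same function could be pointed at Batch Requests/sec with a negated series to find the matching drops, if that turns out to be useful.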
I've got Profiler data that seems to show some activity during the outage, but despite the increase in user connections, no new users can log in to the application.
The system invariably sorts itself out and goes back to normal. These outages last anywhere from less than 1 min to 10 mins.
I've been looking at this for quite a while and can't definitively point the finger at where the problem is, so I could really do with fresh opinions/questions. As I said, I've got quite a bit of data from perfmon, Profiler and the DMVs, so please ask for whatever may help.