We have three load-balanced SQL Servers, each with a copy of a large (>100 million rows) table. The table is used only for lookups (although the database is not read-only). The code base is all legacy code, some of it quite old (possibly written for SQL Server 7.0 or earlier).
When one of the three servers is taken offline, the other two immediately go to 100% CPU utilization, and the number of active transactions rises from about 7-10 per server to 100-200 or more per server. After that, all queries start timing out and our web site is effectively "down" until the situation is remedied (usually by putting the third server back into the load-balancing pool).
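For scale: losing one of three servers should raise each survivor's share of traffic by a factor of only 1.5, which by itself doesn't explain a more than tenfold jump in active transactions. The sketch below is a rough illustration in Python, not a model of our actual system. It assumes a simple M/M/1 queue and hypothetical baseline utilizations (we haven't measured the real ones), and the function names are made up for this post. It shows how a 1.5x load increase could push an already busy resource past saturation, where the backlog grows without bound.

```python
def per_server_load_factor(total_servers: int, offline: int) -> float:
    """Factor by which each surviving server's share of traffic grows."""
    remaining = total_servers - offline
    if remaining <= 0:
        raise ValueError("at least one server must remain online")
    return total_servers / remaining


def mm1_mean_in_system(utilization: float) -> float:
    """Mean number of requests in an M/M/1 queue: rho / (1 - rho)."""
    if utilization < 0:
        raise ValueError("utilization cannot be negative")
    if utilization >= 1:
        # At or beyond saturation the queue grows without bound.
        return float("inf")
    return utilization / (1 - utilization)


if __name__ == "__main__":
    factor = per_server_load_factor(total_servers=3, offline=1)
    # Hypothetical utilizations of whatever resource is the bottleneck
    # (assumed values for illustration, not measurements).
    for baseline in (0.5, 0.6, 0.7):
        after = baseline * factor
        print(
            f"baseline={baseline:.2f} -> after={after:.2f}, "
            f"mean in system {mm1_mean_in_system(baseline):.1f} "
            f"-> {mm1_mean_in_system(after):.1f}"
        )
```

If something like this is what's happening, the bottleneck resource would have to be running above roughly two-thirds utilization with all three servers online, which is one of the things I'd like to confirm by monitoring.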
I'm at a loss with this problem. The servers are not normally CPU-bound - in fact, CPU usage is normally low (processing is usually I/O-bound). Is it a locking issue? Lack of memory? Gremlins?
Does anyone have any ideas on what to monitor to try to isolate this issue?
Thanks in advance.