This thread saved our SQL cluster. We applied the settings below and they fixed the issue: we have now gone at least one week without problems, whereas the failures usually came every second day or so. Our configuration was like so many others here, with large memory:
4-node SQL Server 2005 cluster configuration with 5 instances
All cluster nodes are HP BL45s with 4 dual-core CPUs and 50 GB of memory
I also had a ticket open with MS, but they did not lead me to this fix. Instead they led me down the firmware and driver road first, so I upgraded to firmware 8.03 and HP PSP 8.30 before finding this fix. Of course, the firmware and drivers did not fix the problem.
We have resolved this issue on a couple of clusters by setting the following in the registry on the nodes (after having updated service packs, drivers, and firmware, and disabling TCP/IP offload):
1. Set TcpMaxDataRetransmissions to 30 (decimal);
2. Set KeepAliveInterval to 25000 (decimal).
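If you have several nodes to change, it can help to generate one .reg file and import it on each node. The sketch below is only an illustration: it assumes both values live under the standard Tcpip\Parameters key as REG_DWORD values (the post above does not name the key), and the function name `build_reg_file` is hypothetical. KeepAliveInterval is in milliseconds.

```python
# Minimal sketch: render the two TCP/IP settings from the post as a .reg file.
# ASSUMPTION: both values go under the standard Tcpip\Parameters key as
# REG_DWORD values; the post itself does not state the key path.

TCPIP_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"

# Values from the post, in decimal.
SETTINGS = {
    "TcpMaxDataRetransmissions": 30,
    "KeepAliveInterval": 25000,  # milliseconds
}


def build_reg_file(settings: dict) -> str:
    """Return .reg text that sets each value as a REG_DWORD.

    .reg files write DWORDs as 8 hex digits, so 30 becomes 0000001e
    and 25000 becomes 000061a8.
    """
    lines = ["Windows Registry Editor Version 5.00", "", f"[{TCPIP_KEY}]"]
    for name, value in settings.items():
        lines.append(f'"{name}"=dword:{value:08x}')
    # regedit expects Windows (CRLF) line endings.
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file(SETTINGS))
```

Save the output as, say, tcpip.reg and import it on each node with `reg import tcpip.reg`. Changes to TCP/IP parameters typically only take effect after a restart, so plan a failover and reboot per node.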
It worked for us; your mileage may vary.