I am preparing to build a new SQL Server 2008 Enterprise server. The machine is very well equipped (Windows Server 2008 Enterprise, 64GB RAM, 16 processor cores, etc.). The system volume is a local hard drive, but the data volume is a 300GB iSCSI LUN. The server will host a set of single-purpose databases backing a line-of-business (LOB) web app that has been performing poorly.
So here is my question: what is the best practice for locating system databases on iSCSI LUNs?
I am somewhat suspicious of iSCSI LUNs. In the past I have seen that if you put databases on a LUN and then lose connectivity to that LUN, even for a few seconds, the SQL Server service can crash and fail to recover.
In that case, all the databases, even the system databases, were sitting out on the LUNs. I had used NetApp SnapManager for SQL, and it had moved everything out to the SAN. So when someone accidentally unplugged the switch, the service tried to restart three times in quick succession, then gave up. By the time the LUN reconnected, SQL Server had already stopped. When I started the service manually, everything came up just fine, but it gave me heartburn. This is hardly a high-availability solution.
Now I am working on a server in an enterprise environment. We have redundant, round-robin iSCSI paths, multiple NICs, and backup power on the switches and the NetApp. The server is almost ready for me to install SQL Server. But looking at the logs, I see that a couple of Windows updates were installed on the 15th at 3 AM, and there are dozens of iSCSIPrt errors accompanying them. Apparently whatever update was being applied caused the connections on each NIC to drop for a second or two.
I am worried that this could happen again and that, even with all the redundancy we planned, it could still take down my SQL Server service.
This is a government website, and the client is serious about high availability (HA). So what should I do here? I have a couple of ideas:
1. If I locate the system databases on the local C: drive, will the SQL Server service recover if the LUN is disconnected and then reconnected?
2. Let's assume #1 is true and the SQL Server service keeps running. If the LUN containing the user databases disappears, can this result in data loss, corruption, torn pages, etc.?
I think a complete loss of connectivity is now very unlikely, since we have physical redundancy all the way to the SAN. But the errors from the 15th show that something as simple as a Windows Update could interrupt a LUN connection and cause problems for me.
What do you think I should do?