Is SAN admin correct? Windows 500 GB limitation on cluster failover?

Question


NJDave

SSCommitted

Points: 1903
March 26, 2013 at 9:46 am

#275630

Hello
I was recently pulled into an ongoing project that is running into problems. The old project manager let me know that there were problems failing over more than 500 GB and the new project manager has it listed as a "Windows problem". His boss is calling it a "SQL Server problem". I heard from the old project manager that there are SAN/Hitachi replicator issues. So the storage team is looking to blame SQL somehow.
Is it true that a failover cluster has problems at the SQL or Windows level when failing over a database larger than 500 GB? How about 1 TB or 10 TB?
Does anyone have a good article on this? It's hard to find documentation of something that is possibly not true.
Any help is appreciated.
Thanks
Dave


Gail Shaw SSC Guru Points: 1004504 · Answer 1

I've done a failover cluster with an instance that had one 600 GB database and one 1 TB database. No storage problems (well, other than a badly designed, very slow SAN).

The size of the database isn't a factor in a cluster failover, the storage is shared, the ownership of the storage switches from one server to the other, there's nothing copied.

Could you describe the problems more?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

NJDave SSCommitted Points: 1903 · Answer 2

I think it is mostly a communication issue - I'm attending a meeting this afternoon and heard different reports between the old and new project managers.

They don't have the cluster built yet - they are still planning - but I think the SAN people had problems with their Hitachi replicator and are trying to shift the focus to SQL.

Or, it could have been misinterpreted by the new pm.

The statement you wrote below is what I will be armed with - I needed verification from people with more experience, thanks.

"The size of the database isn't a factor in a cluster failover, the storage is shared, the ownership of the storage switches from one server to the other, there's nothing copied."

Gail Shaw SSC Guru Points: 1004504 · Answer 3

Hitachi Replicator? Are you doing a geo-dispersed cluster with non-shared storage or something? Or SAN replication to a failover SAN/DR site?


RP_DBA SSCertifiable Points: 5116 · Answer 4

It's been a while but I think we had a similar issue. Might look into this:

from dbforums.com: SQL Service starts before your SAN is available.
We had this problem, too. According to our SAN vendor, the fix is as follows:
Open iSCSI Initiator
Click the 'Bound Volumes/Devices' tab
Click 'Bind All'
Click OK.
This will force the iSCSI Initiator to mount all the volumes before it relinquishes control to other processes, such as SQL Server.

_____________________________________________________________________
- Nate

@nate_hughes

NJDave SSCommitted Points: 1903 · Answer 5

The plan is for SAN replication to a failover SAN/DR site.

But they don't have the SAN or the power at the failover site anyway.

For now, they would set up a local cluster and the database could grow up to 10 TB.

The databases are up and running now on a single physical server running SQL 2008 R2 Enterprise.

Thanks

Dave

Kendra Little SSC Enthusiast Points: 127 · Answer 6

I agree with Gail-- you can cluster very large databases (VLDBs) very successfully, and Windows Failover Clustering can be great at providing high availability for these databases.

It's also possible to have a lot of issues with Windows Failover Clustering if you don't follow best practices, take shortcuts, or have issues in your network environment or storage subsystem. That's true no matter how large your database is. There are settings and configuration issues in SQL Server which can make failover slow at times, too-- it does get pretty complex.

So, in short, I would suggest:

* Make sure high availability (same datacenter) and disaster recovery (remote datacenter) requirements are defined appropriately

* Define a build and migration plan in stages with good rollback and a testing plan

* Implement everything incrementally (which it sounds like you're on track to do, since you're talking about getting HA set up in the local datacenter before moving on)

For the SAN replication itself, much depends on the version of the hardware, the type of replication (sync or async), the communication path between the datacenters, etc. It can be great, or it can cause a lot of problems depending on implementation.

Sailorking Right there with Babe Points: 739 · Answer 7

SQL has zero problems failing over VLDBs and I can attest to that, since I have two-node clusters that fail over 1,380 databases ranging from 100 GB to 1.5 TB without a problem. Just like everyone else has said, this isn't a SQL thing, since files are not copied; the volumes are shared.

As for the SAN, I can see this being a problem with their replication service over the WAN, but again it wouldn't be a file size problem. It would be more of a LUN size problem, since replication on the SAN is done at the block level, not the file level.
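
To put rough numbers on the LUN size point, here is a minimal Python sketch that estimates how long a full resynchronization of a replicated LUN might take over a WAN link. The link speed (1 Gbps), the 80% efficiency factor, and the function name `full_sync_hours` are all assumptions made for illustration. Nothing here comes from Hitachi or any other vendor's tooling, and real replication usually ships only changed blocks after the initial copy.

```python
def full_sync_hours(lun_size_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to copy an entire LUN across a link.

    Uses decimal units (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits/s).
    `efficiency` is an assumed fraction of the raw link speed that
    replication traffic actually achieves.
    """
    if lun_size_tb < 0:
        raise ValueError("lun_size_tb must not be negative")
    if link_mbps <= 0:
        raise ValueError("link_mbps must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in the range (0, 1]")

    total_bits = lun_size_tb * 1e12 * 8
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return total_bits / usable_bits_per_second / 3600


if __name__ == "__main__":
    # Hypothetical sizes: a 500 GB LUN, a 2 TB LUN, and a single 10 TB LUN.
    for size_tb in (0.5, 2, 10):
        hours = full_sync_hours(size_tb, link_mbps=1000)
        print(f"{size_tb} TB over an assumed 1 Gbps link: about {hours:.1f} hours")
```

With these assumed figures, a full copy of one 10 TB LUN takes more than a day, while a 2 TB LUN takes under six hours. That kind of gap in resync time is about the replication link and the LUN layout, not about SQL Server or Windows failover.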

We use EqualLogic and Dell switches and we replicate 9 TB without a hitch, so I'm guessing they're talking about geo-replication over WAN or something.

my 3 cents

-King

NJDave SSCommitted Points: 1903 · Answer 8

During the meetings it went from a SQL problem to a Windows problem to a 10 TB LUN problem. It seems political to me - it was easier to tell the audience that there was a Windows or SQL issue rather than say he made a 10 TB LUN.

The SAN admin is pushing to have around 20 databases spread across five 2 TB LUNs rather than one 10 TB LUN. It seems things weren't planned properly between the project manager, vendor, and the SAN admin - this project started without me over 2 years ago, I'm just coming into it now - and learning quickly.

Thanks for your help

Dave

sql-lover SSCoach Points: 18530 · Answer 9


That sounds familiar to me, lol ... I mean, pointing fingers that way. Do you work for the famous company that makes printers and PCs? DO NOT REPLY! lol ...

MS-SQL 2008 and above (I do not remember about SQL 2005) does not have such a limitation. You can put 32k databases on an instance if you want, but the problem is how much RAM they need in order to run properly.

Also, I faced an issue where the SAN was able to allocate only up to 500 GB. I do not remember the specifics, but it was a SAN hardware limitation. So managing the databases was a little bit tricky, as we were forced to use that one data LUN only.

Now, I am also familiar with Veritas Cluster (not SQL failover). Because of the SAN-to-SAN replication across regions (one was in Texas, the other one in GA I think), we limited the amount of data that we put there. But that's because of the huge amount of data that has to be moved in case of a crash. However, we were able to fail over using Veritas in a matter of minutes, which is actually amazingly good for mission-critical databases.

Bottom line, the most recent SQL versions do not have such a limitation, but the SAN and replication may impose one.