Is SAN admin correct? Windows 500 GB limitation on cluster failover?
Posted Tuesday, March 26, 2013 9:46 AM
SSC Journeyman

Hello

I was recently pulled into an ongoing project that is running into problems. The old project manager let me know that there were problems failing over more than 500 GB and the new project manager has it listed as a "Windows problem". His boss is calling it a "SQL Server problem". I heard from the old project manager that there are SAN/Hitachi replicator issues. So the storage team is looking to blame SQL somehow.

Is it true that a failover cluster has problems at the SQL or Windows level when failing over a database larger than 500 GB? How about 1 TB or 10 TB?

Does anyone have a good article for this - it's hard to find documentation of something that is possibly not true.

Any help is appreciated.

Thanks
Dave
Post #1435583
Posted Tuesday, March 26, 2013 9:49 AM


SSC-Forever

I've done a failover cluster with an instance that had one 600 GB database and one 1 TB database. No storage problems (well, other than a badly designed, very slow SAN).

The size of the database isn't a factor in a cluster failover, the storage is shared, the ownership of the storage switches from one server to the other, there's nothing copied.
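
To make the mechanism concrete, here is a toy Python model of a shared-disk failover. It is only a sketch: `SharedDisk`, `Cluster`, and `fail_over` are names made up for illustration and are not part of any Windows or SQL Server API. It shows that the work done at failover is one ownership change per disk, no matter how many gigabytes sit on that disk.

```python
from dataclasses import dataclass, field


@dataclass
class SharedDisk:
    """A LUN on shared storage that exactly one node owns at a time."""
    name: str
    size_gb: int
    owner: str


@dataclass
class Cluster:
    """Hypothetical two-node cluster; not a model of the real cluster service."""
    nodes: list[str]
    disks: list[SharedDisk] = field(default_factory=list)
    bytes_copied: int = 0  # never incremented: failover does not move data

    def fail_over(self, target: str) -> int:
        """Give every shared disk to `target`; return the number of ownership changes."""
        if target not in self.nodes:
            raise ValueError(f"unknown node: {target}")
        changes = 0
        for disk in self.disks:
            disk.owner = target  # ownership flips; the contents stay where they are
            changes += 1
        return changes


small = Cluster(["node1", "node2"], [SharedDisk("data", 500, "node1")])
large = Cluster(["node1", "node2"], [SharedDisk("data", 10_000, "node1")])

# Same number of steps for a 500 GB disk and a 10 TB disk, and nothing copied.
print(small.fail_over("node2"), small.bytes_copied)
print(large.fail_over("node2"), large.bytes_copied)
```

In a real cluster, the time after the disks move is spent restarting the SQL Server service and running crash recovery, which depends on how much transaction log has to be rolled forward or back rather than on the size of the data files.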

Could you describe the problems more?



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1435586
Posted Tuesday, March 26, 2013 9:56 AM
SSC Journeyman

I think it is mostly a communication issue - I'm attending a meeting this afternoon and have heard different reports from the old and new project managers.

They don't have the cluster built yet - they are still planning, but I think the SAN people had problems with their Hitachi replicator and are trying to shift the focus to SQL.

Or, it could have been misinterpreted by the new PM.

The statement you wrote below is what I will be armed with - I needed verification from people with more experience, thanks.

"The size of the database isn't a factor in a cluster failover, the storage is shared, the ownership of the storage switches from one server to the other, there's nothing copied."


Post #1435589
Posted Tuesday, March 26, 2013 9:59 AM


SSC-Forever

Hitachi Replicator? Are you doing a geo-dispersed cluster with non-shared storage or something? Or SAN replication to a failover SAN/DR site?


Gail Shaw

Post #1435595
Posted Tuesday, March 26, 2013 10:08 AM


SSChasing Mays

It's been a while but I think we had a similar issue. Might look into this:

from dbforums.com: SQL Service starts before your SAN is available.

We had this problem, too. According to our SAN vendor, the fix is as follows:
Open iSCSI Initiator
Click the 'Bound Volumes/Devices' tab
Click 'Bind All'
Click OK.

This will force the iSCSI Initiator to mount all the volumes before it relinquishes control to other processes, such as SQL Server.



_____________________________________________________________________
- Nate

@nate_hughes
Post #1435597
Posted Tuesday, March 26, 2013 10:17 AM
SSC Journeyman

The plan is for SAN replication to a failover SAN/DR site.

But they don't have the SAN or the power at the failover site anyway.

For now, they would set up a local cluster and the database could grow up to 10 TB.

The databases are up and running now on a single physical server with SQL Server 2008 R2 Enterprise.

Thanks
Dave
Post #1435602
Posted Tuesday, March 26, 2013 10:47 AM
Forum Newbie

I agree with Gail: you can cluster very large databases (VLDBs) very successfully, and Windows Failover Clustering can be great at providing high availability for these databases.

It's also possible to have a lot of issues with Windows Failover Clustering if you don't follow best practices, take shortcuts, or have issues in your network environment or storage subsystem. That's true no matter how large your database is. There are settings and configuration issues in SQL Server which can make failover slow at times, too. It does get pretty complex.

So, in short, I would suggest:
* Make sure the high availability (same datacenter) and disaster recovery (remote datacenter) requirements are defined appropriately
* Define a build and migration plan in stages, with good rollback and a testing plan
* Implement everything incrementally (which it sounds like you're on track to do, since you're talking about getting HA set up in the local datacenter before moving on)

For the SAN replication itself, much depends on the version of the hardware, the type of replication (sync or async), the communication path between the datacenters, etc. It can be great, or it can cause a lot of problems depending on implementation.
Post #1435616
Posted Wednesday, April 3, 2013 2:20 AM


Grasshopper

SQL has zero problems failing over VLDBs, and I can attest to that since I have two-node clusters that fail over 1,380 databases ranging from 100 GB to 1.5 TB without a problem. Just like everyone else has said, this isn't a SQL thing since files are not copied; the volumes are shared.

As for the SAN, I can see this being a problem with their replication service over WAN, but again it wouldn't be a file size problem. It would be more of a LUN size problem, since the replication on the SAN is done at the bit level.
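
A rough back-of-the-envelope sketch of why LUN size and link speed matter for a full replication pass over a WAN. The function name, the 1 Gbps link, and the 70% efficiency figure are assumptions for illustration, not numbers from this project. Real replicators may also compress data or send only changed blocks after the initial sync, so treat this as an upper-bound style estimate for a full copy.

```python
def transfer_hours(size_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to push `size_tb` terabytes over a `link_mbps` megabit/s link.

    `efficiency` is an assumed fraction of the nominal bandwidth that is usable.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be positive and efficiency in (0, 1]")
    bits = size_tb * 1e12 * 8                  # decimal terabytes -> bits
    usable_bps = link_mbps * 1e6 * efficiency  # usable bits per second
    return bits / usable_bps / 3600


# Hypothetical LUN sizes over an assumed 1 Gbps inter-site link.
for size_tb in (0.5, 2, 10):
    print(f"{size_tb} TB LUN: {transfer_hours(size_tb, 1000):.1f} hours")
```

The estimate scales linearly with LUN size, which is why a full resync of one very large LUN is a bigger event than a resync of one of several smaller ones.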

We use EqualLogic and Dell switches and we replicate 9 TB without a hitch, so I'm guessing they're talking about geo-replication over WAN or something.

my 3 cents

-King
Post #1438201
Posted Wednesday, April 3, 2013 6:34 AM
SSC Journeyman

During the meetings it went from a SQL problem to a Windows problem to a 10 TB LUN problem. It seems political to me - it was easier to tell the audience that there was a Windows or SQL issue than to say he made a 10 TB LUN.

The SAN admin is pushing to have around 20 databases spread across five 2 TB LUNs rather than one 10 TB LUN. It seems things weren't planned properly between the project manager, vendor, and the SAN admin - this project started without me over 2 years ago, I'm just coming into it now - and learning quickly.
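
For anyone wanting to sanity-check a layout like that, here is a minimal sketch that spreads databases over a fixed number of LUNs, always placing the next-largest database on the least-used LUN. The database names and sizes are hypothetical placeholders (I don't have the real ones), and `spread_databases` is just an illustrative helper, not a SAN or SQL Server tool.

```python
def spread_databases(sizes_gb: dict[str, int], lun_count: int, lun_size_gb: int) -> list[list[str]]:
    """Greedy balancing: largest database first, onto the least-used LUN."""
    used = [0] * lun_count
    layout: list[list[str]] = [[] for _ in range(lun_count)]
    for name, size in sorted(sizes_gb.items(), key=lambda kv: kv[1], reverse=True):
        i = used.index(min(used))  # least-used LUN so far
        if used[i] + size > lun_size_gb:
            # If the emptiest LUN cannot hold it, no LUN can.
            raise ValueError(f"{name} ({size} GB) does not fit on any LUN")
        used[i] += size
        layout[i].append(name)
    return layout


# Hypothetical sizes: 20 databases from 100 GB to 480 GB in 20 GB steps.
sizes = {f"db{n:02d}": 100 + 20 * n for n in range(20)}
for i, dbs in enumerate(spread_databases(sizes, lun_count=5, lun_size_gb=2048)):
    print(f"LUN {i}: {sum(sizes[d] for d in dbs)} GB in {len(dbs)} databases")
```

This only checks capacity. It says nothing about I/O load per LUN or growth headroom, which would need real numbers from the current server.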

Thanks for your help
Dave
Post #1438294
Posted Thursday, April 4, 2013 2:54 PM
SSC-Addicted

NJDave (3/26/2013)
Hello

I was recently pulled into an ongoing project that is running into problems. The old project manager let me know that there were problems failing over more than 500 GB and the new project manager has it listed as a "Windows problem". His boss is calling it a "SQL Server problem". I heard from the old project manager that there are SAN/Hitachi replicator issues. So the storage team is looking to blame SQL somehow.

Is it true that a failover cluster has problems at the SQL or Windows level when failing over a database larger than 500 GB? How about 1 TB or 10 TB?

Does anyone have a good article for this - it's hard to find documentation of something that is possibly not true.

Any help is appreciated.

Thanks
Dave


That sounds familiar to me, lol ... I mean, pointing fingers that way. Do you work for the famous company that makes printers and PCs? DO NOT REPLY! lol ...

SQL Server 2008 and above (I do not remember about SQL 2005) does not have such a limitation. You can put 32k databases on an instance if you want; the problem is how much RAM they need to run properly.

Also, I faced an issue where the SAN could allocate only 500 GB at most. I do not remember the specifics, but it was a SAN hardware limitation. So managing the databases was a little bit tricky, as we were forced to use that data LUN only.

Now, I am also familiar with Veritas Cluster (not SQL failover). Because of the SAN-to-SAN replication across regions (one was in Texas, the other one in GA I think), we limited the amount of data that we put there. But that's because of the huge amount of data that has to be moved in case of a crash. However, we were able to fail over using Veritas in a matter of minutes, which is actually amazingly good for mission-critical databases.

Bottom line, the most recent SQL versions do not have such a limitation, but the SAN and replication may affect that.
Post #1439022