Is SAN admin correct? Windows 500 GB limitation on cluster failover?

  • Hello

    I was recently pulled into an ongoing project that is running into problems. The old project manager let me know that there were problems failing over more than 500 GB, and the new project manager has it listed as a "Windows problem". His boss is calling it a "SQL Server problem". I heard from the old project manager that there are SAN/Hitachi replicator issues, so the storage team is looking to blame SQL somehow.

    Is it true that a failover cluster has problems at the SQL or Windows level when failing over a database > 500 GB? How about 1 TB or 10 TB?

    Does anyone have a good article on this? It's hard to find documentation for something that possibly isn't true.

    Any help is appreciated.

    Thanks

    Dave

  • I've done a failover cluster with an instance that had one 600 GB database and one 1 TB database. No storage problems (well, other than a badly designed, very slow SAN).

    The size of the database isn't a factor in a cluster failover: the storage is shared, ownership of the storage switches from one server to the other, and nothing is copied.
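
    If you want to verify that for yourself, here's a minimal sketch against the standard DMVs (assuming a clustered instance on SQL Server 2008 or later):

    -- Which physical node currently owns this clustered instance
    SELECT SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS current_owner_node,
           SERVERPROPERTY('IsClustered')                 AS is_clustered;

    -- The nodes this instance can run on
    SELECT NodeName
    FROM sys.dm_os_cluster_nodes;

    Run the first query before and after a test failover: only the owner node changes, while the databases and their files stay exactly where they are on the shared LUNs.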

    Could you describe the problems more?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I think it is mostly a communication issue - I'm attending a meeting this afternoon and have heard different reports from the old and new project managers.

    They don't have the cluster built yet - they are still planning, but I think the SAN people had problems with their Hitachi replicator and are trying to shift the focus to SQL.

    Or it could have been misinterpreted by the new PM.

    The statement you wrote below is what I will be armed with - I needed verification from people with more experience. Thanks.

    "The size of the database isn't a factor in a cluster failover, the storage is shared, the ownership of the storage switches from one server to the other, there's nothing copied."

  • Hitachi Replicator? Are you doing a geo-dispersed cluster with non-shared storage or something? Or SAN replication to a failover SAN/DR site?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • It's been a while, but I think we had a similar issue. You might look into this:

    From dbforums.com: the SQL Server service starts before your SAN is available.

    We had this problem, too. According to our SAN vendor, the fix is as follows:

    Open the iSCSI Initiator.

    Click the 'Bound Volumes/Devices' tab.

    Click 'Bind All'.

    Click OK.

    This will force the iSCSI Initiator to mount all the volumes before it relinquishes control to other processes, such as SQL Server.
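
    Along the same lines, once the instance is up (or has just failed over), a minimal sketch using only the standard catalog views to confirm that every database actually came online:

    -- Any database that is not ONLINE after startup/failover
    -- (RECOVERY_PENDING or SUSPECT here often points at volumes that weren't ready in time)
    SELECT name, state_desc
    FROM sys.databases
    WHERE state_desc <> 'ONLINE';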

    _____________________________________________________________________
    - Nate

    @nate_hughes
  • The plan is for SAN replication to a failover SAN/DR site.

    But they don't have the SAN or the power at the failover site anyway.

    For now, they would set up a local cluster and the database could grow up to 10 TB.

    The databases are up and running now on a single physical server: SQL Server 2008 R2 Enterprise.

    Thanks

    Dave

  • I agree with Gail: you can cluster very large databases (VLDBs) very successfully, and a Windows failover cluster can be great at providing high availability for them.

    It's also possible to have a lot of issues with Windows Failover Clustering if you don't follow best practices, if you take shortcuts, or if you have problems in your network environment or storage subsystem. That's true no matter how large your database is. There are also settings and configuration issues in SQL Server that can make failover slow at times (see the recovery-progress check at the end of this post) - it does get pretty complex.

    So, in short, I would suggest:

    * Make sure high availability (same datacenter) and disaster recovery (remote datacenter) requirements are defined appropriately

    * Define a build and migration plan in stages with good rollback and a testing plan

    * Implement everything incrementally (which it sounds like you're on track to do, since you're talking about getting HA set up in the local datacenter before moving on)

    For the SAN replication itself, much depends on the version of the hardware, the type of replication (sync or async), the communication path between the datacenters, etc. It can be great, or it can cause a lot of problems depending on implementation.
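
    On the slow-failover point, here is a rough sketch (standard DMVs only; as far as I know percent_complete is populated for crash recovery) that you can use to watch the databases come back online after a failover:

    -- Databases still running crash recovery after a failover
    SELECT session_id,
           DB_NAME(database_id)             AS database_name,
           command,
           percent_complete,
           estimated_completion_time / 1000 AS est_seconds_remaining
    FROM sys.dm_exec_requests
    WHERE command LIKE 'DB STARTUP%';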

  • SQL Server has zero problems failing over VLDBs, and I can attest to that since I have two-node clusters that fail over 1,380 databases ranging from 100 GB to 1.5 TB without a problem. Just like everyone else has said, this isn't a SQL thing, since files are not copied; the volumes are shared.

    As for the SAN, I can see this being a problem with their replication service over the WAN, but again it wouldn't be a file-size problem so much as a LUN-size problem, since replication on the SAN is done at the block level.

    We use EqualLogic and Dell switches and we replicate 9 TB without a hitch, so I'm guessing they're talking about geo-replication over the WAN or something.

    my 3 cents

    -King

  • During the meetings it went from a SQL problem to a Windows problem to a 10 TB LUN problem. It seems political to me - it was easier to tell the audience that there was a Windows or SQL issue than to admit he made a 10 TB LUN.

    The SAN admin is pushing to have around 20 databases spread across five 2 TB LUNs rather than one 10 TB LUN. It seems things weren't planned properly between the project manager, the vendor, and the SAN admin - this project started without me over 2 years ago; I'm just coming into it now and learning quickly.
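
    In case it helps, here is a rough sketch (sys.master_files only, nothing vendor-specific) for totalling the current data and log file sizes per drive, to sanity-check how they would spread across those LUNs:

    -- Total file size per drive letter, in GB (size is in 8 KB pages)
    SELECT LEFT(physical_name, 1)                                AS drive,
           COUNT(*)                                              AS file_count,
           CAST(SUM(size) * 8.0 / 1024 / 1024 AS decimal(10, 1)) AS total_gb
    FROM sys.master_files
    GROUP BY LEFT(physical_name, 1)
    ORDER BY drive;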

    Thanks for your help

    Dave

  • NJDave (3/26/2013)

    Is it true that a failover cluster has problems at the SQL or Windows level when failing over a database > 500 GB? How about 1 TB or 10 TB?

    That sounds familiar to me, lol ... I mean, the finger-pointing. Do you work for the famous company that makes printers and PCs? DO NOT REPLY! lol ...

    SQL Server 2008 and above (I don't remember about SQL 2005) does not have such a limitation. You can put 32K databases on an instance if you want; the real constraint is how much RAM they need to run properly.

    Also, I once faced an issue where the SAN could only allocate up to 500 GB. I don't remember the specifics, but it was a SAN hardware limitation, so managing the databases was a little tricky because we were forced to use that single data LUN.

    I'm also familiar with Veritas Cluster (not SQL Server failover clustering). Because of the SAN-to-SAN replication across regions (one site was in Texas, the other in GA, I think), we limited the amount of data we put there - but that was because of the huge amount of data that has to be moved in case of a crash. Even so, we were able to fail over using Veritas in a matter of minutes, which is actually amazingly good for mission-critical databases.

    Bottom line: recent SQL Server versions do not have such a limitation, but the SAN and its replication setup may.
