The Best High Availability

  • Comments posted to this topic are about the item The Best High Availability

  • We have medium-high availability. Our db servers run fibre channel over to our EVA, which is a single-point of failure. (I know it's got redundant this and that.) If a box goes belly up, we created a script that transfers all of the storage from our production box to a test box. (All db servers are identical.) Total down time should be <5 minutes. That includes detecting the problem.

    Our investment accounting system runs on a cluster, serviced by the same EVA. The amusing thing is, if a cluster node goes down and the db needs to migrate to the new node, SQL Server will stop for a couple of seconds (as expected), the problem is... The IA system is so well written, that if SQL server goes down, it'll puke and die, but not before corrupting the databases that support it. I could go into detail as to why, but let's just say they (the vendor) do not believe in such trivial things as transactional integrity, indexing schemes, data validation, stored procedures, system views (why bother when you can update the system tables directly?)... You get the idea.

    I like the idea of our current environment, along with the addition of log shipping down to our DR site. :w00t: We're on a 15 minute lag, so for us it's not bad. After all, we're not Wal-mart, or Amazon...

    (Steve, nice new pic... Shamed me into changing mine.)

    Honor Super Omnia-
    Jason Miller

  • Our old 2000 server was clustered and was problematic at best due especially due to the shared drive requirement. Drive hiccups on the RAID box were our most common problem.

    For 2005 we implemented mirroring which so far has been satisfactory. Since the drives are on two physically separate systems, we can even bring down the drives for maintenance (firmware etc) while keeping the database up.

    ...

    -- FORTRAN manual for Xerox Computers --

  • I have a general question - can virtualization play a part in HA? In other words, is it possible to have a setup where a SQL cluster is not needed if virtual servers are used?

    If so, are there sites or articles someone can point me to for discussions and pointers?

    If not, are there still ways to make use of virtualization of SQL servers?

    Thanks for any advice.

    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • webrunner (4/28/2009)


    are there still ways to make use of virtualization of SQL servers?

    We have a couple of VM servers that house production SQL environments. Our corporate accounting system, a bug tracker, and an on-site inventory management system are all on VM SQL... That works well for us for they are very low usage/volumes. The corporate accounting system is the biggest with a little over 1GB...

    That being said, I am familiar with a company that is running a 660GB email tracking system on a virtualized SQL box. Its performance is pretty abysmal. I wouldn't recommend any system more complicated than a recipe look-up on a virtualized box.

    Honor Super Omnia-
    Jason Miller

  • Virtualization definitely can, and I wouldn't be surprised to see more of this.

    You can cluster two virtual servers as long as they can see the same shared disk. So the underlying Windows host(s) need to be able to share a disk somehow.

  • It is OK to VM the SQL Server instances but I wouldn't vm the data with it. The nodes need to see the same disks (as Steve has said).

    Still VM is some order slower than a dedicated box, as everything is being shared from network connections to CPU cycles. Some intelligent analysis and testing would be in order. I wouldn't use it for HA systems unless those systems are strictly lookup systems with DBs that can reside in memory, or are extremely focused and minimal in overhead.

  • Just thought of another way to put it... I personally view VM technology as optimum when the O/S is merely an interface.

    For example, it is great in dev-test environments, where testing against multiple systems and platforms is needed.

    It is great for small web-hosting instances too, where each instance can have 'root-like" access to their own instance.

    It is still not great for databases (especially large implementations), nor integration, report, and analysis servers.

  • I'm still wondering which HA scenarios clusters can't solve, Steve. Please elaborate... Offsite DR doesn't quite fit as a typical HA scenario, but it's still doable with dark fiber and the right SAN gear.

    The "best" HA will be a layered approach anyway. Mirroring or clustering for "immediate" HA (such as hardware hiccoughs) combined with log shipping for warm standby at a secondary NOC (to deal with minor disasters, internet service outages, failure to pay the light bill) and offsite backups on removeable SSDs or DVDs in another region or country for recovery from true disasters.

  • We implement HA in various ways here. We have some clusters of SQL 2005 on Windows 2003 which means we cannot geographically separate the servers. We use Log Shipping on some databases and Database Mirroring on others. In one particular case we mirror a database that is on a cluster to a geographically separate site. In some instances we use the old tried and true backup and restore.

    When a business unit presents the need to store data within SQL I present them with the options trying to give them an over view of the good and the bad of each option. Surprisingly most requests end up being that "Last nights backup is good enough for us."

    One other thing I see a lot of is implementing an HA plan and then never testing it. Testing an HA has to be done on a regular basis or you may and probably will end up with unrecoverable data during a disaster.

    Rick Romack

  • We have implemented a HA solution in a VM cloud. Total nine big boxes forms the VM (2 dedicated) for the SQL HA cluster. (Other VMs are runing 40+ servers)

    SQL Sever 2005 is installed on these dedicated VM with active-passive configuration. So effectively both the nodes are sharing the disks. We have also implemented a DR solution using Database mirroring to a different site to provide more HA (Our DR site link is bit slow now, so we have used High Perf - Async - Manual failover and expecting a high speed connectivity soon. And it will be changed to High Perf - Sync - Automatic failover with a witness server).

    We are expecting SQL should be up and running within few seconds of any failure (disk/network cards/CPU)

  • SQL 2005 Data mirroring (Synchronous) with Log shipping sounds to be a good HA solution, when compared to the clustering. Data mirroring is for HA and Log shipping is for DR (Disaster Recovery).

    While we test our application in clustering environment, we found that the failover took around 4 minutes of time. From the technical people perspective, this 4 minutes time is not acceptable and it is not HA and they added that the failover should not exceed 20 seconds of time. Also the clustering requires the shared disk. There are various types of shared disks like SAN, NAS available in the market, which are very expensive. Then we dropped the clustering solution and went for Data Mirroring (Synchronous) with Log shipping solution.

    In SQL 2005 Data Mirroring (Synchronous), the transaction gets committed both in the primary database server and in the secondary database server simultaneously. Hence on failure of the primary database server, our application will to connect to the secondary database server immediately, with out any manual intervention, since both primary database and secondary database will be in sync at any time. This failover happened around 20 seconds of time.

  • SQL 2005 Data mirroring (Synchronous) with Log shipping sounds to be a good HA solution, when compared to the clustering. Data mirroring is for HA and Log shipping is for DR (Disaster Recovery).

    While we test our application in clustering environment, we found that the failover took around 4 minutes of time. From the technical people perspective, this 4 minutes time is not acceptable and it is not HA and they added that the failover should not exceed 20 seconds of time. Also the clustering requires the shared disk. There are various types of shared disks like SAN, NAS available in the market, which are very expensive. Then we dropped the clustering solution and went for Data Mirroring (Synchronous) with Log shipping solution.

    In SQL 2005 Data Mirroring (Synchronous), the transaction gets committed both in the primary database server and in the secondary database server simultaneously. Hence on failure of the primary database server, our application will to connect to the secondary database server immediately, with out any manual intervention, since both primary database and secondary database will be in sync at any time. This failover happened around 20 seconds of time.

  • Has anyone had any experiencing combining SQL Server Clustering with Storage level HA (block level mirror with automated failover at the storage level). I work for a small iSCSI storage vendor and we are developing the ability to build HA from the storage level (I know this is nothing new! Others in the space are already doing it) however I am wondering if this an option that is considered viable in the SQL Server community? (assuming the price was right), combining Application clustering with a clustered iSCSI storage solution.

  • bob.boule (4/29/2009)


    Has anyone had any experiencing combining SQL Server Clustering with Storage level HA (block level mirror with automated failover at the storage level). I work for a small iSCSI storage vendor and we are developing the ability to build HA from the storage level (I know this is nothing new! Others in the space are already doing it) however I am wondering if this an option that is considered viable in the SQL Server community? (assuming the price was right), combining Application clustering with a clustered iSCSI storage solution.

    Nope, not with HA clusters. Plenty of storage-level replication for so-called "geocluster" implementations for DR, but these never supported automated failover. Block level mirroring was used to keep the enormous amount of data in sync.

    IMO, you'd need a service/resource that could participate in the cluster in order to support automated failover.

    All the successful storage-level HA that I've seen makes the mirroring/failover invisible to Windows and SQL Server.

Viewing 15 posts - 1 through 15 (of 26 total)

You must be logged in to reply to this topic. Login to reply