Moving clusters to new hardware - best plan?

  • I'm trying to come up with a good migration strategy that will minimize downtime for the following scenario:

    Right now we have

    SQLC cluster, Active/Passive on SQLC1, SQLC2, drives D: and L:, ~1TB of data

    SQLD cluster, Active/Passive on SQLD1, SQLD2, driver D: and L: ~0.75TB of data

    New hardware has been set up, 3 nodes - SQLNEW1, SQLNEW2, SQLNEW3. The intent (to save hardware costs) is to go Active/Active/Passive. All servers are on the same SAN.

    I've got some major issues:

    1) I need to move at least one cluster's data to new drive letters to resolve the D:/L: conflict of co-existing on the same cluster

    2) They are both default instances, and will now need to become named instances.

    3) Both are replication publishers of a lot of their data, and it would take a LONG time to reinitialize and/or recreate the publications.

    So here's the high level plan:

    1) I have to change the drive letters of SQLC to E: and M: without confusing MSSQL and breaking replication (!)

    2) Install two MSSQL named instances into two cluster groups on SQLNEW; SQLNEWC using drives E:, M: and SQLNEWD using D:, L:

    3a) Take all cluster groups offline, remove SQL drive dependencies, remove drives from the cluster groups

    3b) unpresent D:, L:, E:, M: on the SAN from SQLNEW (blank dbs)

    3c) unpresent D:, L: from SQLC and present to SQLNEW

    3d) add E:, M: to the SQLNEWC cluster group, add SQL dependencies

    3e) add D:, L: to the SQLNEWD cluster group, add SQL dependencies

    3f) bring both groups online and pray?

    Here's the gotcha's I've identified:

    1) Moving MSSQL data and log disks to new drive letters inside a cluster group without disturbing replication - something tells me that detach/reattach will blow up replication but good

    2) starting MSSQL with a master db that was created on a server with a different name will probably confuse MSSQL at startup

    3) drive signatures will probably be a factor - the cluster groups may not want to start after swapping out the underlying disks, but I think the cluster recovery tool will help here

    4) even with network aliases in place to "fool" replication into communicating with the correct new mssql servers, since they've gone from default instances to named the publications will probably break anyway.

    I'm pretty sure its a foregone conclusion that I'll have to re-do replication entirely and our reporting servers will be broken for a few days, which is tolerable, but I really have to come up with a way to get to the new hardware with minimal downtime for SQLC/SQLD services.

    Does anyone have any thoughts, other than "That DBA is a lunatic"?

  • One thing that you're forgetting: on a cluster, you have a virtual sql name for each instance. So, I don't think that you have to worry about the named instance issue - with replication, or with application connection strings.

    Before taking the databases offline, look into using ALTER DATABASE to change the location of the data/log files. Then, after having been moved to the new locations, they should come up okay.

    I think that you could keep the same virtual sql name for each instance, and then you would be okay with restoring the master databases.

    I strongly recommend testing all of this out before you try it on your production system.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Is there a way to use ALTER DATABASE to simply change a file's location? My understanding is that you'd have to add a new file, migrate the data to the new file, then drop the original file. I've had issues with that in the past because dropping the original file would fail, even if there was no data in it, but I was never able to figure out why.

    I can see how reusing the virtual cluster names would work for addressing, but one problem is that you can only have one default instance per cluster - the rest have to be named instances with a port other than 1433. I suppose I could set it up as SQLC on 1433 and put SQLD\SOMEINSTANCENAME on port 3000, but it would then be necessary to set up the replication subscribers with SQL Native Client aliases to set UPSD as port 3000 (thus eliminating the reference to the instance name). I have no idea if that would work properly...

    And yeah, you bet I'm going to dry run this on some VMs first!

  • Brian Brennan (10/26/2010)


    Is there a way to use ALTER DATABASE to simply change a file's location?

    Yes. Use the MODIFY FILE option here

    I can see how reusing the virtual cluster names would work for addressing, but one problem is that you can only have one default instance per cluster - the rest have to be named instances with a port other than 1433.

    Not on a cluster. Each instance has its own virtual computer name, and IP address. It knows, based upon the IP address, which instance to send it to. Yes, they are installed as instances (and IMO this should be a standard installation practice for every instance!).

    And yeah, you bet I'm going to dry run this on some VMs first!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • we're looking to replace 2 nodes in an active/passive cluster and so far the plan is to add a new node and then evict an existing node. then do it again with another node. or add 2 new nodes and evict the existing 2 nodes.

    windows 2003 and higher clusters can support more than 2 nodes

  • Yeah I thought of that but the new servers are running Windows Server 2008 R2 - the old ones are on Windows Server 2003.

  • shouldn't matter since MS has a process to migrate/upgrade the OS on clusters

  • This strikes me as an exceptionally complex and risky evolution, and as such I must STRONGLY recommend you not rely on a forum to assist you. Engage a QUALIFIED consultant (and there aren't many of those out there for this type of scenario) to assist in the planning and execution or you will be setting yourself up for some serious pain.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • we don't have a firm upgrade plan yet, but once we decide to buy the hardware i'm going to test on VMWare. ESXi is free and all you need is a server and some storage. very easy to create clusters with shared storage on vmware.

  • Brian B. (10/26/2010)


    I'm trying to come up with a good migration strategy that will minimize downtime for the following scenario:

    Right now we have

    SQLC cluster, Active/Passive on SQLC1, SQLC2, drives D: and L:, ~1TB of data

    SQLD cluster, Active/Passive on SQLD1, SQLD2, driver D: and L: ~0.75TB of data

    Reading below, I'm deducing that you have one clustered default instance in each of the 2 node clusters and both contain Replication publications, is that correct?

    How large are the replications?

    Brian B. (10/26/2010)


    The intent (to save hardware costs) is to go Active/Active/Passive.

    This is a misconception, a cluster is either Active\Passive or Active\Active!

    Brian B. (10/26/2010)


    I've got some major issues:

    You can say that again! Merging 2 separate clusters will be nothing short of a small horror story and could potentially have an horrific ending too 😉

    Brian B. (10/26/2010)


    2) They are both default instances, and will now need to become named instances.

    Changing an instance name is possible but wouldn't be my personal preference!

    You could build the 3 new nodes into one of the existing clusters of your choice and extend the default clustered instance across the new nodes. Remove the 2 old nodes once the new nodes have been introduced. For the second cluster deploy a new named instance on the new cluster and migrate databases into the new installed instance. When successful destroy the remaining old cluster

    Brian B. (10/26/2010)


    So here's the high level plan:

    1) I have to change the drive letters of SQLC to E: and M: without confusing MSSQL and breaking replication (!)

    2) Install two MSSQL named instances into two cluster groups on SQLNEW; SQLNEWC using drives E:, M: and SQLNEWD using D:, L:

    3a) Take all cluster groups offline, remove SQL drive dependencies, remove drives from the cluster groups

    3b) unpresent D:, L:, E:, M: on the SAN from SQLNEW (blank dbs)

    3c) unpresent D:, L: from SQLC and present to SQLNEW

    3d) add E:, M: to the SQLNEWC cluster group, add SQL dependencies

    3e) add D:, L: to the SQLNEWD cluster group, add SQL dependencies

    3f) bring both groups online and pray?

    Here's the gotcha's I've identified:

    1) Moving MSSQL data and log disks to new drive letters inside a cluster group without disturbing replication - something tells me that detach/reattach will blow up replication but good

    2) starting MSSQL with a master db that was created on a server with a different name will probably confuse MSSQL at startup

    3) drive signatures will probably be a factor - the cluster groups may not want to start after swapping out the underlying disks, but I think the cluster recovery tool will help here

    4) even with network aliases in place to "fool" replication into communicating with the correct new mssql servers, since they've gone from default instances to named the publications will probably break anyway.

    I'm pretty sure its a foregone conclusion that I'll have to re-do replication entirely and our reporting servers will be broken for a few days, which is tolerable, but I really have to come up with a way to get to the new hardware with minimal downtime for SQLC/SQLD services.

    Does anyone have any thoughts, other than "That DBA is a lunatic"?

    make sure you have a good backout plan too!!

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle (10/28/2010)


    Reading below, I'm deducing that you have one clustered default instance in each of the 2 node clusters and both contain Replication publications, is that correct?

    How large are the replications?

    We have about 15 publications, totalling ~0.75TB. Reinitializing would mean at least a day of downtime on our reporting/analytics servers.

    Perry Whittle (10/28/2010)


    Brian B. (10/26/2010)


    The intent (to save hardware costs) is to go Active/Active/Passive.

    This is a misconception, a cluster is either Active\Passive or Active\Active!

    How is it a misconception? The target is a 3 node cluster - at any time two will be Active, and one will be Passive, thus Active/Active/Passive. If it were a 6 node cluster, it could easily be described as Active/Active/Active/Active/Passive/Passive.

    Perry Whittle (10/28/2010)


    You could build the 3 new nodes into one of the existing clusters of your choice and extend the default clustered instance across the new nodes. Remove the 2 old nodes once the new nodes have been introduced. For the second cluster deploy a new named instance on the new cluster and migrate databases into the new installed instance. When successful destroy the remaining old cluster

    That's a much safer plan, but downtime is critical to the business... even an hour down would cost tens of thousands of dollars.

    Perry Whittle (10/28/2010)


    make sure you have a good backout plan too!!

    Absolutely. If anything goes wrong, we can always undo the changes and be back to our original configuration in a heartbeat.

    I've done a very similar (though decidedly less complex) operation successfully in the past with minimal downtime. The concerns I had listed originally were pretty much addressed earlier in the thread or researched/tested since I posted. I've already tested everything on the new hardware with db copies of production successfully, including drive letter changes, unpresent/represent drives, and migration from a default to named instance. The shock (to me) is that I even managed to do it without breaking replication! It turns out that setting up SQL Native Client aliases on the participating servers is all you need to do to allow existing publications to find the new publisher under the original network name.

    When the migration is done, I'll post my deployment checklists in this thread just in case it proves useful to others down the road.

  • Brian B. (10/26/2010)


    How is it a misconception? The target is a 3 node cluster - at any time two will be Active, and one will be Passive, thus Active/Active/Passive. If it were a 6 node cluster, it could easily be described as Active/Active/Active/Active/Passive/Passive.

    Active/Active/Active/Active/Passive/Passive that's a new one 🙂

    As I said previously, a cluster is either active\passive or active\active. This link from MS provides an overview of the 2 cluster operating modes and the terminologies!

    Brian B. (10/26/2010)


    That's a much safer plan, but downtime is critical to the business... even an hour down would cost tens of thousands of dollars.

    That's just it, there shouldnt be any downtime. The new instance is deployed and databases are migrated to the new instance. The time it takes to change the application connection details would be the only issue.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Brian B. (10/26/2010)


    Yeah I thought of that but the new servers are running Windows Server 2008 R2 - the old ones are on Windows Server 2003.

    ran into that same issue. don't believe you can add the W2K8R2 machines to a W2K3 one.

  • As I said previously, a cluster is either active\passive or active\active. This link from MS provides an overview of the 2 cluster operating modes and the terminologies!

    From the link you provided:

    "The Operations and Technology Group (OTG) has configured a multinode cluster that consists of four active nodes, one primary passive node, and two alternate passive nodes."

    Mike Hinds Lead Database Administrator1st Source BankMCP, MCTS

  • Mike Hinds (10/29/2010)


    a multinode cluster that consists of four active nodes, one primary passive node, and two alternate passive nodes."

    Yes, that is an Active\Passive cluster!

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply