Issue with second instance being moved to a different node

  • I have a new SQL 2019 failover cluster which consists of two nodes and two SQL instances. The first instance is using port 1433 and the second 1439.

    During failover testing I have been having issues with the second instance. It moves across to the new node and all cluster resources are brought online and are healthy. There are no events in the logs. I can connect to the instance on the server directly but remote connections fail.

    If I move it back to the original node it works fine again.

    The first instance appears to be unaffected and works fine on both nodes.

    Does anyone have any suggestions on what I can check/try?
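
    In the meantime, one check I can run from a client outside the nodes' subnet is a bare TCP test against the instance port, which separates a SQL-level failure from a network-level one. A rough Python sketch (the host name below is just a placeholder for the second instance's network name, and 1439 is its port from above):

    ```python
    # Rough sketch: raw TCP reachability test for the second instance from a
    # remote client. Placeholder host name; port 1439 is the instance's port.
    import socket

    HOST = "SQLCLUS-INST2"   # placeholder for the failing instance's network name or IP
    PORT = 1439

    def check(host: str, port: int, timeout: float = 5.0) -> None:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                print(f"TCP connection to {host}:{port} succeeded")
        except socket.timeout:
            # A timeout suggests packets never reach the node (routing/ARP),
            # rather than SQL Server refusing the connection.
            print(f"TCP connection to {host}:{port} timed out")
        except ConnectionRefusedError:
            # Refused means the node answered but nothing is listening on the port.
            print(f"TCP connection to {host}:{port} was refused")
        except OSError as exc:
            print(f"TCP connection to {host}:{port} failed: {exc}")

    if __name__ == "__main__":
        check(HOST, PORT)
    ```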

  • MattNorman88 wrote:

    I have a new SQL 2019 failover cluster which consists of two nodes and two SQL instances. The first instance is using port 1433 and the second 1439.

    During failover testing I have been having issues with the second instance. It moves across to the new node and all cluster resources are brought online and are healthy. There are no events in the logs. I can connect to the instance on the server directly but remote connections fail.

    If I move it back to the original node it works fine again.

    The first instance appears to be unaffected and works fine on both nodes.

    Does anyone have any suggestions on what I can check/try?

    Have you configured the settings for both instances on both nodes when SQL is active?
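
    For example, one concrete way to compare the two nodes is to read the static TCP port each instance is configured with from the registry on each node. A sketch only, using the standard SQL Server registry layout (run it locally on each node and compare the output):

    ```python
    import winreg

    # Standard key that maps instance names (e.g. MSSQLSERVER, INST2) to
    # instance IDs such as MSSQL15.INST2.
    INSTANCES_KEY = r"SOFTWARE\Microsoft\Microsoft SQL Server\Instance Names\SQL"

    def configured_ports() -> dict:
        ports = {}
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, INSTANCES_KEY) as key:
            value_count = winreg.QueryInfoKey(key)[1]   # number of values = number of instances
            for i in range(value_count):
                instance, instance_id, _ = winreg.EnumValue(key, i)
                ipall = (rf"SOFTWARE\Microsoft\Microsoft SQL Server\{instance_id}"
                         r"\MSSQLServer\SuperSocketNetLib\Tcp\IPAll")
                with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, ipall) as tcp:
                    static_port, _ = winreg.QueryValueEx(tcp, "TcpPort")
                    dynamic_port, _ = winreg.QueryValueEx(tcp, "TcpDynamicPorts")
                    ports[instance] = (static_port, dynamic_port)
        return ports

    if __name__ == "__main__":
        for name, (static_port, dynamic_port) in configured_ports().items():
            print(f"{name}: TcpPort={static_port!r} TcpDynamicPorts={dynamic_port!r}")
    ```

    If the second instance shows different TCP settings on the two nodes, that would line up with remote connections only working from one of them.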

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

  • As far as I am aware, everything is configured on both nodes.

    Sometimes failing the instances over works fine, and at other times it randomly stops working again.

    Occasionally I have noticed that the cluster network in Failover Cluster Manager changes to a 'Partitioned' state, and I have to set it so the cluster cannot use it and then change it back to bring it online again.
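
    To get a timestamp on when that happens, I am thinking of polling the cluster network state from one of the nodes during the failover tests. A rough Python sketch (assumes the FailoverClusters PowerShell module is available, which it should be on a cluster node):

    ```python
    import subprocess
    import time
    from datetime import datetime

    # Prints one line per cluster network, e.g. "Cluster Network 1=Up"
    CMD = ["powershell.exe", "-NoProfile", "-Command",
           "Get-ClusterNetwork | ForEach-Object { '{0}={1}' -f $_.Name, $_.State }"]

    def poll(interval: float = 10.0) -> None:
        while True:
            out = subprocess.run(CMD, capture_output=True, text=True, check=True).stdout
            stamp = datetime.now().isoformat(timespec="seconds")
            for line in out.splitlines():
                name, _, state = line.partition("=")
                state = state.strip()
                if state and state != "Up":
                    # Flag anything that is not plain 'Up', e.g. Partitioned or Down.
                    print(f"{stamp} {name}: {state}")
            time.sleep(interval)

    if __name__ == "__main__":
        poll()
    ```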

  • I've looked into this a bit further and it appears that it might be an issue with our Layer 3 switches.

    A packet capture shows that the ARP announcements go out from the new node declaring that it now has the cluster IP address.

    Everything within the same subnet as the nodes can reach it but anything outside cannot.

    Inspecting the ARP tables on the Layer 3 switch shows that it is sometimes holding on to the old node's MAC address.

    It's just really strange that this only happens 50% of the time.
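
    For anyone following along, this is roughly the capture I'm running on the nodes' subnet to see which MAC address is claiming the instance's cluster IP after a move. A scapy sketch (the IP below is a placeholder, and it needs to run with capture privileges on that subnet/VLAN):

    ```python
    # Watch ARP traffic for the instance's cluster IP during a failover and print
    # which MAC address claims it. Placeholder IP; requires scapy and capture rights.
    from scapy.all import ARP, sniff  # pip install scapy

    CLUSTER_IP = "10.0.0.50"  # placeholder for the second instance's cluster IP

    def show_arp(pkt) -> None:
        if ARP in pkt and pkt[ARP].psrc == CLUSTER_IP:
            arp = pkt[ARP]
            kind = "announcement" if arp.op == 2 else "request"
            print(f"ARP {kind}: {CLUSTER_IP} is at {arp.hwsrc}")

    if __name__ == "__main__":
        # BPF filter keeps the capture limited to ARP frames.
        sniff(filter="arp", prn=show_arp, store=False)
    ```

    If the announcement from the new node shows up here but the Layer 3 switch's ARP table still has the old MAC, the switch-side cache is the next thing to look at.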
