MSDTC Cluster issue

  • We're having issues with our cluster failing, and the two things we find that prompt the failure are a clustered mailroom client and MSDTC. When the cluster fails over, these two services don't restart after they get bumped to the new server. So while someone tackles this from the mailroom client direction, I've been assigned to look at the MSDTC service.

    Has anyone here had MSDTC cluster issues before? Do you know what causes I should be looking for?

    The error logs aren't too helpful, as they just state that the service has degraded, not why it has degraded. Any advice on what to look for would be greatly appreciated.

    Brandie Tarvin, MCITP Database Administrator
    LiveJournal Blog: http://brandietarvin.livejournal.com/
    On LinkedIn!, Google+, and Twitter.
    Freelance Writer: Shadowrun
    Latchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Can you provide OS info?

    Clustering is one of the things that has been vastly "enhanced" across the different OS versions.

    Did you have a look in the cluster service log file?

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Ah, I suppose that would help, wouldn't it? @=)

    Windows 2003, up to date on Service Packs (so far as we know, but this is administered by a corporate office).

    It's an active / active / active / passive cluster (4 nodes). MSDTC is installed in its own resource group (MSDTC Group), has its own disk, and has the following 4 items under the group: IP Address, Network Name, Volume Manager Disk Group, and Distributed Transaction Coordinator.

    EDIT: Microsoft SQL Server 2005 - 9.00.3233.00 (Intel X86) Mar 6 2008 22:09:47 Copyright (c) 1988-2005 Microsoft Corporation Enterprise Edition on Windows NT 5.2 (Build 3790: Service Pack 2)

    The servers worked fine for about 3 years. This issue started about a month ago and isn't going away despite corporate's attempts to patch.

    Brandie Tarvin

  • Chances are you were reading my previous reply at the time I edited it.

    Did you have a look in the cluster service log file ?

    Johan

  • C:\WINDOWS\Cluster\cluster.log

    Johan

  • ALZDBA (3/29/2011)


    C:\WINDOWS\Cluster\cluster.log

    I was about to ask that question. My server guys just gave me a funny look when I asked them. @=)

    Brandie Tarvin

  • The cluster log is too full to have any data from last night's failure. All I'm getting is today's "Looks Alive" comments.

    Is there someplace I can set up the cluster log to archive so I don't lose this data the next time it happens?
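One way to keep the evidence around until the next failure is to copy the log aside on a schedule. The following is a minimal Python sketch, not a tested production tool: it copies the cluster log into an archive folder under a timestamped name and prunes the oldest copies. The archive folder `D:\ClusterLogArchive`, the function name `archive_cluster_log`, and the `keep` count are assumptions made for illustration; only the log path comes from this thread. It also assumes a Python 3 interpreter is available on a machine that can read the log.

```python
import shutil
from datetime import datetime
from pathlib import Path

# Log path taken from this thread; the archive folder is a hypothetical example.
CLUSTER_LOG = Path(r"C:\WINDOWS\Cluster\cluster.log")
ARCHIVE_DIR = Path(r"D:\ClusterLogArchive")


def archive_cluster_log(log_path: Path = CLUSTER_LOG,
                        archive_dir: Path = ARCHIVE_DIR,
                        keep: int = 14) -> Path:
    """Copy the log to archive_dir under a timestamped name and prune old copies."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = archive_dir / "{}-{}{}".format(log_path.stem, stamp, log_path.suffix)
    # copy2 preserves the file's modification time along with its contents.
    shutil.copy2(str(log_path), str(target))

    # Keep only the newest `keep` archives (keep must be at least 1);
    # the timestamped names sort chronologically.
    pattern = "{}-*{}".format(log_path.stem, log_path.suffix)
    copies = sorted(archive_dir.glob(pattern))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

Calling `archive_cluster_log()` from a scheduled task on each node would be the intended use, since every node writes its own cluster.log. On Windows Server 2003 the maximum log size is, as far as I know, controlled by the ClusterLogSize system environment variable, so raising it is another option worth checking against Microsoft's documentation.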

    Brandie Tarvin

  • Brandie Tarvin (3/29/2011)


    ALZDBA (3/29/2011)


    C:\WINDOWS\Cluster\cluster.log

    I was about to ask that question. My server guys just gave me a funny look when I asked them. @=)

    That's a prosperous observation :crazy:

    Other questions might be:

    - Is this limited to a single node, or does it happen with all nodes?

    - Are there pending reboots for automatic updates?

    I'll have to leave you for now, I'll see if I can hop in later this evening ...

    Johan

  • ALZDBA (3/29/2011)


    Other questions might be:

    - Is this limited to a single node, or does it happen with all nodes?

    Sometimes it's just the node that MSDTC is attached to. Sometimes, it's all nodes. I'd say about 75-85% of the time it's all nodes and the rest of the time it's just the one node.

    ALZDBA (3/29/2011)


    - Are there pending reboots for automatic updates?

    We don't do automatic updates on our servers (disabled). Corporate schedules updates, pushes them out during our maintenance window, and will reboot the servers when and if needed during that time frame.

    ALZDBA (3/29/2011)


    I'll have to leave you for now, I'll see if I can hop in later this evening ...

    I understand. Thanks for what you've given me so far.

    Brandie Tarvin

  • Further information (which just goes to prove that even informational messages in the logs can be helpful):

    Application Log Informational Message

    Event Type: Information
    Event Source: MSSQL$ACP1
    Event Category: (2)
    Event ID: 8562
    Date: 3/28/2011
    Time: 8:51:33 PM
    User: N/A
    Computer: XXXXXXXXXXXXXX
    Description:
    The connection has been lost with Microsoft Distributed Transaction Coordinator (MS DTC). Recovery of any in-doubt distributed transactions involving Microsoft Distributed Transaction Coordinator (MS DTC) will begin once the connection is re-established. This is an informational message only. No user action is required.

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Data:
    0000: 72 21 00 00 0a 00 00 00 r!......
    0008: 14 00 00 00 41 00 30 00 ....A.0.
    0010: 31 00 38 00 35 00 44 00 1.8.5.D.
    0018: 42 00 53 00 30 00 30 00 B.S.0.0.
    0020: 30 00 31 00 56 00 31 00 0.1.V.1.
    0028: 5c 00 41 00 43 00 50 00 \.A.C.P.
    0030: 31 00 00 00 00 00 00 00 1.......

    and

    Application Log Warning (pre-error)

    Event Type: Warning
    Event Source: MSDTC Client
    Event Category: CM
    Event ID: 4359
    Date: 3/28/2011
    Time: 8:52:33 PM
    User: N/A
    Computer: XXXXXXXXXXXXXX
    Description:
    MS DTC is unable to communicate with MS DTC on a remote system. MS DTC on the primary system established an RPC binding with MS DTC on the secondary system. However, the secondary system did not create the reverse RPC binding to the primary MS DTC system before the timeout period expired. Please ensure that there is network connectivity between the two systems. Error Specifics:d:t\com\complus\dtc\dtc\cm\src\iomgrsrv.cpp:1318, Pid: 8136
    No Callstack,
    CmdLine: D:\MSSQL9\ACP1\MSSQL.1\MSSQL\Binn\sqlservr.exe -sACP1

    For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

    Data:
    0000: 41 00 30 00 31 00 38 00 A.0.1.8.
    0008: 35 00 2d 00 44 00 54 00 5.-.D.T.
    0010: 43 00 30 00 30 00 30 00 C.0.0.0.
    0018: 31 00 2d 00 56 00 00 00 1.-.V...
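A side note on the Data: section of these events: the bytes are little-endian UTF-16 text, which is why the ASCII column shows a dot after every character. The Python sketch below strips the offset and ASCII columns from the MSDTC Client dump above and decodes what is left. The helper name `parse_event_data` is hypothetical, and the sketch assumes the dump layout shown in this thread (eight bytes per line). Other event sources may prepend binary fields, as the SQL Server event above does with its first 12 bytes.

```python
import re

# Data section copied from the MSDTC Client warning (Event ID 4359) above.
DUMP = """
0000: 41 00 30 00 31 00 38 00 A.0.1.8.
0008: 35 00 2d 00 44 00 54 00 5.-.D.T.
0010: 43 00 30 00 30 00 30 00 C.0.0.0.
0018: 31 00 2d 00 56 00 00 00 1.-.V...
"""

# Offset, a colon, then up to eight two-digit hex bytes; the ASCII column is ignored.
LINE = re.compile(r"\s*[0-9a-fA-F]{4}:\s+((?:[0-9a-fA-F]{2}\s+){1,8})")


def parse_event_data(dump: str) -> bytes:
    """Return the raw bytes contained in an Event Viewer hex dump."""
    raw = bytearray()
    for line in dump.splitlines():
        match = LINE.match(line)
        if match:
            raw.extend(bytes.fromhex(match.group(1)))
    return bytes(raw)


raw = parse_event_data(DUMP)
# Decode as UTF-16LE and drop the trailing NUL terminator.
name = raw.decode("utf-16-le").rstrip("\x00")
print(name)  # A0185-DTC0001-V
```

The decoded string looks like the network name that the MSDTC client was trying to reach, although that reading is mine and not something confirmed in the thread. If so, it is the name to test when checking name resolution and connectivity between nodes.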

    Now, we do have network connectivity between the nodes. Network DTC Access is checked and allows both inbound and outbound. And we haven't had a problem until recently (it's always worked fine before and no changes were made to the MSDTC setup that we know of).

    So, any other thoughts would be appreciated.
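Since event 4359 complains about the reverse RPC binding, it may be worth testing connectivity in both directions and by name, not only from one node. The following is a rough Python sketch under several assumptions: `resolves`, `can_connect`, and `check_peers` are hypothetical helpers, the peer names passed in are placeholders, and the only port probed is TCP 135 (the RPC endpoint mapper). MSDTC goes on to use dynamically assigned RPC ports that this check does not cover, so a pass here does not prove DTC traffic can flow; a failure, however, would narrow things down quickly.

```python
import socket

RPC_ENDPOINT_MAPPER = 135  # TCP port of the RPC endpoint mapper


def resolves(host: str) -> bool:
    """True if the host name resolves to at least one address."""
    try:
        return bool(socket.getaddrinfo(host, None))
    except socket.gaierror:
        return False


def can_connect(host: str, port: int = RPC_ENDPOINT_MAPPER,
                timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_peers(peers):
    """Return {peer: (name_resolves, port_135_reachable)} for each peer name."""
    return {peer: (resolves(peer), can_connect(peer)) for peer in peers}
```

The idea would be to run `check_peers([...])` on every node, passing the names of the other nodes and the network name of the clustered DTC resource, and then compare the results across nodes to spot a one-way failure.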

    Brandie Tarvin

  • There are known issues if the clustered DTC resource is not active on the same node as the clustered SQL Server instance. Windows 2008 clusters allow multiple clustered DTC applications, one for each node where SQL Server is active.

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • FYI, a couple of MS references that might be helpful:

    - http://support.microsoft.com/kb/899115

    - http://support.microsoft.com/kb/900216

    - http://support.microsoft.com/kb/919034

    - http://technet.microsoft.com/en-us/library/aa997579%28EXCHG.80%29.aspx

    Johan

  • Perry Whittle (3/29/2011)


    There are known issues if the clustered DTC resource is not active on the same node as the clustered SQL Server instance. Windows 2008 clusters allow multiple clustered DTC applications, one for each node where SQL Server is active.

    DTC is on the first active node with an instance. But just in case, what are these issues you are referring to so I can research them?

    I'm not on 2008, BTW. And I can't wait for an upgrade to solve the problem. This is production. I need to fix it two weeks ago.

    Brandie Tarvin

  • ALZDBA, thanks for the links. I've seen a few of them before, but the one on tracing is new and I might just have to do that.

    Brandie Tarvin

  • Brandie Tarvin (3/29/2011)


    DTC is on the first active node with an instance. But just in case, what are these issues you are referring to so I can research them?

    This first link details the issues and support.

    Specifically this section

    Should MSDTC be a clustered resource?

    Microsoft only supports running MSDTC on cluster nodes as a clustered resource. We do not recommend or support running MSDTC in stand-alone mode on a cluster. Using MSDTC as a non-clustered resource on a Windows cluster is problematic. This configuration is problematic because transactions could be orphaned and you may experience data corruption if a cluster failover occurs.

    You may also find this useful and this too

    For an instance of SQL Server that is active on a node where no clustered MSDTC is running, go into SSMS and look under Management. "Distributed Transaction Coordinator" should show a red icon rather than a green one, unless the service is running locally, and that brings about the issue mentioned above.
