Monday, February 14, 2011

Windows 2008R2 MSDTC Errors

So, having moved away from using IBM/Lombardi Teamworks as a 3rd party workflow platform, our client decided to go with K2 blackpearl, mostly because of the close Microsoft .NET/SharePoint integration. As with any new 3rd party software toolkit, we ran into a number of strange deployment issues. Here is one of the big ones that had us stumped for a few weeks - MSDTC generating errors when creating a K2 "SmartObject" instance on the K2 server in a distributed farm environment (all Windows 2008R2 servers).

The MSDTC transaction manager was unable to pull the transaction from the source transaction manager due to communication problems. Possible causes are: a firewall is present and it doesn't have an exception for the MSDTC process, the two machines cannot find each other by their NetBIOS names, or the support for network transactions is not enabled for one of the two transaction managers.

As it turns out, K2 SmartObjects rely on the MSDTC service to do their data abstraction magic. I had no idea. As a matter of fact, I had never even heard of MSDTC until this error cropped up. MSDTC is the Distributed Transaction Coordinator (it shows up in the Services applet in Windows 2008R2). I first took it to be a SQL Server specific service, but it is a general Windows service that coordinates reliable "application transactions" across multiple Windows servers and resource managers, a SQL Server instance being the one that mattered in our case.

Anyway, we tried changing DTC security settings on all the servers first (Component Services-Computers-My Computer-Distributed Transaction Coordinator-Local DTC-Properties-Security tab) but no luck there. After a lot of trial and error, the support guys at K2 had me completely uninstall the MSDTC service, remove all its registry entries (eek!), reboot, then add all the services back in ... and after doing that on all the servers in my farm, it solved the problem.

Why did this work? Well, one thing that leaps to mind is the warning we were seeing from Microsoft's DTCPing tool:

WARNING:the CID values for both test machines are the same while this problem won't stop DTCping test, MSDTC will fail for this

The recommended approach to fix this is to uninstall the MSDTC service on one of the servers involved in the DTCPing test. Then I realized the MSDTC service is installed by default in Windows Server 2008R2 ... and all of our servers are VMware instances cloned from a single Windows Server 2008R2 image, so every clone carries the same CID. So my best guess is that if you are using cloned VM images of Windows Server 2008R2, you could also run into this problem.
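If you have a larger farm, it may help to check for shared CIDs across all servers at once rather than pair by pair. Here is a minimal sketch of that check in Python. It assumes you have already collected each server's CID yourself (for example from the DTCPing output on each machine); the server names, the CID values and the function name `find_duplicate_cids` are all made up for illustration.

```python
from collections import defaultdict


def find_duplicate_cids(cids_by_server):
    """Return the CIDs that are shared by more than one server."""
    servers_by_cid = defaultdict(list)
    for server, cid in cids_by_server.items():
        # Normalize braces and case so the same GUID always compares equal.
        normalized = cid.strip().strip("{}").lower()
        servers_by_cid[normalized].append(server)
    return {
        cid: sorted(servers)
        for cid, servers in servers_by_cid.items()
        if len(servers) > 1
    }


if __name__ == "__main__":
    # Hypothetical values, collected by hand from each server.
    collected = {
        "k2-app-01": "{11111111-2222-3333-4444-555555555555}",
        "k2-app-02": "{11111111-2222-3333-4444-555555555555}",
        "sql-01": "{AAAAAAAA-BBBB-CCCC-DDDD-EEEEEEEEEEEE}",
    }
    for cid, servers in find_duplicate_cids(collected).items():
        print(f"CID {cid} is shared by: {', '.join(servers)}")
```

Any group of servers reported here is a candidate for the MSDTC reinstall, since machines cloned from the same image would be expected to show up together.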

Anyway, to "reinstall" MSDTC, I went through the following process, which worked on my farm (though this may be overkill):

On each server, do the following:
  1. From Server Manager, click "Roles", scroll down to the "Application Server" section, click "Remove Role Services" and uncheck/remove "Distributed Transactions"
  2. From PowerShell in "Run as administrator" mode, issue the MSDTC uninstall command "msdtc -uninstall"
  3. Using regedt32.exe, delete all the MSDTC registry entries (My Computer\HKEY_LOCAL_MACHINE\Software\Microsoft\MSDTC)
  4. Restart the server.
  5. Reinstall the "Distributed Transactions" role service from step #1. In addition, I also made sure to install the MSMQ role/feature, and I enabled TCP Port Sharing and all the "Windows Process Activation Service Support" features (I don't know if all of that is necessary)
  6. Run the MSDTC install command "msdtc -install" from a PowerShell window as administrator
  7. Reset all the DTC security settings (Component Services-Computers-My Computer-Distributed Transaction Coordinator-Local DTC-Properties-Security tab), and also update the "Distributed Transaction Coordinator" service to start automatically on boot
  8. Reboot
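
If you have to repeat this on many servers, the command-line parts of the list (steps 2, 3, 6 and the service startup part of step 7) could be scripted. The following is only a sketch, not something I ran on the farm: the `reg delete` and `sc config` commands are my assumed command-line equivalents of the regedt32 and Services applet steps, and the function name `run_phase` is made up. It only prints the commands unless you pass `dry_run=False`, and the Server Manager role changes and the DTC security settings still have to be done by hand.

```python
import subprocess

# Command-line parts of the procedure, split at the reboot in step 4.
BEFORE_REBOOT = [
    ["msdtc", "-uninstall"],
    # Assumed equivalent of deleting the key by hand in regedt32.
    ["reg", "delete", r"HKLM\Software\Microsoft\MSDTC", "/f"],
]
AFTER_REBOOT = [
    ["msdtc", "-install"],
    # Assumed equivalent of setting the service to start automatically.
    ["sc", "config", "MSDTC", "start=", "auto"],
]


def run_phase(commands, dry_run=True):
    """Print each command and run it only when dry_run is False."""
    planned = []
    for cmd in commands:
        line = " ".join(cmd)
        print(line)
        if not dry_run:
            # Must be run from an elevated (administrator) prompt.
            subprocess.run(cmd, check=True)
        planned.append(line)
    return planned


if __name__ == "__main__":
    run_phase(BEFORE_REBOOT)  # dry run: prints the commands only
```

Run the first phase, reboot and reinstall the role services, then run `run_phase(AFTER_REBOOT, dry_run=False)` from an elevated prompt.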

1 comment:

gsvi said...

thank you thank you

this was a life saver