Yesterday I was in Redmond, teaching the UM portion of the Microsoft Certified Solutions Master: Messaging class. This was the first rotation for this particular class, which replaces the MCM Exchange 2010 class, so I had all new content and an eager group of 14 motivated, smart MCSM candidates, including fellow MVP Michael van Horenbeeck, several people who I knew from online interactions (hi, Hany and Jerrid!), plus candidates from Germany, Israel, the US, Australia, and probably a few other places.
The teaching session went well, although my slides have a few lingering rough spots that I’ll need to polish. In this rotation we had a brand-new, and much improved, lab environment, so when Michael called me over to have a look at something my first thought was that it was a lab setup issue.
He couldn’t get the UM Call Router service to run after he’d enabled a certificate for it and set its startup mode to “dual”. These steps are required if you want to integrate Exchange UM with Lync, but they must be done in a specific order: first you change the startup mode with Set-UMCallRouterSettings (which will complain that you can’t enable secure SIP without a certificate!) and then you use Enable-ExchangeCertificate to assign the certificate.
After he did so, the UM Call Router service stopped answering requests. When he ran netstat, he saw that it was listening on the IPv4 and IPv6 loopback addresses, but not on the assigned IPs for that server. The call router service had logged event ID 1621, which said that the UMCR couldn’t start because “the Client Access service was disabled.” This didn’t make a bit of sense, so we started digging.
First, I verified that no one else was having this particular problem—and they weren’t, so it seemed to be localized to Michael’s environment. Next I spent some time researching event ID 1621 on the intertubes, but that didn’t take long; the only two mentions I found were on TechNet, and the suggested solution was to reinstall Exchange. Nope, not gonna happen.
Michael had the bright idea to check the service component availability, and it came back as “inactive”. However, the service was still running and would respond to telnet requests on port 5060 on the loopback address. This seemed very odd.
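If you'd rather script that telnet check than type it by hand, here is a minimal sketch in Python. It simply attempts a TCP connection to port 5060 on each address you give it. The function names (can_connect, probe) and the example server IP in the comment are my own placeholders, not anything from Exchange, and a successful connect only tells you something is listening, not that the UMCR is healthy.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def probe(addresses, port=5060):
    """Map each address to whether something accepts TCP connections on port."""
    return {addr: can_connect(addr, port) for addr in addresses}


# Hypothetical usage, run on the UM server itself (192.0.2.10 stands in
# for the server's assigned IP):
#   probe(["127.0.0.1", "::1", "192.0.2.10"])
# In the state described above, you'd expect the loopback entries to
# connect and the assigned IP to fail.
```

Probing the loopback and assigned addresses side by side makes the "listening, but only on loopback" condition obvious at a glance.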
We used Set-ServerComponentState to force the UMCR back into normal state, and it started listening on 0.0.0.0 again! So clearly the problem was that Managed Availability had killed off the service. Now we started investigating why.
After a number of experiments, our theory was that because the UMCR couldn’t start in dual mode without a certificate assigned, Managed Availability decided that it was unhealthy and marked its state as “inactive.” To test this, we ran Set-ServerComponentState to put the UMCR in maintenance mode; sure enough, the next time the service was probed, it unbound from 0.0.0.0 but remained bound to both loopback addresses. Forcing the service state back to healthy caused it to rebind to 0.0.0.0.
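Watching the bindings flip back and forth by eyeballing netstat gets old quickly. As a rough sketch, the Python below classifies saved netstat output for a given port as "not listening", "loopback only", or "externally reachable". It assumes the Windows "netstat -an" column layout (protocol, local address, foreign address, state); the function names and the three labels are made up for illustration.

```python
import ipaddress


def listening_addresses(netstat_text, port=5060):
    """Return the local IPs with a TCP listener on port, per netstat -an text."""
    addrs = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Keep only rows shaped like: TCP <local> <foreign> LISTENING
        if len(fields) < 4 or fields[0] != "TCP" or fields[-1] != "LISTENING":
            continue
        host, _, port_text = fields[1].rpartition(":")
        if port_text != str(port):
            continue
        # Drop IPv6 brackets and any %zone suffix before parsing.
        addrs.append(ipaddress.ip_address(host.strip("[]").split("%")[0]))
    return addrs


def binding_state(netstat_text, port=5060):
    """Summarize whether the port is bound beyond the loopback addresses."""
    addrs = listening_addresses(netstat_text, port)
    if not addrs:
        return "not listening"
    if all(addr.is_loopback for addr in addrs):
        return "loopback only"
    return "externally reachable"
```

Feeding it the output captured before and after each Set-ServerComponentState change should show the state moving between "loopback only" and "externally reachable", which matches what we saw by hand.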
This leads me to point out a couple of things:
- It strikes me as very odd that the service kept running after Managed Availability marked it as inactive. I assume that this is on purpose; the service stays up so that Managed Availability can continue to probe it and keep its state updated.
- The description for event ID 1621 is so bad it isn’t even wrong—the service wasn’t running because it couldn’t start with an unassigned certificate (and in fact, there was a separate event indicating exactly that). The problem had nothing to do with the (non-existent) client access service being disabled.
- I didn’t see any events logged indicating that the component was in an unhealthy state, although I might have missed them. Once we’d fixed the binding problem, as we transitioned UMCR into and out of maintenance mode, we saw event ID 1648, indicating that the UMCR was returning to healthy state.
Clearly I still have a lot to learn about Managed Availability! I recommend starting with this blog post, which explains in more depth how you can find out what sort of mischief it may have committed on your unsuspecting services…