Tag Archives: Exchange 2007

Microsoft Exchange engineering and cloud-scale

The Exchange team (or at least Perry Clarke, its fearless leader) has been known to describe Exchange Online as “the gateway drug to the cloud.” But how did that come to pass?

This week at Ignite, I was lucky enough to have dinner with some folks from the Exchange product team and a very, very large customer where we discussed the various ways in which Exchange engineering has blazed a trail the rest of Microsoft’s server products have eventually followed. After a bracing Twitter discussion this afternoon with @swiftonsecurity and some of her other followers, I thought it would be fun to put together a partial list of some of the things we discussed to illustrate how the Exchange team has built a stairway to heaven, or an elevator to the cloud, or something like that.

Let’s start with PowerShell. Love it or hate it, it is here, so we all have to deal with it. In 2007, the idea that Exchange would be built on PS was both revolutionary and, to many, revolting, but it allowed Microsoft to do several important things (not all of which shipped in Exchange 2007, but all of which are critical to cloud operations):

  • Greatly improve testability, both for the developers themselves and for administrators, who now got a suite of protocol and endpoint-related tests they could run as part of troubleshooting– critically important when you have to troubleshoot in a global network of data centers hosting tens of millions of mailboxes
  • Fully enable role-based access control, also critical for cloud deployments where customers want to control who can do what with their data
  • Finally decouple the presentation layer of the UI (EMC, EAC, etc) from business logic
  • Massively improve the tools for scripting, including enabling very large-scale bulk operations– an obvious requirement for a cloud-scale service

Requiring PowerShell was a bold move by the Exchange team, but one that has paid off hugely and that has been echoed by the Windows, SharePoint, SQL Server and Skype teams, all of whom depend on it for managing their own cloud services. (See also: the Microsoft Graph APIs.)

Then there’s storage performance. In ancient days, getting scale from Exchange pretty much required the use of SANs due to Exchange’s IO requirements. Now, thanks to the IOPS diet imposed by Exchange engineering, it doesn’t. Tony does his usual excellent job of summarizing the actual reductions. Summary: Exchange 2016 requires roughly 96% fewer IOPS than Exchange 2003 did. There have been a ton of storage performance improvements in Exchange’s sister products (notably SQL) but those have their own stories that I’m not competent to tell. The relentless drive to cut IOPS requirements was one of the biggest enablers for Exchange Online, since controlling storage provisioning costs is critical for any type of scaled cloud service.

Of course, data protection is critical too. Exchange moved from having a single monolithic database to one with separate property and MIME databases (Exchange 2000) then to having software-based database replication with clustering (Exchange 2007) to shared-nothing, fully-replicated active/passive database replication (Exchange 2010 and later). Keeping multiple separate database copies (including lagged copies) enables all sorts of DR and HA scenarios that previously had required SANs. The ability to reliably use cheap JBOD disks, which thanks to Moore’s Law have embiggened nicely during Exchange’s lifetime, has been a key enabler for Exchange Online.

Then there’s a bunch of other architectural changes and improvements that are really only interesting to Exchange nerds. For the latest example, I present “read from passive,” but there’s also all the stuff covered by the Preferred Architecture.

Oh, I almost forgot: managed availability gives ExO a fair degree of self-healing, although its behavior sometimes surprises on-prem admins who see it do things on their behalf unexpectedly.

Oh, and let’s not forget the conversion of all the Exchange codebase to managed code– that was an important accelerator for the move to the cloud, as well as serving as a lighthouse for other product groups with code of similar vintage.

There are more examples, I’m sure, but these should get the point across– there’s been a steady stream of architectural changes in the nearly 20 years since Exchange 4.0 shipped that have led directly to the capability, power, and reliability of Exchange Online– which really has been the gateway drug for getting Microsoft’s customers to Office 365.

 

 


Filed under UC&C

Do mailbox quotas matter to Outlook and OWA?

Great question from my main homie Brian Hill:

Is there a backend DB reason for setting quotas at a certain size? I have found several links (like this one) discussing the need to set quotas due to the way the Outlook client handles large numbers of messages or OST files, but for someone who uses OWA, does any of this apply?

Short answer: no.

Somewhat longer answer: no.

The quota mechanism in Exchange is an outgrowth of those dark times when a large Exchange server might host a couple hundred users on an 8GB disk drive. Because storage was so expensive, Microsoft’s customers demanded a way to clamp down on mailbox size, so we got the trinity of quota limits: prohibit send, prohibit send and receive, and warn. These have been with us for a while and persist, essentially unchanged, in Exchange 2013, although it is now common to see quotas of 5GB or more on a single mailbox.

Outlook has never had a formal quota mechanism of its own, apart from the former limit of 2GB on PST files imposed by the 32-bit offsets used as pointers in the original PST file format. This limit was enforced in part by a dialog that would tell you that your PST file was full and in part by bugs in various versions of Outlook that would occasionally corrupt your PST file as it approached the 2GB size limit. Outlook 2007 and later pretty much extinguished those bugs, and the Unicode PST file format doesn’t have the 2GB limit any longer. Outlook 2010 and 2013 set a soft limit on Unicode PSTs of 50GB, but you can increase the limit if you need to.

Outlook’s performance is driven not by the size of the PST file itself (thought experiment: imagine a PST with a single 10GB item in it as opposed to one with 100,000 100KB messages, which adds up to the same 10GB) but by the number of items in any given folder. Microsoft has long recommended that you keep Outlook item counts to a maximum of around 5,000 items per folder (see KB 905803 for one example of this guidance). However, Outlook 2010 and 2013, when used with Exchange 2010 or 2013, can handle substantially more items without performance degradation: the Exchange 2010 documentation says 100,000 items per folder is acceptable, though there’s no published guidance for Exchange 2013. There’s still no hard limit, though. The reasons why the number of items (and the number of associated stored views) matters are well enumerated in this 2009 article covering Exchange 2007. Some of the mechanics described in that article have changed in later versions of Exchange but the basic truth remains: the more views you have, and/or the more items that are found or selected by those views, the longer it will take Exchange to process them.

If you’re wondering whether your users’ complaints of poor Outlook performance are related to high item counts, one way to find out is to use a script like this to look for folders with high item counts.
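The linked script isn't reproduced here, but the idea is simple enough to sketch. The Python below is a minimal illustration, assuming you have already exported per-folder statistics to CSV; the column names `FolderPath` and `ItemsInFolder` and the sample data are assumptions made for the example, not something the original script requires.

```python
import csv
import io

# Default threshold follows the long-standing 5,000-items-per-folder guidance.
# Pass a higher value (for example 100,000) for newer Outlook/Exchange pairs.
DEFAULT_THRESHOLD = 5000


def find_large_folders(csv_text, threshold=DEFAULT_THRESHOLD):
    """Return (folder, item_count) pairs above the threshold, largest first.

    The CSV column names used here are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    hits = []
    for row in reader:
        count = int(row["ItemsInFolder"])
        if count > threshold:
            hits.append((row["FolderPath"], count))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    sample = (
        "FolderPath,ItemsInFolder\n"
        "/Inbox,48210\n"
        "/Sent Items,3120\n"
        "/Deleted Items,9004\n"
    )
    for folder, count in find_large_folders(sample):
        print(f"{folder}: {count} items")
```

Sorting by count puts the worst offenders at the top, which is usually where you want to start the conversation with the user.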

Circling back to the original question: there is a performance impact with high item count folders in OWA, but there’s no quota mechanism for dealing with it. If you have a user who reports persistently poor OWA performance on particular folders, high item counts are one possible culprit worth investigating. Of course, if OWA performance is poor across multiple folders that don’t have lots of items, or across multiple users, you might want to seek other causes.


Filed under UC&C

Microsoft Certified Solutions Master certification now dead

I received a very unwelcome e-mail late last night:

Microsoft will no longer offer Masters and Architect level training rotations and will be retiring the Masters level certification exams as of October 1, 2013. The IT industry is changing rapidly and we will continue to evaluate the certification and training needs of the industry to determine if there’s a different certification needed for the pinnacle of our program.

This is terrible news, both for the community of existing MCM/MCSM holders and for the broader Exchange community. It is a clear sign of how Microsoft values the skills of on-premises administrators of all its products (because all the MCSM certifications are going away, not just the one for Exchange). If all your messaging, directory, communications, and database services come from the cloud (or so I imagine the thinking goes), you don’t need to spend money on advanced certifications for your administrators who work on those technologies.

This is also an unfair punishment for candidates who attended the training rotation but have yet to take the exam, or those who were signed up for the already-scheduled upgrade rotations, and those who were signed up for future rotations. Now they’re stuck unless they can take, and pass, the certification exams before October 1… which is pretty much impossible. It greatly devalues the certification, of course, for those who already have it. Employers and potential clients can look at “MCM” on a resume and form their own value judgement about its worth given that Microsoft has dropped it. I’m not quite ready to consign MCM status to the same pile as CNE, but it’s pretty close.

The manner of the announcement was exceptionally poor in my opinion, too: a mass e-mail sent out just after midnight Central time last night. Who announces news late on Friday nights? People who are trying to minimize it, that’s who. Predictably, and with justification, the MCM community lists are blowing up with angry reaction, but, completely unsurprisingly, no one from Microsoft is taking part, or defending their position, in these discussions.

As a longtime MCM/MCSM instructor, I have seen firsthand the incredible growth and learning that takes place during the MCM rotations. Perhaps more importantly, the community of architects, support experts, and engineers who earned the MCM has been a terrific resource for learning and sharing throughout their respective product spaces; MCMs have been an extremely valuable connection between the real world of large-scale enterprise deployments and the product group.

In my opinion, this move is a poorly-advised and ill-timed slap in the face from Microsoft, and I believe it will work to their detriment.

18 Comments

Filed under FAIL, UC&C

MEC 2014: Austin, 31 March-2 April 2014

This is pretty darn exciting: Microsoft has announced the official dates and location of the Microsoft Exchange Conference (MEC) in 2014. It will be held in Austin, home of at least one of the original MECs (the first one, maybe? I wasn’t there so I’m not sure) from 31 March to 2 April 2014.

I am sure that nothing bad will come of Microsoft’s decision to include April Fools’ Day as part of the conference. Nope, not at all.

On a personal note, I am excited that the conference will be in Austin. It’s one of my favorite cities, and I’ll be making side trips to see family (Hi, Lee Anne!) and friends while there. I also believe that we should have an Exchange-themed visit to the Salt Lick BBQ. Stay tuned for details!


Filed under UC&C

Loading PowerShell snap-ins from a script

So I wanted to launch an Exchange Management Shell (EMS) script to do some stuff for a project at work. Normally this would be straightforward, but because of the way our virtualized lab environment works, it took me some fiddling to get it working.

What I needed to do was something like this:

c:\windows\system32\WindowsPowerShell\v1.0\powershell.exe -command "someStuff"

That worked fine as long as all I wanted to do was run basic PowerShell cmdlets. Once I started trying to run EMS cmdlets, things got considerably more complex because I needed a full EMS environment. First I had to deal with the fact that EMS, when it starts, tries to perform a CRL check. On a non-Internet-connected system, it will take 5 minutes or so to time out. I had completely forgotten this, so I spent some time fooling around with various combinations of RAM and virtual CPUs trying to figure out what the holdup was. Luckily Jeff Guillet set me straight when he pointed me to this article, helpfully titled “Configuring Exchange Servers Without Internet Access.” That cut the startup time waaaaay down.

However, I was still having a problem: my scripts wouldn’t run. They were complaining that “No snap-ins have been registered for Windows PowerShell version 2”. What the heck? Off to Bing I went, whereupon I found that most of the people reporting similar problems were trying to launch PowerShell.exe and load snap-ins from web-based applications. That puzzled me, so I did some more digging. Running my script from the PowerShell session that appears when you click the icon in the quick launch bar seemed to work OK. Directly running the executable by its path (i.e. %windir%\system32\WindowsPowerShell\v1.0\powershell.exe) worked OK too… but it didn’t work when I did the same thing from my script launcher.

Back to Bing I went. On about the fifth page of results, I found this gem at StackExchange. The first answer got me pointed in the right direction. I had completely forgotten about file system redirection, the WOW64 feature in 64-bit Windows that helps erase the distinction between x64 and x86 binaries by silently handing a 32-bit process the 32-bit copy of an executable even when it asks for the System32 path. In my case, I wanted the x64 version of PowerShell, but that’s not always what I was getting because my script launcher is a 32-bit x86 process. When it launched PowerShell.exe from any path, I was getting the x86 version, which can’t load x64 snap-ins and thus couldn’t run EMS.

The solution? All I had to do was read a bit further down in the StackExchange article to see this MSDN article on developing applications for SharePoint Foundation, which points out that you must use %windir%\sysnative as the path when running PowerShell scripts after a Visual Studio build. Why? Because Visual Studio is a 32-bit application, but the SharePoint snap-in is x64 and must be run from an x64 PowerShell session… just like Exchange.

Armed with that knowledge, I modified my scripts to run PowerShell using sysnative vice the “real” path and poof! Problem solved. (Thanks also to Michael B. Smith for some bonus assistance.)
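For anyone writing their own launcher, the path decision boils down to a single check. Here's a minimal sketch in Python, assuming PowerShell lives in its standard WindowsPowerShell\v1.0 directory; the function name and its parameters are mine, invented for illustration. Keep in mind that the Sysnative alias is only visible to 32-bit processes, so a 64-bit launcher should stick with System32.

```python
import ntpath  # Windows path rules, usable on any platform


def native_powershell_path(windir, launcher_is_32bit, os_is_64bit):
    """Build the path a launcher should use to reach the OS-native PowerShell.

    A 32-bit process on 64-bit Windows has its System32 requests redirected
    to the 32-bit binaries, so it must go through the Sysnative alias to
    reach the 64-bit powershell.exe.
    """
    if launcher_is_32bit and os_is_64bit:
        system_dir = "Sysnative"
    else:
        system_dir = "System32"
    return ntpath.join(
        windir, system_dir, "WindowsPowerShell", "v1.0", "powershell.exe"
    )


if __name__ == "__main__":
    # A 32-bit script launcher on a 64-bit server, as in the story above.
    print(native_powershell_path(r"C:\Windows", True, True))
```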

1 Comment

Filed under General Tech Stuff, UC&C

Excessive transaction log growth with iOS 6.1 devices

Well, it appears that Apple has done it again: reports are starting to surface of runaway transaction log growth when mobile devices running iOS 6.1 synchronize with Exchange Server. Tony has a good synopsis here.

Those of you who have been administering Exchange for a while may think this sounds familiar– that’s because there was a very similar problem with Microsoft Entourage back in the day, as detailed by Jeremy Kelly here. Remarkably, a couple of years later, we got the same bug in a slightly different guise, as described in KB 935848. In both cases, the problem was that the client was too stupid to detect certain types of failures, so it would keep retrying the failed operation, which would keep failing. This endless loop quickly resulted in large volumes of transaction log files on the Exchange server.

Luckily, Exchange 2010 and 2013 include throttling to prevent misbehaving clients from using up an excessive share of resources. However, the throttling controls available regulate EAS based on the amount of time user requests take, the number of concurrent connections, or the number of device partnerships. None of these parameters are useful in preventing the iOS 6.1-related problem; it’s not that the individual requests take up an excessive amount of time, it’s that there are so many requests that they generate an excessive log volume. (This video may provide a useful explanation for the phenomenon.)
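To make the distinction concrete, here's a minimal sketch in Python. It is not how Exchange implements throttling; the function names, the 60-second time budget, and the 500-request ceiling are all made up for illustration. The point is only that a client stuck in a retry loop of short requests can stay well under a time-based budget while still generating a huge number of operations.

```python
def exceeds_time_budget(request_durations_ms, budget_ms):
    """Time-based check: has the client used more than its time budget?"""
    return sum(request_durations_ms) > budget_ms


def exceeds_request_count(request_durations_ms, max_requests):
    """Hypothetical volume-based check: too many requests in the window?"""
    return len(request_durations_ms) > max_requests


if __name__ == "__main__":
    # A looping device: 2,000 quick requests of 5 ms each in one window.
    looping_device = [5] * 2000
    print("Over time budget:", exceeds_time_budget(looping_device, 60_000))
    print("Over request count:", exceeds_request_count(looping_device, 500))
```

In this made-up example the looping device uses only 10 seconds of server time, so a time-based limit never trips, yet every one of those 2,000 requests can still write to the transaction log.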

Exchange 2013 includes the ability to specifically block misbehaving Exchange ActiveSync devices based on “suspicious” behavior. I will have a lot more to say about that in the near future, although that spiffy feature doesn’t help anyone now suffering the problem. For now, all we can do is the following:

  • Block iOS 6.1 devices using an Exchange ActiveSync device access rule
  • Discourage your users from upgrading, although I expect this to be an ineffective strategy
  • If you have a support relationship with Apple, report this problem to them. If you’re a developer, file a RADAR issue. If you have enterprise technical support with Apple, use it. I’ve seen reports that the ordinary consumer-level technical support (i.e. the $49 pay-per-incident support, as well as AppleCare) doesn’t have any way to report this particular problem in an actionable way.

Thoughts for another time: the rapid adoption rate of iOS devices has many benefits for users, including largely avoiding the fragmentation that leaves Android devices stuck with unpatched issues (like this “smishing” fix that virtually no one has). However, when Apple ships a buggy update, which is common, that rapid adoption multiplies the pain of the bug.

Update 1535 CST 8 Feb: Ina Fried at AllThingsD is reporting that Vodafone is telling iPhone 4S users not to upgrade to iOS 6.1.

1 Comment

Filed under UC&C

Announcing Exchange 2013 Inside Out

Big news, at least to me!

Tony Redmond and I are delighted to announce a new joint project: Exchange 2013 Inside Out, a two-volume set that we will write for Microsoft Press, with an anticipated publication date in fall 2013. Tony is writing part 1, which covers the mailbox server role, the store, DAG, compliance, modern public folders and site mailboxes. I’m writing part 2, which covers client access, connectivity, transport, unified messaging, and Office 365 integration. This division looks as if I got more work to do, but Tony assures everyone that he can easily fill a book on just one topic.

Why two books where Exchange 2010 Inside Out merited just one? Well, just look at that book and reflect that it contains some 400,000 words in a 2-pound tome. Apart from the weight, it takes a long time to write such a book and there are tons of changes and new material in Exchange 2013 that we want to cover. The option of writing a single 500,000 word volume was just not attractive. Thankfully Microsoft Press agreed with us.

We’ve deliberately decided to take our time writing. There’s no point in rushing out a book based on a product immediately after it is released because no real-world experience exists. Microsoft runs an excellent Technology Adoption Program (TAP) that helps the development group understand how new versions of Exchange behave in production environments through early deployments, but we prefer to see how the software evolves and behaves as it is deployed more widely. This can’t really happen until after Microsoft releases Exchange 2010 SP3 and whatever update is necessary for Exchange 2007 SP3 to allow coexistence with Exchange 2013. Writing based on a firm foundation of real-world deployment experience has always seemed to make a lot of sense to us and we see no reason to change now.

Although the two volumes of Exchange 2013 Inside Out will stand alone, we will absolutely make sure that each volume complements the other. We will be technical editors for each other’s volumes, giving us equal opportunity to insert bad jokes and Exchange war stories across the breadth of both volumes.

Mostly because we have no firm dates in mind, we’re not releasing any details of our schedule. We hope that we will be able to offer an early-access program to readers through the Microsoft Press prePress program, so stay tuned!


Filed under UC&C

Man-in-the-middle attacks against Exchange ActiveSync

I love the BlackHat security conference, although it’s been a long-distance relationship, as I’ve never been. The constant flow of innovative attacks (and defenses!) is fascinating, but relatively few of the attacks focus on things that I know enough about to have a really informed opinion. At this year’s BlackHat, though, security researcher Peter Hannay presented a paper on a potential vulnerability in Exchange ActiveSync that can result in malicious remote wipe operations. (Hannay’s paper is here, and the accompanying presentation is here.)

In a nutshell, Hannay’s attack depends on the ability of an attacker to impersonate a legitimate Exchange server, then send the device a remote wipe command, which the device will then obey. The attack depends on the behavior of the EAS protocol provisioning mechanism, as described in MS-ASPROV.

Before discussing this in more detail, it’s important to point out three things. First, this attack doesn’t provide a way to retrieve or modify data on the device (apart from erasing it, which of course counts as “modifying” it in the strictest sense). Second, the attack depends on use of a self-signed certificate. Self-signed certificates are installed and used by Exchange 2007, 2010, and 2013 by default, but Microsoft doesn’t recommend their use for mobile device sync (see the 2nd paragraph here); contrary to Hannay’s claim in the paper, my experience has been that relatively few Exchange sites depend on self-signed certs.

The third thing I want to highlight: this is an interesting result and I’m sure that the EAS team is studying it closely to ensure that the future attacks Hannay contemplates, like stealing data off the device, are rendered impossible. There’s no current cause for worry.

The basis of this attack is that EAS provides a policy update mechanism that allows the server to push an updated security policy to the device when the policy changes. There are 3 cases in which the EAS Provision command comes into play:

  • when the client contacts the server for the first time. In this case, the client should pull the policy and apply it. (I vaguely remember that iOS devices prompt the user to accept the policy, but Windows Phone devices don’t.)
  • when the policy changes on the server, in which case the server returns a response indicating that the client needs to issue another Provision command to get the update.
  • when the server tells the device to perform a remote wipe.

The client sends a policy key with each command it sends to the server, so the server always knows what version of the policy the device has; that’s how it knows when to send back the response indicating that the device should reprovision.

If the client doesn’t have a policy, or if the policy has changed on the server, the client policy key won’t match the current server policy key, so the server sends back a response indicating that the client must reprovision before the server will talk to it.
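The server-side decision described above can be sketched in a few lines of Python. This is a simplified model of the behavior as I've described it, not code from MS-ASPROV or from Exchange; the `Device` class, the function name, and the returned strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    # None models a device that has never provisioned.
    policy_key: Optional[str]
    wipe_requested: bool = False


def server_decision(device, current_policy_key):
    """Decide how the server answers a command from the device.

    Mirrors the three cases in the text: first contact, a changed
    policy, and a pending remote wipe.
    """
    if device.wipe_requested:
        return "provision required: remote wipe pending"
    if device.policy_key is None:
        return "provision required: device has no policy"
    if device.policy_key != current_policy_key:
        return "provision required: policy changed"
    return "ok"


if __name__ == "__main__":
    print(server_decision(Device(policy_key=None), "key-2"))
    print(server_decision(Device(policy_key="key-1"), "key-2"))
    print(server_decision(Device(policy_key="key-2"), "key-2"))
```

Note that in every non-"ok" case the device ends up issuing a Provision command, which is exactly the step a rogue server wants to trigger.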

There seems to be a flaw in Hannay’s paper, though.

The mechanism he describes in the paper is that used by EAS 12.0 and 12.1, as shipped in Exchange 2007. In that version of EAS, the server returns a custom HTTP error, 449, to tell the device to get a new policy. A man-in-the-middle attack in this configuration is simple: set up a rogue server that pretends to be the victim’s Exchange server, using a self-signed certificate, then when any EAS device attempts to connect, send back HTTP 449. The client will then request reprovisioning, at which time the MITM device sends back a remote wipe command.

Newer versions of Exchange return an error code in the EAS message itself; the device, upon seeing this code, will attempt to reprovision. (The list of possible error codes is in the section “When should a client provision?” in this excellent MSDN article). I think this behavior would be harder to spoof, since the error code is returned as part of an existing EAS conversation.
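From the client's point of view, the two signalling styles look something like the sketch below. This is Python written purely for illustration: the function is hypothetical, and the in-band status values are passed in by the caller rather than hard-coded, because the real codes live in the protocol documentation and I don't want to misquote them here.

```python
# Custom HTTP status used by EAS 12.0/12.1, per the description above.
HTTP_NEEDS_PROVISION = 449


def client_should_reprovision(http_status, eas_status, provision_statuses):
    """Return True if either signalling style says "go reprovision".

    http_status        -- HTTP status code of the response
    eas_status         -- status value found inside the EAS response body,
                          or None if the body carried no such value
    provision_statuses -- set of in-band status values that mean
                          "reprovision"; supplied by the caller
    """
    if http_status == HTTP_NEEDS_PROVISION:
        return True
    return eas_status is not None and eas_status in provision_statuses


if __name__ == "__main__":
    placeholder_codes = {9001, 9002}  # placeholders, NOT real EAS status codes
    print(client_should_reprovision(449, None, placeholder_codes))
    print(client_should_reprovision(200, 9001, placeholder_codes))
    print(client_should_reprovision(200, None, placeholder_codes))
```

The old-style trigger needs nothing but an HTTP status line, while the newer one requires a well-formed response inside an ongoing EAS exchange, which is why I suspect it's harder to spoof.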

In addition, there’s the whole question of version negotiation. I haven’t tested it, but I assume that most EAS devices are happy to use EAS 12.1. I don’t know of any clients that allow you to specify that you only want to use a particular version of EAS. It’s also not clear to me what would happen if you send a device using EAS 14.x (and thus expecting to see the policy status element) the HTTP 449 error.

Having said all that, this is still a pretty interesting result. It points to the need for better certificate-management behavior on the devices, since Hannay points out that Android and iOS devices behaved poorly in his tests. Windows Phone seems to do a better job of handling unexpected certificate changes, although it’s also the hardest of the 3 platforms to deal with from a perspective of installing and managing legitimate certificates.

More broadly, Hannay’s result points out a fundamental flaw in the way all of these devices interact with EAS, one that I’ve mentioned before: the granularity of data storage on these devices is poor. A remote-wipe request from a single Exchange account on the device arguably shouldn’t wipe out data that didn’t come from that server. The current state of client implementations is that they erase the entire device– apps, data, and all– upon receiving a remote wipe command. This is probably what you want if your device is lost or stolen (i.e. you don’t want the thief to be able to access your personal or company data), but when you leave a company you probably don’t want them wiping your entire device. This is an area where I hope for, and expect, improvement on the part of EAS client implementers.

1 Comment

Filed under Security, UC&C

Stalking the wily ADAccess event 2112

Timing is everything.

A week ago, I got a late-night phone call about a problem with an Exchange server that seemed to be related to an expired certificate; the admin had replaced the expired cert on one member of a two-node DAG, but not the other. He noticed the errors in the event log when troubleshooting a seemingly unrelated problem, installed the new cert, and then boom! Bad stuff started happening.  Problem was, the reported problem was that inbound SMTP from a hosted filtering service that doesn’t use TLS wasn’t flowing, so it didn’t seem likely that certificate expiration would be involved. By the time he called me, he had installed the new certificate and rebooted the affected server, and all seemed to be well.

Fast forward to Sunday night. I’d planned to patch these same servers to get them on to Exchange 2010 SP2 UR3, in part because I’d noticed a worrisome number of events generated by the MSExchange ADAccess service, chiefly event ID 2112:

Process MSEXCHANGEADTOPOLOGYSERVICE.EXE (PID=8356). The Exchange computer HQ-EX02.blahblah does not have Audit Security Privilege on the domain controller HQ-DC01.blahblah. This domain controller will not be used by Exchange Active Directory Provider.

This was immediately followed by MSExchange ADAccess event ID 2102 with the rather ominous message that

Process MSEXCHANGEADTOPOLOGYSERVICE.EXE (PID=8356). All Domain Controller Servers in use are not responding:

However, the event ID 2080 logged by ADAccess indicated that all but 1 of the GCs were up and providing necessary services, including indicating that their SACL allowed Exchange the necessary access. I couldn’t puzzle it out in the time I had allotted, so I decided to take a backup (see rule #3) and  wait to tackle the patching until I could be physically present. That turned out to be a very, very good idea.

Last night, I sat down to patch the affected systems. I began with the passive DAG node, updating it to SP2 and then installing UR3. I half-thought that this process might resolve the cause of the errors (see rule #2), but after a reboot I noticed they were still being logged. I suspected that the reported 2102 errors might be bogus, since I knew all of the involved GCs were running and available. As I started to dig around, I learned that this error often appears when there’s a problem with permissions; to be more specific, this SCOM article asserts that the problem is that the Exchange server(s) don’t have the SeSecurityPrivilege user right on the domain controllers. However, I was still a little skeptical. I checked the default DC GPO and, sure enough, the permissions were present, so I moved on to do some further investigation.

Another possible cause is that the Exchange servers’ computer accounts aren’t in the Exchange Servers group, or that permissions on that group were jacked up somehow, but they appeared to be fine so I discounted that as a likely cause.

Along the way I noticed that the FBA service wouldn’t start, but its error message was meaningless– all I got from the service control manager was a service-specific error code that resisted my best attempts to Bing it. Without that service, of course, you can’t use OWA with FBA mode, which would be a problem, so I made a mental note to dig into that later.

A little more searching turned up this article, which is dangerously wrong: it suggests adding the Exchange computer accounts to the Domain Admins security group. Please, please, don’t do this; not only does it not fix the problem, it can cause all sorts of other tomfoolery that you don’t want to have to deal with.

Still more digging revealed two common problems that were present on this server: the active NIC wasn’t first in the binding order and IPv6 was disabled on the two enabled NICs. Now, you and I both know that IPv6 isn’t required to run Exchange... but Microsoft does not support disabling or removing IPv6 on Windows servers. And you know what they say about what “unsupported” means! So, I enabled IPv6 on the two adapters and got the binding order sorted out, then bounced the AD topology service and…

… voila! Everything seemed to be working normally, so I ran some tests to verify that *over was working as it should, then started patching the DAG primary– only to have setup fail partway through. Upon reboot, w3svc was caught in an endless loop of trying to load some of the in-proc OWA DLLs; it kept trying endlessly until I power-cycled the server. The problem with this is that the Active Manager service was starting, so the current active node would try to sync with it before mounting its copy of the DAG databases, but it never got an answer! Net result, no mounted databases on either server, and an unhappy expression on my face as the clock ticked past 11pm.

I put the primary member server in safe mode, then set the Exchange and w3svc services to manual start and rebooted it. Rather than spend a lot of time trying to pin down exactly what happened, I ran setup in recovery mode; it installed the binaries, after which the services restarted normally. I did a switchover back to the original primary node, verified mail flow, and went home. Life was good.

Until, that is, this morning, when I got an e-mail: “OWA is down.” I checked the servers and, sure enough, the errors were back and the FBA service was again refusing to start. After some creative swearing, I once again started digging around in the guts of the server. I couldn’t shake the feeling that this was a legitimate permissions problem of some kind.

At that point, I found this article, which pointed out something critical about GPOs: you have to check the effective policy, not just the one defined in the default policy. Sure enough, when I used RSoP to check the effective policy on the DCs, the Exchange servers did not have SeSecurityPrivilege on the DCs because there was a separate GPO to control audit logging permissions, and it had recently been changed to remove the Exchange Servers group. That was easy to fix: I added the Exchange Servers group to the GPO, ran gpupdate, rebooted the passive node, and found that the services all started normally and ran without error. A quick switchover let me restart the topology service on the primary DAG member, after which it too ran without errors. End result: problem solved.
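If the difference between "defined" and "effective" policy seems abstract, this small Python sketch may help. It is a deliberately simplified model, assuming that for a user-rights setting the last GPO applied that defines the right replaces the list outright rather than merging with earlier ones. The GPO name "Audit Logging Policy" and the "Administrators" entries are invented for the example.

```python
def effective_right_holders(gpos_in_apply_order, right):
    """Return (winning_gpo_name, principals) for a user right.

    Each GPO is a (name, {right: [principals]}) pair, listed in the
    order it is applied. The last GPO that defines the right wins,
    and its list replaces any earlier one.
    """
    winner, holders = None, []
    for name, rights in gpos_in_apply_order:
        if right in rights:
            winner, holders = name, list(rights[right])
    return winner, holders


if __name__ == "__main__":
    gpos = [
        ("Default Domain Controllers Policy",
         {"SeSecurityPrivilege": ["Administrators", "Exchange Servers"]}),
        ("Audit Logging Policy",  # hypothetical name for the separate GPO
         {"SeSecurityPrivilege": ["Administrators"]}),
    ]
    winner, holders = effective_right_holders(gpos, "SeSecurityPrivilege")
    print("Winning GPO:", winner)
    print("Exchange Servers has the right:", "Exchange Servers" in holders)
```

Looking only at the first GPO in that list, everything appears fine, which is exactly the trap I fell into.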

It’s still not entirely clear to me why that particular service needs to have the SeSecurityPrivilege right assigned. I’m trying to find that out and will update this post once I do. In the meantime, if you have similar symptoms, check to verify that the effective policy is correct.

Filed under UC&C

What "supported" really means

If I had a nickel for every time I had had a discussion like the below…

<Customer> wants to <do something>. I don’t think it’s a good idea and tried to explain that to them. They want to do it anyway. Is it supported?

The particular discussion that triggered this post was a conversation among MCMs concerning a customer who wanted to know if they could configure an Exchange 2010 server so that it was dual-homed, with one NIC on the LAN and another in their DMZ. There are a number of good reasons not to do this, most related to one of two things: the inability to force Windows and/or Exchange to use only one of the installed NICs for certain operations, or the lack of knowledge about how to configure everything properly in such a configuration. For example, you’d have to be careful to get static routes right so that you only passed the traffic you wanted on each interface. You’d also have to be careful about which AD sites your server appeared to be a member of.

The big issue for me: that configuration would add complexity. Any time you add complexity, you should be able to clearly articulate what you’re gaining in exchange. Performance, scalability, flexibility, security, cost savings... there has to be some reason to make it worth complicating things. This is a pretty fundamental principle of designing anything technical, from airplanes to washing machines to computer networks, and you violate it at your peril.

In this case, the gain is that the customer wouldn’t need to use TMG or a similar solution. That seems like an awfully small gain for the added complexity burden and the supportability issues it raises.

You might be wondering why I’d bring up supportability in this context. The cherry on the sundae was this comment from the fellow who started the thread: “It’s not written that you can’t do it, so they assume that means you can.” This is a dangerous attitude in many contexts, but especially so here.

I’ve said it before (and so has practically everyone who has ever written about Exchange), but it bears repeating:

Just because something is not explicitly unsupported, that doesn’t mean it is supported.

Microsoft doesn’t– indeed, can’t– test every possible configuration of Exchange. Or Windows. Or any of their other products (well, maybe except for closed consumer systems like Windows Phone and Xbox 360). So there’s a simple process to follow when considering whether something meets your requirements for supportability:

  1. Does Microsoft explicitly say that what you want to do is, or is not, supported?
  2. If they don’t say one way or the other, are you comfortable that you can adequately test the proposed change in your environment to make sure that it only has the desired effects?

Point 1 is pretty straightforward. If Microsoft says something’s explicitly supported, you’re good to go. If they explicitly say something is unsupported, you’re still good, provided you don’t do it.

Brief digression: when Microsoft says something’s unsupported, it can mean one of three specific things:

  • We tested it. It doesn’t work. Don’t do it. (Example: a long list of things involving Lync device provisioning.)
  • We tested it. It works. It’s a bad idea for some other unrelated reason. Don’t do it. (Example: going backupless with a 2-copy DAG.)
  • We didn’t test it. We don’t know if it works. You could probably figure out some way to make it work.  If it doesn’t work, on your own head be it. (Example: the prior stance on virtualization of Exchange roles.)

OK, where was I? Oh yeah: if Microsoft doesn’t make an explicit statement one way or another, that is not an unconditional green light for you to do whatever you want. Instead, it’s an invitation for you to think carefully about what you’ll gain from the proposed configuration. If what you want to do is common, then there will probably be a support statement for it already; the fact that there isn’t should give you pause right there. If you believe the gain is worth the potential risk that comes from an increase in complexity, and you can demonstrate through testing (not just a SWAG) that things will work, only then should you consider proceeding.
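For the programmers in the audience, the whole process boils down to something like the sketch below. The names are my own invention for illustration, and the two flags stand in for judgment calls that no script can make for you.

```python
from enum import Enum


class Statement(Enum):
    """What Microsoft has said about the configuration (illustrative names)."""
    SUPPORTED = "explicitly supported"
    UNSUPPORTED = "explicitly unsupported"
    SILENT = "no statement either way"


def may_consider_proceeding(statement, gain_articulated=False, demonstrated_by_testing=False):
    """Return True only when the reasoning above says it is OK to consider going ahead."""
    if statement is Statement.SUPPORTED:
        return True
    if statement is Statement.UNSUPPORTED:
        return False
    # Silence is not a green light: you need both a clearly stated gain
    # and real test results (not just a SWAG).
    return bool(gain_articulated and demonstrated_by_testing)


print(may_consider_proceeding(Statement.SILENT, gain_articulated=True))
```

Note that the silent case defaults to "no" unless both conditions are met, which is the whole point.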

(n.b. permission is hereby granted for all you Exchange folks out there to copy this and send it to your customers next time they ask you for something dangerous, ignorant, unsupportable, or otherwise undesirable.)

Filed under Musings, UC&C

How Autodiscover works in Outlook 2011

Fellow Exchange MVP Rajith Enchiparambil, proprietor of the excellent How Exchange Works blog, asked an interesting question the other day: how does Autodiscover work in Outlook 2011? Is it different from the way Autodiscover works in Outlook for Windows? It turns out that the answer is (as you might have predicted) “it depends.” To answer that question in depth, we have to dig into the guts of Autodiscover (or AutoD, as its friends call it).

The first thing to know is that there are two parts to AutoD. One is the service that runs on Exchange 2007 and later. This service is implemented as a virtual directory named “Autodiscover” on the CAS role. When you install the CAS role, the vdir is automatically created and provisioned for you. In addition to the vdir, an Active Directory service connection point (SCP) object is created. (For probably more detail on SCPs than you’d want, see this article.)

See, in Windows Outlook, there are two primary ways that AutoD can work: a domain-joined Windows machine can perform an LDAP lookup to find an AD SCP, or any machine can try to hit a predefined series of URLs. Why are there two methods? Because this design allows a computer, or device, to find the correct Exchange CAS whether it’s domain-joined or not, and whether or not it’s on the internal corporate network.

(See what I did there? I said device, because mobile devices can use AutoD also. Currently, iOS and Windows Phone 7.x devices use AutoD, as do some Exchange ActiveSync clients on some Android devices. For our purposes, we’ll treat mobile devices just like Macs insomuch as they use similar web-based queries for the AutoD vdir.)

So let’s ignore the SCP lookup process. How does Mac Outlook 2011 use AutoDiscover?

First it tries to connect to the standard AutoD URL, which is made up of the primary Exchange SMTP domain plus /Autodiscover/Autodiscover.xml. For example, https://robichaux.net/Autodiscover/Autodiscover.xml would be the first URL Outlook would try for an account in the robichaux.net domain. If that works, great. If not, it will then try prepending “autodiscover.” to the domain and keeping the same relative path (https://autodiscover.robichaux.net/Autodiscover/Autodiscover.xml in this example).
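To make the ordering concrete, here's a small Python sketch that builds those two HTTPS candidates from an e-mail address. The function name and the sample mailbox are mine, not anything Outlook exposes.

```python
AUTOD_PATH = "/Autodiscover/Autodiscover.xml"


def https_candidates(email_address):
    """Return the two HTTPS Autodiscover URLs to try, in order."""
    # Everything after the last "@" is the SMTP domain.
    domain = email_address.rsplit("@", 1)[1].lower()
    return [
        f"https://{domain}{AUTOD_PATH}",
        f"https://autodiscover.{domain}{AUTOD_PATH}",
    ]


# "user" is a placeholder mailbox name.
for url in https_candidates("user@robichaux.net"):
    print(url)
```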

If neither of those standard URL requests, both of which are made using HTTPS, bears fruit, the next attempt will be to do an HTTP request for the second URL. This request will be redirected if HTTP-to-HTTPS redirection is in use, which is what we want– if a redirection occurs, Outlook will catch the HTTP 302 response and make an AutoD request against the redirected URL.
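Here's a rough Python sketch of that redirect step. It only models the decision (given a status code and response headers, which URL, if any, should the client retry against?) and makes no network calls. The function names are hypothetical, and insisting that the redirect target be HTTPS is my own assumption about sensible client behavior, not something I've confirmed in Outlook.

```python
from urllib.parse import urlsplit


def http_fallback_url(domain):
    """The plain-HTTP version of the second standard URL."""
    return f"http://autodiscover.{domain}/Autodiscover/Autodiscover.xml"


def redirect_target(status, headers):
    """Return the URL to retry against, or None if this is not a usable redirect."""
    if status != 302:
        return None
    # Header names are case-insensitive, so normalize before looking up Location.
    location = {k.lower(): v for k, v in headers.items()}.get("location")
    if not location or urlsplit(location).scheme != "https":
        # Assumption: only follow redirects that land on HTTPS.
        return None
    return location


print(http_fallback_url("robichaux.net"))
print(redirect_target(302, {"Location": "https://mail.example.com/Autodiscover/Autodiscover.xml"}))
```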

If that check fails, the next step is to perform a DNS SRV lookup to try to find the FQDN of an Exchange CAS. If the SRV query returns a target machine, Outlook will tack on /autodiscover/autodiscover.xml to it and perform an AutoD query against the result.
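As a sketch of the SRV step: the record to query is _autodiscover._tcp.<domain>, and the target it returns gets turned into a URL. The Python below only builds the strings; it doesn't do the DNS lookup itself, and the function names and sample host are placeholders of mine.

```python
def srv_query_name(domain):
    """Name of the SRV record to look up for a given SMTP domain."""
    return f"_autodiscover._tcp.{domain}"


def url_from_srv_target(target, port=443):
    """Build the Autodiscover URL from the host (and port) an SRV answer points at."""
    # DNS answers usually carry a trailing dot on the target name.
    host = target.rstrip(".")
    netloc = host if port == 443 else f"{host}:{port}"
    return f"https://{netloc}/autodiscover/autodiscover.xml"


print(srv_query_name("robichaux.net"))
# "mail.example.com." is a made-up SRV target.
print(url_from_srv_target("mail.example.com."))
```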

Once Outlook or a mobile device gets back an Autodiscover manifest, of course, what it does with the result will vary according to its capabilities. For example, Outlook 2011 and mobile Exchange ActiveSync clients don’t (currently) use the returned URL for the target mailbox’s Exchange unified messaging (UM) server.

This process is generally pretty robust unless you’ve misconfigured the Autodiscover or service URLs on the CAS. It turns out that there’s a separate Exchange Web Services (EWS) external URL property on the CAS, and if you fail to set that properly– say, if some of your users snuck some Macs or iPads or something onto your network– then AutoD will return the EWS URL that you set, which will be wrong, so Mac Outlook won’t connect properly. The Test-* cmdlets are very useful in tracking this kind of problem down; Exchange MVP Tim Harrington has written a good primer on their use.
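If you've captured an Autodiscover response and want a quick sanity check on the EWS URL it hands out, something like this Python sketch will do. The sample XML is a heavily trimmed, made-up response (a real one carries XML namespaces and many more elements, which is why the code matches on local tag names only), and the host names are placeholders.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def ews_urls(response_xml):
    """Return every EwsUrl value found in an Autodiscover response."""
    root = ET.fromstring(response_xml)
    return [el.text.strip() for el in root.iter()
            if local_name(el.tag) == "EwsUrl" and el.text]


def unexpected_ews_hosts(response_xml, expected_host):
    """Return the EWS URLs whose host is not the one you meant to publish."""
    return [u for u in ews_urls(response_xml)
            if urlsplit(u).hostname != expected_host.lower()]


# Made-up, simplified response for illustration only.
SAMPLE = """<Autodiscover>
  <Response>
    <Account>
      <Protocol>
        <Type>EXPR</Type>
        <EwsUrl>https://cas01.internal.example/EWS/Exchange.asmx</EwsUrl>
      </Protocol>
    </Account>
  </Response>
</Autodiscover>"""

print(unexpected_ews_hosts(SAMPLE, "mail.example.com"))
```

Anything this prints is a URL that clients like Mac Outlook will be told to use, so an internal-only host name showing up here is a strong hint about where the misconfiguration lives.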

Filed under UC&C

Advice to Exchange ActiveSync developers

Now the folks at Apple, Google, and other ISVs who develop Exchange ActiveSync clients no longer have excuses for bugs and misfeatures in their clients. Why? Because Katarzyna Puchala of Microsoft (already one of my favorite Microsofties thanks to her work as part of the Exchange unified messaging team) has posted three very detailed articles on how clients should behave when synchronizing with Exchange servers:

That means, third parties, that there are no longer any credible excuses for why your clients do things like randomly delete meeting requests, or fail to work with EAS autodiscover. Sadly these articles come after the release of OS X Lion, and past the point at which EAS bugs are likely to meet the release bar for iOS 5… but I can always hope that the first service release for each of those operating systems will include fixes to make their EAS implementations act right.

Filed under UC&C

Exchange Connections Fall 2010 call for sessions

My co-chairs and I are working on assembling this year’s Exchange Connections content, which we’ll be presenting November 1-4 in Las Vegas at good ol’ Mandalay Bay. That’s why I’m posting this call for sessions!

Everything you should need to know is in this document.

The deadline for session proposals is May 6 – hurry, hurry, as usual! Although the deadline is May 6, the sooner you can send in session proposals, the better the odds are we’ll be able to choose your sessions. I’ll try and respond to your submissions on the same business day with any thoughts or requests or tweaks. The conference has a brochure to get out pretty much ASAP if we’re going to get people to show up, so time is – as always – of the essence.

Note that we’ll be co-located, as usual, with dedicated conferences for Visual Studio, ASP.NET, Windows, SharePoint, and goodness knows what else – so for these proposals, stick strictly with Exchange and OCS topics.

If you want to submit sessions, see the call for sessions. If you have questions, you can ask them here or via e-mail.

Filed under UC&C

First look: Snow Leopard and Exchange

Given that I’m in Palo Alto, and that probably half of my coworkers use Macs, it’s no surprise that I installed Snow Leopard today. I’m not going to review the OS, or even the Exchange capability, but here are a few notes based on my long-time Entourage use (and not a little time spent with Outlook 2010 over the past few months). Herewith my thoughts:

  • The first thing I noticed: Mail.app is smokin’ fast compared to Entourage EWS. I mean, we’re talking lightning. EWS has much improved sync performance compared to DAV sync, but Mail.app leaves it in the dust when it comes to scrolling, searching, and message rendering. I haven’t tried to compare the two programs’ sync speed (and probably won’t, since it’s mostly relevant when you set up a new account).
  • Speaking of setup: I was able to set up 4 Exchange accounts in about 10 seconds each: enter e-mail address and password, then let Autodiscover do the rest. EWS Autodiscover works well most of the time, but occasionally it will fail to detect an account.
  • By default, Mail creates a single unified Inbox view– exactly what I use in Entourage (and what I wish for in Outlook 2010). However, nowhere can I find where Mail tells me how many messages are in a folder, something I like to keep track of.
  • I like it that Mail.app uses the same sounds for sent and received mail that the iPhone does. On the other hand, I dislike the fact that you can’t change these sounds (on either platform). C’mon, Apple.
  • Ironically, older versions of Mail would hide some Exchange folders when you connected because Mail couldn’t handle them. Guess what? This version fails to hide some folders, such as “Conversation Action Settings” and “Quick Step Settings”, that Outlook 2010 creates as ostensibly hidden folders in your mailbox root. Oops.
  • Entourage seems to do a better job of masking temporary connectivity problems. When Mail.app decides that one of my servers is unreachable, it grays out that server’s entire folder tree and puts the little tilde-looking icon next to the account name. By contrast, Entourage will discreetly add “(Not Connected)” to the account name and leave it at that.
  • iCal… well, what can I say? I still don’t like it after all these years. Yes, it syncs with my Exchange calendars now, but its visual display is ugly compared to Entourage (especially for overlapping events), it’s lacking in features, and the task support appears to have been hastily bolted on.
  • I’ve never been a user of the Address Book app. Given the way this version works, I’m not about to start. Too much wasted white space and too many missing features. For example, want to see someone’s management chain? Too bad, Address Book doesn’t show that. Feel like searching the GAL? Sorry, no can do (at least not that I can find.)

There are other problems, too– no support for setting your out-of-office status, for example. In terms of fit and finish, there are lots of little grace notes that Entourage gets right but that Apple stumbled with. To show just one example, take a look at these two screen shots, one for each program.

[Screen shots: Microsoft Entourage and iCal]

IMHO, Entourage does a better job all around. It tells me that my machine and my appointment are in different time zones. It clearly shows the important data about when my test meeting’s invitees are available. Once you type in an invitee’s name, there’s no way to delete the event in iCal unless you remove all invitees first. Attempting to close the window gives you a chance to edit or send the invite, but not get rid of it altogether. (Bonus: I thought it was interesting that Entourage could get and display Atalla’s status (OOF, in this case) but that iCal couldn’t, even though I took the screen shots on the same machine and more or less at the same time.)

More broadly I don’t like going back to the world of having three separate apps for PIM functions. It reminds me of Sidekick for DOS. I much prefer the Outlook/Entourage model of having several different (but related) data types in one place. What makes this worse is that there’s relatively little integration among the Snow Leopard apps. For example, if you’re looking at a contact in Address Book and want to send that person a mail message– too bad. There’s no way to do so. You can, however, right-click an e-mail address in Mail to open that address’ contact card.

Still more broadly, these applications are not very flexible or customizable compared to Entourage. For example, let’s say you want your message reading pane on the right. Too bad! There’s no way in Mail.app to customize it; you need WideMail or something like it, of which there is no Snow Leopard version (yet).

So, Snow Leopard delivers what Apple promised: basic Exchange integration. There are so many things that they’ve left out, though, that I remain disappointed, and I’m thinking that the Microsoft Mac Business Unit has a huge lead already as they move into full-scale development of Outlook for Mac.

Filed under General Tech Stuff, Reviews

20% discount on Microsoft Certified Master: Exchange September rotation

Neato! I just got mail from Greg Taylor, head of the MCM: Exchange program. They’re offering a $3,550 discount on the upcoming Exchange 2007 rotation (September 21-October 10). Register here to get the discount. Disclaimer: I teach the UM portion of the MCM class, and Greg’s offering instructors a bounty for new registrants, so I benefit directly when people sign up. However, the training is so good that you should disregard my interests altogether and sign up anyway. (If you do, please drop me an e-mail to let me know!)

Filed under UC&C