Unfortunately, part 1 was only the beginning of my troubles with Exchange.
Where’d my RPCs go?
I was puzzled to find that the SA still wouldn’t start. I was getting event ID 1005 on each start attempt, with a message that no protocol sequences were found. Of course, the message didn’t tell me which protocol, so I made sure that the IIS protocol servers were all installed and registered. No go. I dug around in the KB and TechNet for a while, with no luck. I tried running Exchange setup with /disasterrecovery. No go. I tried reinstalling the release version of SP2. Nope. That didn’t work either.
Back to groups.google.com, where I waded through a huge quantity of complaints about ID 1005 until I found this gem: add the HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc\ClientProtocols key and see if that fixed the problem. Lo and behold, it did– once I added the key back, the SA would start. Of course, clients still couldn’t connect. More digging revealed that the registry settings for the RPC portmapper were gone too.
I suspect that this is because setup didn’t see my NIC, so it didn’t rewrite the related registry keys during the repair or reinstall operations. However, the components of the client for MS file/print services were still there. I hit Google again and found a suggestion to remove and reinstall that component. Once I did that, the SA started and clients could log on via MAPI. That was the good news.
The bad news: further experimentation found that clients could log on via MAPI, but no mailboxes were visible except for the one owned by the SA. I couldn’t log on at all with POP3 or IMAP4, and OWA didn’t work either. I spent a few hours puzzling over this to no avail. Finally, about 2230 on Saturday, I gave up and called Microsoft PSS. The screener quickly took my credit card number and opened a new case, while I waited on hold. And waited, and waited, and… you get the idea.
Digression: While waiting, I visited Slashdot, where I eventually found a link to an extremely cool bridge-construction game called Pontifex. It’s highly addictive and hard to describe: you build a bridge of steel beams, then run a train over it to see whether it’s strong enough. All in 3-D, with cool OpenGL graphics. Someone I know is getting this for Christmas.
After about 90 minutes on hold, the screener came back and asked if he could arrange a callback. I said sure, call me Monday morning, because I don’t work on Sundays. At this point, I was so sick of the entire process that I decided to try re-restoring my backup. My original restore attempts had failed because restore.env always came out as a 0-byte file, which meant that neither the store nor eseutil could get the necessary information from it. This time, though, I got a valid restore file and was able to put everything back the way it was before the failure. The store mounted, I logged on, and synchronized my Smartphone. Ahhhh. Life is good.
The aftermath
The server seems to be functioning normally; inbound and outbound mail flows normally, and all of my public folders are replicating as they should. I still have trouble synchronizing my OST file, but there’s still time to fix that. I have some clean-up work to do, including getting Intel and Asus to replace their failed products, finding a good-quality DLT stacker on eBay, and writing all my editors to ask them to resend anything I missed during the outage.
Lessons learned
So, what did I learn from all this?
- If you don’t do regular backups, you deserve what you get. I was lucky that all I lost was 3 days’ worth of incoming mail; it would have been hard to explain to my wife where her calendar and contacts folders had gone.
- By default, the E2K SMTP service will queue mail for 2 days before NDRing it. I changed this setting partway through the outage so that mail would hang around longer; this helped me reduce the total amount of lost mail.
- Be aware that swapping system disks between disparate motherboards may not work. (Of course, if you follow point #1 above and keep good system state backups, you’ll be laughing.)
- If I had been using SCSI for my system disk, much of this foolishness could have been avoided.
- lne.com, home of the cypherpunks mailing list, now thinks I’m a spammer and won’t let me resubscribe. When I asked the postmaster to fix this, he mentioned that E2K generated three bounce messages for each mail I couldn’t receive, and cypherpunks generates 50+ messages per day. Ooops.
- Don’t expect instant help from PSS on Saturday night. This should probably be self-evident.
- Exchange is cruel.
Epilogue
Asus replaced the failed motherboard, Intel replaced the bad CPU, and all is well a month after the original failure. Life is good– but Exchange is still cruel.
