An interesting debugging tale

I had quite the merry chase running down a problem at work this morning: a Windows 7 workstation on which VMware Server 2.0 could neither be removed nor installed. (Side note: yes, I know VMware Server 2 isn’t officially supported on Win7.)

The first clue was the Windows installer error message saying that the installer couldn’t read the "UNKNOWN\Components\{GUID}" registry key. UNKNOWN, huh? You’d think that the installer would know what keys it was trying to read.

I started by doing a little binging to find anything relevant. VMware KB article 1308 described the steps to take to manually remove a failed install, so I followed its steps… twice, just to be on the safe side (well, and because I skipped a couple of steps the first time). No luck.

Next, I fired up one of my favorite-ever troubleshooting tools, Process Monitor. It told me that the failure was actually happening when the installer tried to get write access to a subkey of HKLM\Software\Microsoft\Windows\CurrentVersion\Installer\UserData\S-1-5-18. I’d never even heard of that particular key, so off to bing I went. It’s supposed to be owned by the LocalSystem account (S-1-5-18 is LocalSystem’s well-known SID), except that in this case it wasn’t: the permissions on the key that VMware wanted (and on its subkeys) were all out of whack. Resetting the permissions manually didn’t work because the parent key was still (correctly) owned by LocalSystem. So, I fired up psexec to open an interactive session with regedit running as LocalSystem, set the correct ownership on the key and its subkeys, and ran the install again.
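For anyone who wants to repeat the psexec step, here is a minimal sketch of how that launch could be scripted. The exact command line I typed isn't recorded above, so treat the flags as an assumption: `-i` (interactive) and `-s` (run under the System account) are the options PsExec documents for this purpose. The function names and the `USERDATA_KEY` constant are made up for illustration.

```python
import shutil
import subprocess

# Registry key named in the post; S-1-5-18 is the well-known SID for LocalSystem.
USERDATA_KEY = (
    r"HKLM\Software\Microsoft\Windows\CurrentVersion"
    r"\Installer\UserData\S-1-5-18"
)


def build_psexec_command(program="regedit.exe", psexec="psexec"):
    """Return the argv list asking PsExec to start `program` as LocalSystem.

    -i  let the program interact with the desktop
    -s  run the program under the System account
    """
    return [psexec, "-i", "-s", program]


def launch_as_localsystem(program="regedit.exe"):
    """Start `program` as LocalSystem (Windows only, from an elevated prompt).

    Assumes psexec is installed and on PATH; raises if it cannot be found.
    """
    if shutil.which("psexec") is None:
        raise FileNotFoundError("psexec not found on PATH")
    # PsExec waits for the program to exit and passes back its exit code.
    return subprocess.run(build_psexec_command(program), check=False).returncode
```

Calling `launch_as_localsystem()` from an elevated prompt should open regedit under the System account, at which point you can browse to `USERDATA_KEY` and fix ownership by hand, as described above.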

This time it got further before failing; there was another key under that subtree that also had wrong permissions. Fortunately I’d left regedit running, so a quick ownership change and another reinstall and boom! back in business.

How’d this happen in the first place? Well, as much as I like to bash VMware (and boy, do I ever), this wasn’t their fault. As near as I can tell, the problem arose because this machine was originally built with a 160GB SSD as the boot volume and a 1TB drive as the data volume. Our app performs better on an SSD, but it also has a lot of data, so the better configuration would have been to have the 1TB drive be the boot volume. Someone tried to reconfigure the machine by imaging the SSD, putting the image on the 1TB drive, and changing the boot configuration. However, when they did so, they neglected to notice that the LocalSystem token changed, so the permissions on some entries in the registry were wrong. I think they’re all fixed now.

Not a bad way to start a Monday morning, but only because I fixed the problem.

Filed under General Tech Stuff