Via Ed Bott, a fascinating article on the real-world robustness of Windows 7 and Windows 8 PCs: “Want the most reliable Windows PC? Buy a Mac (or maybe a Dell)”. You should read the article, which outlines a report issued by Soluto, a cloud-based PC health and service monitoring company. The report analyzes data that customers send to Soluto’s service in an attempt to answer the question of which manufacturer’s PCs are the most reliable. Apple’s 13″ MacBook Pro comes out on top, with Acer’s Aspire E1-571 coming in second and Dell’s XPS 13 in third. In fact, out of the top 10, Apple has two spots, Acer has two spots, and Dell has five. Ed points out that it’s odd that Hewlett-Packard doesn’t have any entries in the list, and that Lenovo (which I have long considered the gold standard for laptops not made by Apple) only has one.
The report, and Ed’s column, speculate on why the results came out this way. I don’t know enough about the PC laptop world to have a good feel for how many of the models on their list are consumer-targeted versus business-targeted, although they do include cost figures that provide some clues. There’s no doubt that the amount of random crap that PC vendors shovel onto their machines makes a big difference in the results, although I suspect that the quality of vendor-provided drivers makes a bigger one. Graphics drivers are especially critical, since they run in kernel mode and can easily crash the entire machine; the bundled crapware included by many vendors strikes me as more of an annoyance than a reliability hazard (at least in terms of unwanted reboots or crashes).
The results raise the interesting question of whether there are similar results for servers. Given that servers from major vendors such as Dell and H-P come with very clean Windows installs, I wouldn’t expect to see driver issues play a major part in server reliability. My intuition is that the basic hardware designs from tier 1 vendors are all roughly equal in reliability, and that components such as SAN HBAs or RAID controllers probably have a bigger negative impact on overall reliability than the servers themselves, but I don’t have data to back that up. I’m sure that server vendors do, and equally sure that they guard it jealously.
More broadly, it’s fascinating that we can even have this discussion.
First of all, the rise of cloud-based services like Soluto (and Microsoft’s own Windows Intune) means that we now have data that can tell us fascinating things. I remember that during the development period of Windows Server 2003, Microsoft spent a great deal of effort persuading customers to send them crash dumps for analysis. The analysis revealed that the top two causes of server failures were badly behaving drivers and administrator errors. There’s not much we can do about problem #2, but Microsoft attacked the first problem in a number of ways, including restructuring how drivers are loaded and introducing driver signing as a means of weeding out unstable or buggy drivers. But that was a huge engineering effort led by a single vendor, using data that only they had, and Microsoft certainly didn’t embarrass or praise any particular OEM based on the number of crashes their hardware and drivers had.
Second, Microsoft’s ongoing effort to turn itself into a software + services + devices company (or whatever they’re calling it this week) means that they are able to gather a huge wealth of data about usage and behavior. We’ve seen them use that data to design the Office Fluent interface, redesign the Xbox 360 dashboard multiple times, and push a consistent visual design language across Windows 8, Windows Phone 8, Xbox 360, and apps for other platforms such as Xbox SmartGlass. It’s interesting to think about the kind of data they are gathering from operating Office 365, and what kind of patterns that data might reveal. I can imagine that Microsoft would like to encourage Exchange 2013 customers to share data gathered by Managed Availability, but there are challenges in persuading customers to allow that data collection, so we’ll have to see what happens.
To the cloud…