Just over a month after first releasing it, Microsoft is making the Windows 10 October 2018 Update widely available today. The company withdrew the update shortly after that first release when a bug causing data loss was discovered.
New Windows 10 feature updates use a staggered, ramping rollout, and this (re)release is no different. Initially, it’ll be offered only to two groups of people: those who manually tell their system to check for updates (provided the system has no known blocking issues, such as incompatible anti-virus software), and those who use the media-creation tool to download the installer. If all goes well, Microsoft will offer the update to an ever-wider range of Windows 10 users over the coming weeks.
For the sake of support windows, Microsoft is treating last month’s release as if it never happened; this release will receive 30 months of support and updates, with the clock starting today. The same is true for related products; Windows Server 2019 and Windows Server, version 1809, are both effectively released today.
The problems with this update have provoked increased scrutiny of Microsoft’s testing and development processes and its “Windows-as-a-Service” delivery model. The data-loss bugs had been reported by a number of members of the Windows Insider program, but for whatever reason, those bug reports weren’t treated with the priority and importance that they deserved. Microsoft’s immediate reaction was to allow Insiders to indicate how important each bug is (data-loss bugs are obviously more important than, say, ugly icons or misaligned text), though it remains to be seen whether this will be enough to improve the quality and utility of the reports.
Measurements and metrics
As a longer-term follow-up, the company has promised to be more open about the Windows 10 development and testing process, and over the next few months we can expect to learn more about the approach the company uses and how it’s changing that approach in response to this issue. Microsoft tracks the quality of its software in a number of ways and across several dimensions. The October 2018 Update highlighted a problem with “initial quality” (the stability and reliability of a new feature update), which generally indicates that something was missed in upstream testing. Windows 10’s monthly Patch Tuesday updates have also raised concerns over what is referred to as “sustained quality”: the reliability and effectiveness of the stream of updates that service each feature release.
While Microsoft acknowledges that the October 2018 Update had problems, the company maintains that, overall, the trajectory is a good one. Redmond references two particular metrics to assess overall satisfaction with the quality of the operating system. First is the “incident rate,” the number of customer reports (including reports to OEMs) made with each new release. This incident rate has steadily declined throughout Windows 10’s life, which suggests that Microsoft is doing something right. But incident rate can be misleading: a Windows version with a minor cosmetic bug that hits lots of people, and hence is widely reported, would tend to have a worse incident rate than a version with a major data-loss bug that affects only a fraction of the user base, even though the latter issue is far more important.
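To make that concern concrete, here is a minimal sketch of how a raw incident rate can rank releases the wrong way around. Every figure in it is invented for illustration, and the `Release` class and its fields are hypothetical; nothing here reflects Microsoft’s actual data or how it computes the metric.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """Hypothetical per-release report counts (not real Microsoft data)."""
    name: str
    installs: int
    cosmetic_reports: int
    data_loss_reports: int

    def incident_rate(self) -> float:
        """Reports per 1,000 installs, counting every report equally."""
        total = self.cosmetic_reports + self.data_loss_reports
        return 1000 * total / self.installs


# Invented figures: release A has a widespread cosmetic bug,
# release B has a rare but severe data-loss bug.
release_a = Release("A", installs=1_000_000,
                    cosmetic_reports=5_000, data_loss_reports=0)
release_b = Release("B", installs=1_000_000,
                    cosmetic_reports=500, data_loss_reports=200)

for release in (release_a, release_b):
    print(f"Release {release.name}: "
          f"{release.incident_rate():.2f} incidents per 1,000 installs, "
          f"{release.data_loss_reports} data-loss reports")
```

With these made-up numbers, release A has the higher incident rate even though only release B destroyed anyone’s files. The metric counts reports and says nothing about how bad each one was.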
Similarly, the company is looking at “Net Promoter Score” (whether users would recommend each version to their family and friends) and says that this, too, is heading in the right direction, with the April 2018 release the most highly recommended version of Windows yet. Again, though, the focus on this metric can mask problems that have a high impact on a small number of users.
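The masking effect is easy to see with the conventional Net Promoter Score calculation, in which respondents rate their likelihood to recommend from 0 to 10 and the score is the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). The sketch below assumes that standard definition; the survey data is hypothetical, and we don’t know exactly how Microsoft collects or computes its own figure.

```python
def net_promoter_score(ratings):
    """Conventional NPS: percent promoters (9-10) minus percent detractors (0-6)."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical survey of 1,000 users on a trouble-free release.
baseline = [10] * 600 + [8] * 300 + [5] * 100

# Same survey, except 10 former promoters lost data and now answer 0.
with_data_loss = [10] * 590 + [0] * 10 + [8] * 300 + [5] * 100

print(net_promoter_score(baseline))
print(net_promoter_score(with_data_loss))
```

In this invented example the score slips from 50 to 48. A two-point move is well within what could be put down to noise, yet it represents ten users who lost files.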
With the complexity of the Windows installed base—700 million users, 175 million versions of 35 million applications, and 16 million unique driver/hardware combinations—there’s a lot of scope for these high-impact, configuration-specific problems to emerge. Indeed, the data-loss bug was just such a problem. It didn’t affect Windows installations using the default configuration, instead only biting when an optional feature was used.
Passing the test
As part of this new openness, the company has outlined in high-level terms some of the testing it performs. There’s a suite of automated tests, and certain critical tests must pass before new features and code can be integrated into the main Windows codebase. New Windows builds are also widely deployed within Microsoft, with many of the company’s staff running the very latest ones. Major OEM partners also run their own test labs, providing a wider range of hardware and software testing.
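The gating idea described above, in which only certain critical tests block integration, can be sketched in a few lines. This is purely illustrative: the `CheckResult` type, the function names, and the test names are all hypothetical, and Microsoft has not published how its actual gate is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one automated test (hypothetical structure)."""
    name: str
    passed: bool
    critical: bool


def blocking_failures(results):
    """Names of failed tests that are marked critical."""
    return [r.name for r in results if r.critical and not r.passed]


def can_integrate(results):
    """Allow integration only when no critical test has failed."""
    return not blocking_failures(results)


# Invented example run: one non-critical failure, one critical failure.
results = [
    CheckResult("boot_to_desktop", passed=True, critical=True),
    CheckResult("icon_alignment", passed=False, critical=False),
    CheckResult("file_copy_integrity", passed=False, critical=True),
]

print(can_integrate(results))
print(blocking_failures(results))
```

Under a scheme like this, the failing `icon_alignment` test would not stop the change on its own; only the critical `file_copy_integrity` failure does. Which tests get marked critical is therefore the decision that determines what can slip through.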
We also understand that the development process is not set in stone and that the company wants to improve the process. Our previous examination of the development process was informed by extensive discussion with company insiders, both past and present; as such, we’re confident that there are certain flaws in the process—not all code has tests, not all test failures are regarded as blocking issues, and the main development branch is not required to be production-quality at all times. The result is that Microsoft fundamentally treats Windows development as a condensed version of its old “waterfall” process, wherein lots of bugs and instability are introduced during an intensive period of development, followed by a longer period of bug fixing. This means that each new feature update represents a quality dip, and it takes some months to recover.
There’s some variation between teams—some are a lot more disciplined about their testing and code quality than others—but ultimately we feel that this process will need to change, in a rigorous and consistent way, in order to get the initial quality of each feature update to the level it needs to be. Windows users should be able to install the updates with confidence; until the process is improved, questions and concerns about Windows 10’s quality and reliability will remain.