2018-01-16

Upgrading Windows 10 Enterprise 1607 to 1709 - And why useful error messages are important

As mentioned in previous posts, we have about 3500 client computers, and at this point somewhere around 600 of them are running Windows 10 Enterprise 1607 (the Anniversary Update). Updates are handled through SCCM (System Center Configuration Manager), and the Windows 10 feature upgrades are going to be handled the same way.

We had already run some tests with the Creators Update 1703 but only rolled that out to a handful of machines, mostly internally in the IT department. So last Friday it was finally time to test the deployment of the Fall Creators Update 1709 to about 70 clients, all at the same time. We wanted to see what kind of impact this would have on the network. The 70 machines were located on two different floors and supplied by two different switches. I will not go into detail about the numbers here, but one switch showed a throughput of about 500 Mbit/s for around 20 minutes while supplying 40 or so machines. (Later I did another upgrade, and a single machine caused a peak network usage of 180 Mbit/s for about 2 minutes.) But this is not the point of this post.



Of those 70 machines we tried to upgrade, almost all finished without any real problems. Except for four of them. 4 out of 70. And those four machines all showed the same error message on the screen and were waiting for user input after getting to 44% on setting things up.


"Windows Setup could not configure Windows on this computer’s hardware." was what the error message read, or rather "Windows kann nicht für die Ausführung auf der Hardware dieses Computers konfiguriert werden." since we are running German systems.

After clicking "OK" the system would do a rollback and I was sitting in front of a Windows 10 Enterprise 1607 again.

This does not really make sense. All systems we tried to upgrade are of the same brand and model. Namely Fujitsu Esprimo P756. And there are no internal hardware differences.

So obviously the first thing I checked was for external devices. USB devices and whatnot. And the first machine I checked indeed had a wireless receiver for a mouse and a USB hub connected, but the second and third machines I checked had no additional peripherals at all. So that could not be the issue.

From here on out I tried doing the upgrade after every single step I took. And just like on Friday the upgrade would download, do its magic, reboot the machine, get to 44% on setting things up and then display the error message above.

Maybe the image got corrupted during download because of bad memory? Nope. A memory check on the machines did not find any faults there either. I even went so far as to connect one of the machines to a different switch using a different network cable, just to rule out a faulty switch or cable as well.

Next I tried installing the upgrade from DVD. Maybe there is an issue with the image supplied by SCCM in general. Starting the upgrade from DVD seemed to work, the machine would reboot, do something and then show the login screen. After logging in I was greeted with a new error message.


0x80070570 - 0x2000c
The installation failed in the SAFE_OS phase with an error during APPLY_IMAGE operation.
(In German: "Die Installation war nicht erfolgreich. In der Phase SAFE-OS ist während des Vorganges APPLY-IMAGE ein Fehler aufgetreten.")

Ok. An error code. That is more useful. I can search the internets for that. Or so I thought.
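
A side note on that code: the first half, 0x80070570, is an ordinary HRESULT and can be taken apart mechanically. The following is a minimal Python sketch, assuming the standard HRESULT layout (top bit marks failure, 11 facility bits above the low 16 code bits). The function name decode_hresult is made up for illustration.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into the fields that matter for a web search."""
    return {
        "failure": bool((hresult >> 31) & 1),   # top bit set means failure
        "facility": (hresult >> 16) & 0x7FF,    # 11 bits; 7 is FACILITY_WIN32
        "code": hresult & 0xFFFF,               # low 16 bits: the Win32 error code
    }

result = decode_hresult(0x80070570)
print(result)
```

Facility 7 with code 1392 is the Win32 error ERROR_FILE_CORRUPT ("The file or directory is corrupted and unreadable"), which probably explains why so many search results point at disk and image checks.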

You will find all kinds of people reporting this error and suggesting all kinds of solutions, and none of them helped. But for completeness' sake, let me list all the things that I tried. And again, after each step I ran the upgrade again, just for it to fail again.

The most prominent suggestions, like with any other Windows problem you will ever encounter and decide to search for on the internet, were the following two commands:
  • sfc /scannow
  • chkdsk /F
Just to make sure I also ran the following commands:
  • Dism /Online /Cleanup-Image /ScanHealth
  • Dism /Online /Cleanup-Image /CheckHealth
  • Dism /Online /Cleanup-Image /RestoreHealth
And again, no luck. The upgrade continued to fail.

Some posts on the internet suggested that the issue could be a blocksize of 512 and after changing the blocksize to 4096 they were able to run the upgrade without problems. So I checked the blocksize with the following commands:
  • (in cmd:) fsutil fsinfo ntfsinfo C:
  • (in PS:) Get-WmiObject -Query "SELECT Label, Blocksize, Name FROM Win32_Volume WHERE FileSystem='NTFS'" | Select Label, Blocksize
And as expected, the systems were already using a blocksize of 4096. So that could not be the issue either.
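
If this has to be checked on more than a handful of machines, the value can be pulled out of the fsutil text automatically. A rough Python sketch, assuming the English label "Bytes Per Cluster" (on a German system the label is localized, so it is a parameter here). The sample text is illustrative and not captured output, and the function name is hypothetical.

```python
import re

def bytes_per_cluster(fsutil_output: str, label: str = "Bytes Per Cluster"):
    """Return the cluster size from `fsutil fsinfo ntfsinfo` text, or None."""
    pattern = re.escape(label) + r"\s*:\s*(\d+)"
    match = re.search(pattern, fsutil_output)
    return int(match.group(1)) if match else None

# Illustrative sample, trimmed to two lines (assumed layout, not real output).
sample = """\
Bytes Per Sector  :               512
Bytes Per Cluster :               4096
"""
print(bytes_per_cluster(sample))
```

On a real machine the text would come from running the fsutil command above in an elevated prompt, for example via subprocess.run with capture_output=True.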
 
The very first error message said something about hardware. So I ran the Fujitsu driver update tool (because I am too lazy to check every driver by hand and I wanted to make sure that I actually update all the drivers) and updated all the drivers, updated firmware and even did a BIOS update. But again, the upgrade failed.
 
I checked the BIOS settings to make sure the harddisk controller was not misconfigured (read: IDE/AHCI/RAID). But no issue here either.
 
A coworker suggested that it probably was a piece of incompatible software. So I gathered lists of all the software installed on the failing machines and on a few machines where the upgrade completed just fine. I did not find a single piece of software that was exclusive to the failing machines. Every piece of software found on those machines was also installed on machines that worked fine. I still went ahead and uninstalled some old software that might be causing issues (the Exchange 2010 Management Console, for one. Yes, really. 2010.). But that did not help either. Still failing.
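
The comparison itself boils down to a bit of set arithmetic. A minimal Python sketch, assuming the inventory is available as one set of product names per machine. The host names are hypothetical and the function name is made up.

```python
def exclusive_to_failing(failing, working):
    """Return software found on every failing machine but on no working one."""
    if not failing:
        return set()
    on_all_failing = set.intersection(*(set(s) for s in failing.values()))
    on_any_working = set().union(*(set(s) for s in working.values()))
    return on_all_failing - on_any_working

# Hypothetical host names; the product names are just the ones from this post.
failing = {
    "PC-FAIL-1": {"Symantec Endpoint Protection", "Exchange 2010 Management Console"},
    "PC-FAIL-2": {"Symantec Endpoint Protection", "Exchange 2010 Management Console"},
}
working = {
    "PC-OK-1": {"Symantec Endpoint Protection", "Exchange 2010 Management Console"},
    "PC-OK-2": {"Symantec Endpoint Protection"},
}
print(exclusive_to_failing(failing, working))  # empty set: nothing stands out
```

The result is only as good as the inventory: anything that is not listed as installed software never enters the comparison.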
 
The AV installed on all systems is the same. Symantec Endpoint Protection 12.1.6 RU6 MP6 Build 7061. But since this is installed on all systems, failing or not, this could not be the cause either. I still upgraded one machine to version 14 and uninstalled it completely on another machine. Both machines still failed to install the upgrade. So the AV was not the cause either, as expected.
 
I also tried deleting all the files downloaded by the SCCM client in previous upgrade attempts, the whole SCCM cache on the client and even reset Windows Update following this guide. And again the upgrade would fail.
 
At this point not only was the day over but I was also pretty much out of ideas. Searching the internet for various error codes found in the various log files did not reveal any other grand strategies either.
 
After posting about this adventure on Reddit [2] and asking for ideas on what else to try, I got a link to this document detailing the various log files the Windows upgrade process writes and at which stages it writes them. Since I knew when the upgrade would fail (always at 44% after a reboot and before doing a rollback), I now knew which log file to look at.
 
And guess what. Pretty much at the end of an 85 MB log file with well over 550,000 lines of unparseable gibberish, a few lines caught my eye.
 
2018-01-16 11:26:50, Error                 SYSPRP ActionPlatform::LaunchModule: Could not load DLL C:\Windows\System32\inetsrv\iissyspr.dll; dwRet = 0x7e[gle=0x0000007e]
2018-01-16 11:26:50, Error                 SYSPRP SysprepSession::ExecuteAction: Failed during sysprepModule operation; dwRet = 0x7e[gle=0x0000007e]
2018-01-16 11:26:50, Error                 SYSPRP SysprepSession::ExecuteInternal: Error in executing action for Microsoft-Windows-IIS-SharedLibraries; dwRet = 0x7e[gle=0x0000007e]
2018-01-16 11:26:50, Error                 SYSPRP SysprepSession::Execute: Error in executing actions from C:\Windows\System32\Sysprep\ActionFiles\Specialize.xml; dwRet = 0x7e
2018-01-16 11:26:50, Error                 SYSPRP RunPlatformActions:Failed while executing Sysprep session actions; dwRet = 0x7e
2018-01-16 11:26:50, Info                  IBS    Callback_Specialize: Internal Providers Specialized Failed. System can't proceed to handle Internal Providers
2018-01-16 11:26:50, Info                  IBS    Callback_Specialize: Specialize return: [126]
2018-01-16 11:26:50, Error      [0x060435] IBS    Callback_Specialize: An error occurred while either deciding if we need to specialize or while specializing; dwRet = 0x7e
2018-01-16 11:26:50, Info       [0x0640ae] IBSLIB PublishMessage: Publishing message [Windows Setup could not configure Windows on this computer’s hardware.]
 
There was that by now more than familiar error message. And Sysprep failed with error code 0x7e. And it was complaining about some "C:\Windows\System32\inetsrv\iissyspr.dll" that (also going by the following lines) could only belong to some IIS components.
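
Scrolling through half a million lines by hand is no fun, so here is a small Python sketch that keeps only the Error entries and converts the dwRet value to decimal. It assumes every entry follows the "timestamp, level, optional [code], component, message" layout seen in the excerpt above. The regular expressions and the function name are my own and not part of any Windows tooling.

```python
import re

ENTRY = re.compile(
    r"^\s*(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}), "
    r"(?P<level>\w+)\s+(?:\[0x[0-9a-fA-F]+\]\s+)?"
    r"(?P<component>\w+)\s+(?P<message>.*)$"
)
DWRET = re.compile(r"dwRet = (0x[0-9a-fA-F]+)")

def error_entries(lines):
    """Yield (component, message, dwRet as int or None) for Error-level lines."""
    for line in lines:
        m = ENTRY.match(line)
        if m and m.group("level") == "Error":
            code = DWRET.search(m.group("message"))
            yield (m.group("component"), m.group("message"),
                   int(code.group(1), 16) if code else None)

# Three lines taken from the excerpt above.
sample = [
    r"2018-01-16 11:26:50, Error                 SYSPRP ActionPlatform::LaunchModule: Could not load DLL C:\Windows\System32\inetsrv\iissyspr.dll; dwRet = 0x7e[gle=0x0000007e]",
    "2018-01-16 11:26:50, Info                  IBS    Callback_Specialize: Specialize return: [126]",
    "2018-01-16 11:26:50, Error      [0x060435] IBS    Callback_Specialize: An error occurred while either deciding if we need to specialize or while specializing; dwRet = 0x7e",
]
for component, message, code in error_entries(sample):
    print(component, code, message[:60])
```

0x7e is 126 in decimal, the same number as in the "Specialize return: [126]" line, and Win32 error 126 is ERROR_MOD_NOT_FOUND ("The specified module could not be found"). For the real file, passing an open file handle instead of the sample list avoids loading all 85 MB into memory at once.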
 
But IIS?! Why? There is no IIS webserver installed on those machines. So I checked the installed Windows features and found this:

[Screenshot: list of installed Windows features, showing IIS components enabled]
 

Excuse the German but you should be able to figure it out. I removed all the IIS features, rebooted and started the upgrade process again. Download complete, reboot, ... , 42%, 43%, 44%, 45%, 46%, ... uh. It passed the critical point. And even better it actually finished the upgrade without any other errors.

So ... two IIS Windows features caused the Windows 10 1709 upgrade to fail. Apparently something all the "update readiness" checks are completely unaware of. And rather than giving you a useful error message all you get is some nonsense that Windows could not be configured on this hardware. Even installing from DVD the error message you got was utterly useless.

Next I checked all the other machines that were failing. And as expected all of them had those IIS features installed. Namely:
  • IIS-IIS6ManagementCompatibility                      
  • IIS-Metabase
After uninstalling them the remaining machines also completed the upgrade successfully.
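
The post does not depend on how the features get removed, and the Windows Features dialog works fine for four machines. For a larger number of clients, one option is DISM's /Disable-Feature switch. The Python sketch below only builds the command lines for the two features named above. The helper name is hypothetical, and actually running the commands (elevated, for example via subprocess.run) is left out on purpose.

```python
def disable_feature_command(feature: str, no_restart: bool = True) -> list:
    """Build the DISM argument list that disables one optional Windows feature."""
    args = ["dism", "/Online", "/Disable-Feature", f"/FeatureName:{feature}"]
    if no_restart:
        args.append("/NoRestart")  # suppress DISM's automatic reboot prompt
    return args

FEATURES = ["IIS-IIS6ManagementCompatibility", "IIS-Metabase"]
for feature in FEATURES:
    print(" ".join(disable_feature_command(feature)))
```

Whether one of the two features pulls the other one along when it is disabled is something to verify on a test machine first.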


Mystery #1 solved. On to mystery #2. Why were these Windows features installed on these systems in the first place?

Along comes one of the department's employees who has been there way longer than me. And that employee showed me a guide on "How to install the Exchange 2010 Management Console", and in that many-years-old guide was a picture showing how to enable these two particular Windows features. Why these features are required, and what for? No one can remember.

But wait. Earlier I mentioned that I did not find any software that was exclusive to the failing machines. But if those Windows features were enabled when installing the Exchange console following that guide then surely other people would have done the same when installing the Exchange console on their system... Except they did not. Yes, other people did install the Exchange console on their Windows 10 machines too. But they never enabled these Windows features. And the console is still working just fine.


So the end of the story is that we have a completely outdated guide on how to install an outdated piece of software to manage some of our outdated servers and none of the people who have been working here much longer than me remember why that stuff is in that guide in the first place. And all that came back to bite me in the ass when trying to upgrade Windows 10 to the latest version. Thank you, Murphy.

[1] https://twitter.com/BeingSysAdmin/status/953400528976928768

[2] https://www.reddit.com/r/sysadmin/comments/7qqt7m/help_needed_trying_to_upgrade_windows_10/

5 comments:

  1. This is awesome. I had this issue (we had just moved up from Exch 2007 last year, but the console was still on my machine). I had banged my head a few times trying to figure this out like yourself with AV, etc.

    Thanks for the info!

  2. You should clearly link this with a brief description in the feedback hub.
    Like IIS 6 components prevents upgrade with both the error code you received

  3. 3 years could not upgrade from 1607. Until I found iissyspr.dll in the logs. Microsoft, as usual, conflicts with itself.

  4. I was actually able to fix my issue with the W10 Enterprise 2016 LTSB to W10 Enterprise 2019 LTSC upgrade errors (same as above) by simply mounting the .iso image on the Win10 machine I was upgrading, and running the upgrade from there.

    I didn't have any IIS features installed.

    Microsoft really needs to make their error codes more specific to what the issue is, or list them in an easier to find way. I spent a few days working through this issue.
