2020-02-17

[SCCM] SMS Agent Host (ccmexec) hangs on Windows 10 1903/1909 – no updates/software are being installed

Update 18.02.2020: I have gotten some new information that suggests that you might be able to solve this issue not only by enabling the Windows Defender Antivirus service on the client, but also by disabling the Endpoint Protection feature on the SCCM server instead. But since I do not have access to our SCCM server (as stated below) I cannot test this approach myself. Feel free to leave a comment below if this actually works too.


Disclaimer: I am not an expert in all things SCCM. I have not had any kind of training and I have no access to our local SCCM server at all, not even read-only access. Everything I know is self-taught from bits and pieces I have picked up here and there. A huge jigsaw puzzle with lots of holes to fill – not to mention all the pieces I have dropped under the table by now.

Our resident SCCM guy has been working on deploying Windows 10 1909 to our clients for a couple of weeks now and after overcoming an issue (with my help) that prematurely killed the task sequence (a story for another time) he finally managed to deploy a system that satisfied the requirements I had set.

But even before that, he had accidentally deployed the “1909 Insider Preview” package to a total of seven machines. In addition to those, we had of course manually upgraded a handful of machines to 1909 to test it out or to fix other random issues with Windows 10 clients here and there.

Right after the accidental deployment of the Insider Preview version of 1909 we noticed that the SCCM Software Center was in English, and not our usual German locale. A repair installation of the SCCM Client fixed that issue.




Then we discovered that the 1909 clients would not install the latest Windows updates. Despite the updates being rolled out to those systems they would not even see the advertisements for those updates. A repair installation of the SCCM Client did not help either. The only fix we had for this issue was a complete uninstall and reinstall of the SCCM Client. Within minutes the computers would download their policies and start pulling updates and software. However, this did not help for long. The next day the computers would fail again.

The only two symptoms we could observe on the computers were that the SCCM Client would no longer install any updates or software, and that if you tried to stop the “SMS Agent Host” (ccmexec) service it would hang around the 50% mark and then time out with an error that it could not be stopped.
Again, the only thing that helped here was uninstalling and reinstalling the SCCM Client.
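
If you would rather check for the hanging service from a script than by watching the progress bar, a minimal sketch could look like the one below. Wait-ForStatus is a hypothetical helper name, and the 60 second timeout is my own assumption, not a value taken from the SCCM Client.

```powershell
# Hypothetical helper: polls a status source until it reports the desired
# value or the timeout expires. Returns $true on success, $false on timeout.
function Wait-ForStatus {
    param(
        [Parameter(Mandatory)] [scriptblock] $GetStatus,
        [string] $Desired = 'Stopped',
        [int] $TimeoutSeconds = 60,      # assumed threshold, not from the SCCM Client
        [int] $PollMilliseconds = 500
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        # Cast to string so enum values and plain strings compare the same way.
        if ([string](& $GetStatus) -eq $Desired) { return $true }
        Start-Sleep -Milliseconds $PollMilliseconds
    } while ((Get-Date) -lt $deadline)
    return $false
}

# Possible usage on an affected client (run elevated):
#   (Get-Service 'CcmExec').Stop()
#   if (-not (Wait-ForStatus -GetStatus { (Get-Service 'CcmExec').Status })) {
#       'ccmexec did not stop in time, it is probably hung'
#   }
```

The status source is passed in as a script block so the polling logic can be tried out without touching a real service.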

Over the next two weeks we tried figuring out what the hell was going on.

Our resident SCCM guy had quickly identified the culprit - in his eyes at least. Our third-party antivirus solution, Symantec Endpoint Protection. But I prefer a detailed analysis of such issues over an over-eager witch hunt with no evidence to back up such claims.

So we took an inventory of all 1909 machines we had at that point and to our surprise, we found a couple of computers with 1909 where the SCCM Client was working perfectly. However, we were unable to figure out what made them special. Both the working and the non-working group contained machines that were manually installed, machines that were automatically deployed with SCCM and machines that were upgraded from prior Windows 10 versions.

Nothing made sense.

Looking at the SCCM server for clues we noticed that all affected clients were marked as “inactive”. Heartbeat (DDR), hardware inventory and policy update dates were all way too old. Skipping ahead a few days, we could not find anything on the server that would help us figure out why the clients would no longer pull policies and refuse to install anything.

With no access to the SCCM server myself I had to ask one of my co-workers for reports and status updates on individual clients almost daily. That not only annoyed me to no end but my co-workers too. But none of the co-workers with access to the SCCM server could find a report that would tell us the date and time of the last heartbeat (DDR) sent by all the 1909 clients. A little bit of searching the internet brought me to Eswar Koneti’s blog post from 2011, giving us the SQL query we needed. Modifying it slightly to meet our exact needs we got the following query:

-- One row per machine; MAX(ad.AgentTime) is the most recent heartbeat
SELECT Sys.Netbios_Name0 AS Name,
CASE WHEN Sys.Client0='1' THEN 'Yes' ELSE 'No' END AS 'Client Installed?',
ad.AgentName, MAX(ad.AgentTime) AS 'Time Stamp', os.Caption0 [OS], os.BuildNumber0 [BuildNumber]
FROM v_R_System Sys INNER JOIN
v_FullCollectionMembership fcm ON fcm.ResourceID = Sys.ResourceID
INNER JOIN v_AgentDiscoveries ad ON ad.ResourceId = Sys.ResourceID
INNER JOIN v_GS_OPERATING_SYSTEM os ON os.ResourceID = Sys.ResourceID
WHERE (fcm.CollectionID = 'SC100661') AND ad.AgentName LIKE 'Heartbeat Discovery' AND os.BuildNumber0 = '18363'
GROUP BY Sys.Netbios_Name0, Sys.Client0, os.Caption0, os.BuildNumber0, ad.AgentName
ORDER BY MAX(ad.AgentTime) DESC

This little SQL query gives us a list of all Windows 10 1909 (build 18363) machines in the collection with the ID “SC100661” and the date when they last sent a heartbeat. Henceforth I just requested this report when I needed to check for broken SCCM clients instead of requesting the dates for a number of machines individually.
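
Once you have the result of that query as a list of rows (for example exported to CSV and read back in), picking out the broken clients is a simple date comparison. The following is only a sketch: Get-StaleHeartbeat is a hypothetical function name, the seven day threshold is an assumption, and I assume the “Time Stamp” column holds something PowerShell can cast to a date.

```powershell
# Hypothetical helper: returns the rows whose last heartbeat is older than
# the given number of days, oldest first.
function Get-StaleHeartbeat {
    param(
        [Parameter(Mandatory)] [object[]] $Rows,
        [int] $MaxAgeDays = 7,           # assumed threshold, adjust to your heartbeat schedule
        [datetime] $Now = (Get-Date)
    )
    $cutoff = $Now.AddDays(-$MaxAgeDays)
    $Rows |
        Where-Object { [datetime]$_.'Time Stamp' -lt $cutoff } |
        Sort-Object { [datetime]$_.'Time Stamp' }
}

# Possible usage with a CSV export of the query result (the file name is made up):
#   Get-StaleHeartbeat -Rows (Import-Csv .\heartbeats.csv) | Format-Table Name, 'Time Stamp'
```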

By a stroke of luck one of my co-workers, more by accident than anything else, found out that cleaning the messaging queue of the SCCM client on the computer would make the computer almost instantly download policies and install updates and software.

The software we use to do this kind of stuff remotely executes the following lines of PowerShell:

Get-WmiObject -Query "SELECT * FROM Win32_Service WHERE Name ='CcmExec'" -Namespace "ROOT\cimv2"
(Get-Service 'CcmExec').Stop()
(Get-Service 'CcmExec').Status
(Get-WmiObject Win32_Processor | where {$_.DeviceID -eq 'CPU0'}).AddressWidth
([wmi]"ROOT\ccm:SMS_Client=@").ClientVersion
(Get-ItemProperty("HKLM:\SOFTWARE\Microsoft\SMS\Client\Configuration\Client Properties")).$("Local SMS Path")
Get-ChildItem 'C:\WINDOWS\CCM\ServiceData\Messaging\EndpointQueues' -Include *.msg,*.que -Recurse | foreach ($_) {Remove-Item $_.FullName -Force}
Get-WmiObject -Query "SELECT * FROM Win32_Service WHERE Name ='CcmExec'" -Namespace "ROOT\cimv2"
(Get-Service 'CcmExec').Start()
(Get-Service 'CcmExec').Status
([wmiclass]'ROOT\ccm:SMS_Client').TriggerSchedule('{00000000-0000-0000-0000-000000000112}')

I am not quite sure exactly which parts are required for this to work. We did not deep-dive into this workaround, because we were looking for a proper solution – not just a workaround. Though the GUID “{00000000-0000-0000-0000-000000000112}” in the last line, according to the Send Schedule Tool reference from Microsoft, is for a “State System policy cache cleanout”. The rest of the commands should be self-explanatory.
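
If I had to boil the lines above down to what probably matters, it would be something like the sketch below. This is an assumption on my part: I have not verified that the purely informational lines (CPU address width, client version, local SMS path) can be dropped. Remove-QueueFiles and Reset-CcmMessageQueue are hypothetical names, and the 60 second waits are arbitrary.

```powershell
# Deletes the queued message files (*.msg, *.que) below the given folder.
function Remove-QueueFiles {
    param([Parameter(Mandatory)] [string] $Path)
    Get-ChildItem -Path $Path -Include '*.msg', '*.que' -Recurse -File |
        ForEach-Object { Remove-Item -LiteralPath $_.FullName -Force }
}

# Stop ccmexec, clear the endpoint queues, start ccmexec, trigger schedule 112.
# Needs Windows PowerShell (for the [wmiclass] accelerator) and an elevated session.
function Reset-CcmMessageQueue {
    param(
        [string] $QueuePath = 'C:\WINDOWS\CCM\ServiceData\Messaging\EndpointQueues',
        [int] $WaitSeconds = 60          # assumed timeout, not from the original tool
    )
    $timeout = [TimeSpan]::FromSeconds($WaitSeconds)
    $svc = Get-Service -Name 'CcmExec'
    if ($svc.Status -ne 'Stopped') {
        $svc.Stop()
        $svc.WaitForStatus('Stopped', $timeout)
    }
    Remove-QueueFiles -Path $QueuePath
    $svc.Start()
    $svc.WaitForStatus('Running', $timeout)
    ([wmiclass]'ROOT\ccm:SMS_Client').TriggerSchedule('{00000000-0000-0000-0000-000000000112}')
}
```

Keeping the file cleanup in its own function means it can be tried against a scratch folder before it is pointed at a real client.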

But being just a workaround, much like the uninstall/reinstall one, this only fixed the SCCM Client up for a day or so. Usually on the next day the SMS Agent Host (ccmexec) service would be unresponsive again.

Having done extensive research online and finding multiple people having the same issue we were having (prajwaldesai.com, reddit.com, Technet, ...), I turned to Reddit’s SCCM community and posted in a promising thread where people were collecting information and discussing how to fix this issue.

The OP (u/Vintalage) of that Reddit thread pointed me in the right direction. Supposedly it had something to do with our third-party antivirus solution, much like what our resident SCCM guy had been saying all this time. We had, however, already ruled out the version of the third-party antivirus (Symantec Endpoint Protection) as the cause, having installed 14.2 RU1 MP1, 14.2 RU2 and 14.2 RU2 MP1 on affected systems.
Symantec themselves claim that 14.2 RU1 is fully compatible with Windows 10 1909 according to their support matrix. But if you check the release notes for 14.2 RU2 you will notice that only then do they mention adding support for 1909.

Nevertheless, what nudged me in the right direction was the following:

Apparently, Microsoft changed something in 1903/1909 and the ccmexec agent performs a prerequisite check against the Windows Defender firewall and antivirus services. If they are not running, it will attempt to turn them on. If that fails, it causes the ccmexec agent to go into a hung state.

And indeed, the “Windows Defender Antivirus Service” (WinDefend) service was set to “manual” start and when trying to start it manually it would throw a selection of errors. Some of those errors made no sense at all to me.

After some digging I found the right switch to turn on the service. After a restart of the computer the SCCM Client would almost instantly start downloading policies and install updates and software.

The switch I found is hiding in the “Windows Defender Security Center” under “Virus & threat protection”. Scrolling to the bottom of the page you will see a collapsed entry called “Windows Defender Antivirus options”. Expand it and it will reveal a switch to turn on or off a feature called “Periodic scanning”.

Enabling this feature was what made the SCCM Client work again. I could successfully stop and start the “SMS Agent Host” (ccmexec) service too. So all was good! ... Or so I thought.

Looking into how to enable this switch on all 1909 clients – 130 of them at this point in time, up to 3500 in the future – I certainly did not want to have to do that manually on all those machines. Time for some GPOs. But I was soon faced with the harsh reality that there simply is no GPO that will turn this switch on. None whatsoever.

Simply setting the “Windows Defender Antivirus Service” to “automatic” start through either a services or a registry GPP did not work either. The service would simply not start.

Searching around the internet again I found several other people looking for exactly the same thing I was. A way to automatically turn on “limited periodic scanning”. (Reddit.com (1), Reddit.com (2), Spiceworks, ...)

In prior versions of Windows 10 (up to and including 1803) there was a command line option to turn the “Periodic scanning” feature on:

C:\Windows\System32\SystemSettingsAdminFlows.exe Defender SideBySideOn

However, since 1809 this command does not seem to do anything anymore. So I was back to having to do this manually on all affected clients? No way.

So I unpacked Sysinternals’ Process Monitor (procmon) and started analyzing what happens on the system when you turn on/turn off that switch in the Windows Defender Security Center.

After digging through the hundreds of thousands of events (even after filtering out all the irrelevant processes), I found what I was looking for.

The switch in the Windows Defender Security Center GUI sets seven (relevant) registry values:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Defender]
"DisableAntiSpyware"=dword:00000000
"DisableAntiVirus"=dword:00000000
"PassiveMode"=dword:00000002

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WdBoot]
"Group"="Early-Launch"
"Start"=dword:00000000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WdFilter]
"Start"=dword:00000000

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WinDefend]
"Start"=dword:00000002

In addition to that, I also set two more GPOs for Windows Defender:

Windows Components\Windows Defender Antivirus
-         Allow antimalware service to remain running always = Enabled
-         Turn off Windows Defender Antivirus = Disabled

These set the following two registry values:

[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows Defender]
"DisableAntiSpyware"=dword:00000000
"ServiceKeepAlive"=dword:00000001

With that the “Windows Defender Antivirus Service” (WinDefend) will automatically start and the switch in the “Windows Defender Security Center” GUI will be enabled too.

So apparently, the trick was to modify not only the “WinDefend” service but the “WdBoot” and “WdFilter” services too.
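
To check whether a client actually ended up with all of these values, a read-only comparison like the following sketch can help. It only reads the registry and changes nothing. Compare-RegistryState is a hypothetical name, and the expected values are simply the ones listed above, so treat them as what worked for us rather than a reference.

```powershell
# Expected values as listed in this post (registry path -> value name -> data).
$expected = @{
    'HKLM:\SOFTWARE\Microsoft\Windows Defender'          = @{ DisableAntiSpyware = 0; DisableAntiVirus = 0; PassiveMode = 2 }
    'HKLM:\SYSTEM\CurrentControlSet\Services\WdBoot'     = @{ Group = 'Early-Launch'; Start = 0 }
    'HKLM:\SYSTEM\CurrentControlSet\Services\WdFilter'   = @{ Start = 0 }
    'HKLM:\SYSTEM\CurrentControlSet\Services\WinDefend'  = @{ Start = 2 }
    'HKLM:\SOFTWARE\Policies\Microsoft\Windows Defender' = @{ DisableAntiSpyware = 0; ServiceKeepAlive = 1 }
}

# Hypothetical helper: emits one object per expected value with a Matches flag.
function Compare-RegistryState {
    param(
        [Parameter(Mandatory)] [hashtable] $Expected,
        # The reader is injectable so the comparison can be tried without a real registry.
        [scriptblock] $Reader = {
            param($Path, $Name)
            (Get-ItemProperty -Path $Path -Name $Name -ErrorAction SilentlyContinue).$Name
        }
    )
    foreach ($path in $Expected.Keys) {
        foreach ($name in $Expected[$path].Keys) {
            $actual = & $Reader $path $name
            [pscustomobject]@{
                Path     = $path
                Name     = $name
                Expected = $Expected[$path][$name]
                Actual   = $actual
                Matches  = ($actual -eq $Expected[$path][$name])   # a missing value ($null) never matches
            }
        }
    }
}

# Possible usage on a client:
#   Compare-RegistryState -Expected $expected | Where-Object { -not $_.Matches }
```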

Most importantly, the SCCM Client will no longer hang either. It will even install updates and software again, even on Windows 10 1903 and 1909. Great success.


But just like the OP from the Reddit thread, you are probably asking yourself “wait ... you are running Windows Defender Antivirus and a third-party antivirus application in parallel? Will that not impact system performance?”

To that I can only say: “No idea, to be honest.”

However, on the “Windows Defender Antivirus compatibility” page Microsoft writes that they disable the Windows Defender Antivirus service when there is a “third-party product installed that is not offered or developed by Microsoft”. But that only happens when the organization is not enrolled in Microsoft Defender ATP. If the organization is enrolled in Microsoft Defender ATP then the Windows Defender Antivirus service will be configured to run in “passive mode”. Exactly what I am enabling with the registry keys above.

I have also enabled another GPO:

Windows Components\Windows Defender Antivirus
-         Turn off real-time protection = Enabled

With that, I hope that the impact on system performance should be minimal, since Windows Defender Antivirus should not be performing any real-time scanning. And if the limited periodic scanning is still too much of a performance impact for you, you can configure the intervals and other settings for that through GPOs too.

For now we have found a solution that seems to work for us.

Fast forward to the February patch day and I can report that our 1909 clients are successfully installing all new updates, just like they are supposed to.











3 comments:

  1. I am having this exact problem and have come to a similar conclusion! Our situation is that we are moving away from SEP and currently our only option is Defender. So we enabled the Defender management in SCCM and let the policies go out. Everything seemed to be working fine because SEP is set to keep Defender off so the policies didn't do anything. Then I pushed out an uninstall for SEP. What was supposed to happen is that when SEP was removed Defender would kick on and use its policies, which is exactly what I saw in testing. Then I noticed a number of computers that hadn't reported inventory in SCCM for a while. I finally made it to some of the threads you have posted above. In my case I had uninstalled the SEP client but not rebooted before the Defender policies tried to apply and it locked up the agent. What I eventually did was remove SEP and reboot and THEN apply the Defender policies in SCCM. I plan on testing the solution tomorrow and will update my findings. Basically it's exactly what you stated. SEP is interfering with Defender trying to access reg keys tied to services. My process will look like this:
    Make collection in SCCM to contain all PCs with SEP installed
    Make collection in SCCM to apply defender policies but exclude PCs in the SEP collection
    Deploy an uninstaller for SEP to the SEP installed collection WITH A REBOOT on success.

    In theory the PCs with SEP will uninstall the SEP client and when SCCM detects SEP is gone it will apply the defender policies.

  2. Thanks for sharing your experiences. We had similar issues but fortunately the latest version of SCCM (2004) and SCCM Client (v 5.00.8968.1021) appears to have resolved the problem. We're utilising defender also.

  3. I'm afraid I fucked up most of our systems due to this. the following file necessary for windows startup was corrupt C:\WINDOWS\system32\drivers\wd\WdBoot.sys
