Disclaimer: I am not an expert in all things SCCM. I have not had any kind of training and I have no access to our local SCCM server at all, not even read-only access. Everything I know is self-taught from bits and pieces I have picked up here and there. A huge jigsaw puzzle with lots of holes to fill – not to mention all the pieces I have dropped under the table by now.
Our resident SCCM guy has been working on deploying Windows 10 1909 to our clients for a couple of weeks now, and after overcoming an issue (with my help) that prematurely killed the task sequence (a story for another time), he finally managed to deploy a system that satisfied the requirements I had set.
But even before that, he had accidentally deployed the “1909 Insider Preview” package to a total of seven machines. In addition to those, we had of course manually upgraded a handful of machines to 1909 to test it out or to fix other random issues with Windows 10 clients here and there.
Right after the accidental deployment of the Insider Preview version of 1909 we noticed that the SCCM Software Center was in English, and not our usual German locale. A repair installation of the SCCM Client fixed that issue.
Then we discovered that the 1909 clients would not install the latest Windows updates. Despite the updates being rolled out to those systems, they would not even see the advertisements for those updates. A repair installation of the SCCM Client did not help either. The only fix we had for this issue was a complete uninstall and reinstall of the SCCM Client. Within minutes the computers would download their policies and start pulling updates and software. However, this did not help for long. The next day the computers would fail again.
The only two symptoms we could observe on the computers were that the SCCM Client would no longer install any updates or software, and that if you tried to stop the “SMS Agent Host” (ccmexec) service, it would hang around the 50% mark and then time out with an error that it could not be stopped.
Again, the only thing that helped here was uninstalling and reinstalling the SCCM Client.
Over the next two weeks we tried figuring out what the hell was going on.
Our resident SCCM guy had quickly identified the culprit - in his eyes at least. Our third-party antivirus solution, Symantec Endpoint Protection. But I prefer a detailed analysis of such issues over an over-eager witch hunt with no evidence to back up such claims.
So we took an inventory of all 1909 machines we had at that point and, to our surprise, we found a couple of computers with 1909 where the SCCM Client was working perfectly. However, we were unable to figure out what made them special. Both the working and the non-working group contained machines that were manually installed, machines that were automatically deployed with SCCM and machines that were upgraded from prior Windows 10 versions.
Nothing made sense.
Looking at the SCCM server for clues we noticed that all affected clients were marked as “inactive”. Heartbeat (DDR), hardware inventory and policy update dates were all way too old. Skipping ahead a few days, we could not find anything on the server that would help us figure out why the clients would no longer pull policies and refuse to install anything.
With no access to the SCCM server myself I had to ask one of my co-workers for reports and status updates on individual clients almost daily. That not only annoyed me to no end but my co-workers too. But none of the co-workers with access to the SCCM server could find a report that would tell us the date and time of the last heartbeat (DDR) sent by all the 1909 clients. A little bit of searching the internet brought me to Eswar Koneti’s blog post from 2011, giving us the SQL query we needed. Modifying it slightly to meet our exact needs we got the following query:
-- Latest heartbeat discovery per machine in the collection (MAX instead of MIN,
-- and AgentTime is no longer part of GROUP BY, so each machine appears once)
SELECT Sys.Netbios_Name0 AS Name,
       CASE WHEN Sys.Client0 = '1' THEN 'Yes' ELSE 'No' END AS [Client Installed?],
       ad.AgentName,
       MAX(ad.AgentTime) AS [Time Stamp],
       os.Caption0 AS [OS],
       os.BuildNumber0 AS [BuildNumber]
FROM v_R_System Sys
INNER JOIN v_FullCollectionMembership fcm ON fcm.ResourceID = Sys.ResourceID
INNER JOIN v_AgentDiscoveries ad ON ad.ResourceId = Sys.ResourceID
INNER JOIN v_GS_OPERATING_SYSTEM os ON os.ResourceID = Sys.ResourceID
WHERE fcm.CollectionID = 'SC100661'
  AND ad.AgentName = 'Heartbeat Discovery'
  AND os.BuildNumber0 = '18363'
GROUP BY Sys.Netbios_Name0, Sys.Client0, ad.AgentName, os.Caption0, os.BuildNumber0
ORDER BY MAX(ad.AgentTime) DESC
This little SQL query gives us a list of all Windows 10 1909 (build 18363) machines in the collection with the ID “SC100661” and the date when they last sent a heartbeat. Henceforth I just requested this report when I needed to check for broken SCCM clients instead of requesting the dates for a number of machines individually.
By a stroke of luck one of my co-workers, more by accident than anything else, found out that cleaning the messaging queue of the SCCM client on the computer would make the computer almost instantly download policies and install updates and software.
The software we use to do this kind of stuff remotely executes the following lines of PowerShell:
# Show the current state of the SMS Agent Host service, then stop it
Get-WmiObject -Query "SELECT * FROM Win32_Service WHERE Name='CcmExec'" -Namespace "ROOT\cimv2"
(Get-Service 'CcmExec').Stop()
(Get-Service 'CcmExec').Status
# Collect some client details: CPU address width, client version, install path
(Get-WmiObject Win32_Processor | where {$_.DeviceID -eq 'CPU0'}).AddressWidth
([wmi]"ROOT\ccm:SMS_Client=@").ClientVersion
(Get-ItemProperty("HKLM:\SOFTWARE\Microsoft\SMS\Client\Configuration\Client Properties")).$("Local SMS Path")
# Delete the queued messages of the SCCM client
Get-ChildItem 'C:\WINDOWS\CCM\ServiceData\Messaging\EndpointQueues' -Include *.msg,*.que -Recurse | ForEach-Object { Remove-Item $_.FullName -Force }
# Start the service again and trigger schedule 112
Get-WmiObject -Query "SELECT * FROM Win32_Service WHERE Name='CcmExec'" -Namespace "ROOT\cimv2"
(Get-Service 'CcmExec').Start()
(Get-Service 'CcmExec').Status
([wmiclass]'ROOT\ccm:SMS_Client').TriggerSchedule('{00000000-0000-0000-0000-000000000112}')
I am not quite sure exactly which parts are required for this to work. We did not deep-dive into this workaround, because we were looking for a proper solution – not just a workaround. Though the GUID “{00000000-0000-0000-0000-000000000112}” in the last line, according to the Send Schedule Tool reference from Microsoft, is for a “State System policy cache cleanout”. The rest of the commands should be self-explanatory.
But being just a workaround, much like the uninstall/reinstall one, this only fixed the SCCM Client up for a day or so. Usually on the next day the SMS Agent Host (ccmexec) service would be unresponsive again.
Having done extensive research online and finding multiple people having the same issue we were having (prajwaldesai.com, reddit.com, Technet, ...), I turned to Reddit’s SCCM community and posted in a promising thread where people were collecting information and discussing how to fix this issue.
The OP (u/Vintalage) of that Reddit thread pointed me in the right direction. Supposedly it had something to do with our third-party antivirus solution, much like what our resident SCCM guy had been saying all this time. We had, however, already ruled out the version of the third-party antivirus (Symantec Endpoint Protection) as the cause, having installed 14.2 RU1 MP1, 14.2 RU2 and 14.2 RU2 MP1 on affected systems.
Symantec themselves claim that 14.2 RU1 is fully compatible with Windows 10 1909 according to their support matrix. But if you check the release notes for 14.2 RU2, you will notice that only there do they mention adding support for 1909.
Nevertheless, what nudged me in the right direction was the following:
Apparently, Microsoft changed something in 1903/1909 and the ccmexec agent performs a prerequisite check against the Windows Defender firewall and antivirus services. If they are not running, it will attempt to turn them on. If that fails, it causes the ccmexec agent to go into a hung state.
And indeed, the “Windows Defender Antivirus Service” (WinDefend) service was set to “manual” start and when trying to start it manually it would throw a selection of errors. Some of those errors made no sense at all to me.
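The same state can be read from a shell instead of the Services console. The following is only a minimal sketch, not something we used during the troubleshooting: the function names `Test-ServiceNeedsAttention` and `Get-DefenderServiceState` are made up for illustration, and it assumes a PowerShell version on Windows where `Get-CimInstance` is available.

```powershell
# Hypothetical helper: a service is considered healthy only if it is running
# and configured for automatic start; everything else needs a closer look.
function Test-ServiceNeedsAttention {
    param(
        [Parameter(Mandatory)] [string] $State,      # e.g. 'Running', 'Stopped'
        [Parameter(Mandatory)] [string] $StartMode   # e.g. 'Auto', 'Manual', 'Disabled'
    )
    return -not (($State -eq 'Running') -and ($StartMode -eq 'Auto'))
}

# Hypothetical helper: reads state and start mode of the given services from
# Win32_Service on the local machine. Services that do not exist are skipped.
function Get-DefenderServiceState {
    param([string[]] $Name = @('WinDefend'))
    foreach ($n in $Name) {
        $svc = Get-CimInstance -ClassName Win32_Service -Filter "Name='$n'"
        if ($null -eq $svc) { continue }
        [pscustomobject]@{
            Name           = $svc.Name
            State          = $svc.State
            StartMode      = $svc.StartMode
            NeedsAttention = Test-ServiceNeedsAttention -State $svc.State -StartMode $svc.StartMode
        }
    }
}
```

Calling `Get-DefenderServiceState` on an affected client should flag the WinDefend service if it is in the stopped, manual-start state described above. The sketch only reads information and changes nothing.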
After some digging I found the right switch to turn on the service. After a restart of the computer the SCCM Client would almost instantly start downloading policies and install updates and software.
The switch I found is hiding in the “Windows Defender Security Center” under “Virus & threat protection”. Scrolling to the bottom of the page you will see a collapsed entry called “Windows Defender Antivirus options”. Expand it and it will reveal a switch to turn on or off a feature called “Periodic scanning”.
Enabling this feature was what made the SCCM Client work again. I could successfully stop and start the “SMS Agent Host” (ccmexec) service too. So all was good! ... Or so I thought.
Looking into how to enable this switch on all 1909 clients – 130 of them at this point in time, up to 3500 in the future – I certainly did not want to have to do that manually on all those machines. Time for some GPOs. But I was soon faced with the harsh reality that there simply is no GPO that will turn this switch on. None whatsoever.
Simply setting the “Windows Defender Antivirus Service” to “automatic” start through either a services or a registry GPP did not work either. The service would simply not start.
Searching around the internet again I found several other people looking for exactly the same thing I was: a way to automatically turn on “limited periodic scanning”. (Reddit.com (1), Reddit.com (2), Spiceworks, ...)
In prior versions of Windows 10 (up to and including 1803) there was a command line option to turn the “Periodic scanning” feature on:
C:\Windows\System32\SystemSettingsAdminFlows.exe Defender SideBySideOn
However, since 1809 this command does not seem to do anything anymore. So I was back to having to do this manually on all affected clients? No way.
So I unpacked Sysinternals’ Process Monitor (procmon) and started analyzing what happens on the system when you turn that switch in the Windows Defender Security Center on or off.
After digging through the hundreds of thousands of events (even after filtering out all the irrelevant processes), I found what I was looking for.
The switch in the Windows Defender Security Center GUI sets seven (relevant) registry values:
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows Defender]
"DisableAntiSpyware"=dword:00000000
"DisableAntiVirus"=dword:00000000
"PassiveMode"=dword:00000002
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WdBoot]
"Group"="Early-Launch"
"Start"=dword:00000000
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WdFilter]
"Start"=dword:00000000
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\WinDefend]
"Start"=dword:00000002
In addition to that, I also set two more GPOs for Windows Defender:
Windows Components\Windows Defender Antivirus
- Allow antimalware service to remain running always = Enabled
- Turn off Windows Defender Antivirus = Disabled
These set the following two registry values:
[HKEY_LOCAL_MACHINE\SOFTWARE\Policies\Microsoft\Windows Defender]
"DisableAntiSpyware"=dword:00000000
"ServiceKeepAlive"=dword:00000001
With that the “Windows Defender Antivirus Service” (WinDefend) will automatically start and the switch in the “Windows Defender Security Center” GUI will be enabled too.
So apparently, the trick was to modify not only the “WinDefend” service but the “WdBoot” and “WdFilter” services too.
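To check whether a client actually carries the seven values listed above, something like the following read-only sketch could be used. It is an illustration under assumptions, not the method we rolled out: `Get-SettingDrift`, `$expectedSettings` and `$registryReader` are names invented here, and the script only compares values, it never writes to the registry.

```powershell
# The seven values from the listing above, in PowerShell registry provider syntax.
$expectedSettings = @(
    @{ Path = 'HKLM:\SOFTWARE\Microsoft\Windows Defender'; Name = 'DisableAntiSpyware'; Value = 0 }
    @{ Path = 'HKLM:\SOFTWARE\Microsoft\Windows Defender'; Name = 'DisableAntiVirus';   Value = 0 }
    @{ Path = 'HKLM:\SOFTWARE\Microsoft\Windows Defender'; Name = 'PassiveMode';        Value = 2 }
    @{ Path = 'HKLM:\SYSTEM\CurrentControlSet\Services\WdBoot';    Name = 'Group'; Value = 'Early-Launch' }
    @{ Path = 'HKLM:\SYSTEM\CurrentControlSet\Services\WdBoot';    Name = 'Start'; Value = 0 }
    @{ Path = 'HKLM:\SYSTEM\CurrentControlSet\Services\WdFilter';  Name = 'Start'; Value = 0 }
    @{ Path = 'HKLM:\SYSTEM\CurrentControlSet\Services\WinDefend'; Name = 'Start'; Value = 2 }
)

# Default reader: returns the current registry value, or $null if it is missing.
$registryReader = {
    param($Path, $Name)
    (Get-ItemProperty -Path $Path -Name $Name -ErrorAction SilentlyContinue).$Name
}

# Hypothetical helper: returns one object per entry whose current value
# differs from the expected one. An empty result means nothing drifted.
function Get-SettingDrift {
    param(
        [Parameter(Mandatory)] [hashtable[]] $Expected,
        [scriptblock] $Reader = $registryReader
    )
    foreach ($entry in $Expected) {
        $current = & $Reader $entry.Path $entry.Name
        if ($current -ne $entry.Value) {
            [pscustomobject]@{
                Path     = $entry.Path
                Name     = $entry.Name
                Expected = $entry.Value
                Current  = $current
            }
        }
    }
}
```

Running `Get-SettingDrift -Expected $expectedSettings` on a client lists every value that is missing or different. The reader is passed in as a script block so the comparison logic can be tried out without touching a real registry.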
Most importantly, the SCCM Client will no longer hang either. It will even install updates and software again, even on Windows 10 1903 and 1909. Great success.
But just like the OP from the Reddit thread, you are probably asking yourself “wait ... you are running Windows Defender Antivirus and a third-party antivirus application in parallel? Will that not impact system performance?”
To that I can only say: “No idea, to be honest.”
However, on the “Windows Defender Antivirus compatibility” page Microsoft writes that they disable the Windows Defender Antivirus service when there is a “third-party product installed that is not offered or developed by Microsoft”. But that only happens when the organization is not enrolled in Microsoft Defender ATP. If the organization is enrolled in Microsoft Defender ATP, then the Windows Defender Antivirus service will be configured to run in “passive mode”, which is exactly what I am enabling with the registry values above.
I have also enabled another GPO:
Windows Components\Windows Defender Antivirus
- Turn off real-time protection = Enabled
With that, I hope that the impact on system performance should be minimal, since Windows Defender Antivirus should not be performing any real-time scanning. And if the limited periodic scanning is still too much of a performance impact for you, you can configure the intervals and other settings for that through GPOs too.
For now we have found a solution that seems to work for us.
Fast forward to the February patch day and I can report that our 1909 clients are successfully installing all new updates, just like they are supposed to.
[1] https://www.reddit.com/r/SCCM/comments/dk6ze8/sms_agent_host_becomes_unresponsive_on_windows_10/
[7] https://www.reddit.com/r/sysadmin/comments/ebddsy/gpo_to_enable_periodic_wdefender_scan_even_with/
I am having this exact problem and have come to a similar conclusion! Our situation is that we are moving away from SEP and currently our only option is Defender. So we enabled the Defender management in SCCM and let the policies go out. Everything seemed to be working fine because SEP is set to keep Defender off, so the policies didn't do anything. Then I pushed out an uninstall for SEP. What was supposed to happen is that when SEP was removed, Defender would kick on and use its policies, which is exactly what I saw in testing. Then I noticed a number of computers that hadn't reported inventory in SCCM for a while. I finally made it to some of the threads you have posted above. In my case I had uninstalled the SEP client but not rebooted before the Defender policies tried to apply, and it locked up the agent. What I eventually did was remove SEP and reboot and THEN apply the Defender policies in SCCM. I plan on testing the solution tomorrow and will update my findings. Basically it's exactly what you stated. SEP is interfering with Defender trying to access reg keys tied to services. My process will look like this:
- Make collection in SCCM to contain all PCs with SEP installed
- Make collection in SCCM to apply Defender policies but exclude PCs in the SEP collection
- Deploy an uninstaller for SEP to the SEP installed collection WITH A REBOOT on success.
In theory the PCs with SEP will uninstall the SEP client and when SCCM detects SEP is gone it will apply the Defender policies.
Thanks for sharing your experiences. We had similar issues but fortunately the latest version of SCCM (2004) and SCCM Client (v 5.00.8968.1021) appears to have resolved the problem. We're utilising defender also.
I'm afraid I fucked up most of our systems due to this. The following file necessary for Windows startup was corrupt: C:\WINDOWS\system32\drivers\wd\WdBoot.sys