After several Skype for Business Server 2015 (S4B) installations and upgrades, a colleague of mine drew my attention to a large number of event ID 4097 warnings related to Windows Fabric in the Administrative Events view. The flood can reach 3-5 events every 5-15 minutes:
– “cert chain trust status is in error: 0x1000040”
– “ignore error 0x80092013:certificate revocation list offline”
You can check all of these events in the dedicated Windows Fabric log, shown in the picture below.
Question: Why do these events occur, and where do they come from?
S4B installs Windows Fabric 3.0 and adds, among other things, additional security settings for the Fabric intra-cluster communications. You can find these settings in several configuration files:
%PROGRAMDATA%\Windows Fabric\FabricHostSettings.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\ClusterManifest.current.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\Fabric.Data\InfrastructureManifest.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\Fabric.Config.<version>\Settings.xml
The settings relevant to these events are in the ‘Security’ section:
<Section Name="Security">
….
<Parameter Name="SessionExpiration" Value="28800" />
<Parameter Name="IgnoreCrlOfflineError" Value="true" />
<Parameter Name="CrlCheckingFlag" Value="3221225476" />
….
</Section>
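To check these values quickly, the Security section can be read with a few lines of PowerShell. This is only a sketch: it runs against an inline sample fragment so that it is self-contained, and the <Settings> wrapper element and the function name Get-FabricSecurityParameter are my own assumptions, not something taken from the real files. The XPath matches on local-name(), so an XML namespace on the real files should not matter.

```powershell
# Sample fragment for illustration only; the <Settings> root element is an assumption.
$sample = @'
<Settings>
  <Section Name="Security">
    <Parameter Name="SessionExpiration" Value="28800" />
    <Parameter Name="IgnoreCrlOfflineError" Value="true" />
    <Parameter Name="CrlCheckingFlag" Value="3221225476" />
  </Section>
</Settings>
'@

# Hypothetical helper: returns the Security parameters as an ordered name/value table.
function Get-FabricSecurityParameter {
    param([Parameter(Mandatory)][string]$XmlText)
    $doc = New-Object System.Xml.XmlDocument
    $doc.LoadXml($XmlText)
    # local-name() ignores any XML namespace declared on the document
    $xpath = "//*[local-name()='Section'][@Name='Security']/*[local-name()='Parameter']"
    $result = [ordered]@{}
    foreach ($node in $doc.SelectNodes($xpath)) {
        $result[$node.GetAttribute('Name')] = $node.GetAttribute('Value')
    }
    $result
}

$security = Get-FabricSecurityParameter -XmlText $sample
$security['CrlCheckingFlag']
```

On a real server you would pass the content of one of the configuration files listed above (for example via Get-Content -Raw) instead of the sample text.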
The IgnoreCrlOfflineError is self-explanatory: If Windows Fabric encounters this error, ignore it and continue operations. For the CrlCheckingFlag values we need to look a little ‘deeper’. The Windows Fabric configuration files are generated by the Front-End server (RTCSRV / RtcHost module) at service startup based on this template file:
<S4B installation folder>\Server\Core\ClusterManifest.Xml.Template
In this file you will find a little more about the flag values:
<!--
CrlCheckingFlag setting follows the rest of the Lync Server components (sipstack, web) which set the following flags:
CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL=0x00000004 | // do not go on the wire for cert retrieval
CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY=0x80000000 | // do not go on the wire for cert revocation check
CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT=0x40000000
0xC0000004=3221225476 (unsigned int)
-->
<Parameter Name="CrlCheckingFlag" Value="%CRLCHECKINGFLAG%" />
More information regarding the flags can be found in the MSDN documentation for the CertGetCertificateChain function.
Basically, S4B services (including the Fabric) are configured to check certificate revocation (CRL) using the local cache only. As the comments in the template say: do not go on the network to retrieve and check the CRL.
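If you want to verify the arithmetic yourself, the decimal value can be broken down into the three flags with a bitwise AND. The sketch below uses only the flag values quoted in the template comment; the function name Get-CrlCheckingFlagName is hypothetical.

```powershell
# Flag values copied from the template comment. They are parsed via .NET because
# PowerShell would read the literal 0x80000000 as a negative Int32.
$knownFlags = [ordered]@{
    CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL            = [Convert]::ToInt64('00000004', 16)
    CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT = [Convert]::ToInt64('40000000', 16)
    CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY         = [Convert]::ToInt64('80000000', 16)
}

# Hypothetical helper: lists the names of the known flags set in a CrlCheckingFlag value.
function Get-CrlCheckingFlagName {
    param([Parameter(Mandatory)][long]$Value)
    foreach ($name in $knownFlags.Keys) {
        if (($Value -band $knownFlags[$name]) -ne 0) { $name }
    }
}

$names = Get-CrlCheckingFlagName -Value 3221225476
$names                              # all three flag names
'0x{0:X8}' -f [long]3221225476      # 0xC0000004
```

A value of 0 returns no names at all, which means none of the three cache-only / exclude-root flags is set.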
So… how does it check for a certificate revocation?
Running a trace on the Fabric process, we can see that it never tries to connect to a CRL distribution server but, a little before the events are logged, it reads the registry keys related to certificates.
Since there is no cached or local CRL available, it reports a CRL offline error, ignores it, but still writes it to the event log.
Do the other Skype for Business services report this error? No, it’s hidden. They hit the same issue but, to see it, you need to enable CAPI2 logging:
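One way to switch that log on without the Event Viewer is the built-in wevtutil tool. This is a minimal sketch to run from an elevated PowerShell prompt on a test server first; the wrapper function name Enable-Capi2Log is my own, and the log can fill up quickly, so remember to disable it again afterwards.

```powershell
# Hypothetical wrapper around the built-in wevtutil tool (requires elevation).
function Enable-Capi2Log {
    param([bool]$Enabled = $true)
    $state = if ($Enabled) { 'true' } else { 'false' }
    # "sl" = set-log; /e: enables or disables the channel
    & wevtutil.exe sl 'Microsoft-Windows-CAPI2/Operational' "/e:$state"
}

# Usage (not executed here):
# Enable-Capi2Log
# Get-WinEvent -LogName 'Microsoft-Windows-CAPI2/Operational' -MaxEvents 20
# Enable-Capi2Log -Enabled $false
```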
Question: What can we do about this?
I can see at least 3 options:
Option A (The ‘no worries’ one) – Ignore it!
Difficulty level: noob
Advantages: Nothing to be done. It’s normal S4B operation and nothing is affected
Disadvantages: Your administrative events view will be full of these events, which makes it difficult to find important warnings
Option B (The ‘clean’ one) – provide the CRL to the local server
Difficulty level: professional services
Advantages: You will not only stop seeing the Fabric events, it will also optimize all the other S4B services
Disadvantages: Requires some knowledge and scripting, since you will want a scheduled task that gets the CRL and puts it in the local machine certificate store.
This option also proves how the CRL caching / IgnoreCrlOfflineError works.
Instructions:
1. Find the name of the certificate used by Windows Fabric. You can get this from the Fabric configuration files (spoiler alert: it’s the same one used by the Lync internal services)
2. Find the CRL distribution point paths on the certificate
3. Download the CRL and install it in the Trusted Root CA or Intermediate CA store, depending on whether the certificate was issued by a root or a subordinate CA
As soon as you add the CRL to the local computer certificate store, you will see the Fabric process read the CRL from the store.
… and no more 4097 events!
As you may have noticed, CRLs are updated regularly (daily or weekly), so they expire and need to be retrieved again. This means that doing this manually is an error-prone, time-consuming task, which you can solve with a scheduled script (not included on this blog… yet!)
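For step 2, the HTTP distribution points can be pulled out of the certificate with PowerShell. The sketch below is an assumption-laden illustration, not a tested production script: the helper name Get-CrlUrlFromCdpText is hypothetical, it only looks for http/https URLs, and the sample text merely imitates how Windows formats the CRL Distribution Points extension (OID 2.5.29.31).

```powershell
# Hypothetical helper: extracts http(s) CRL URLs from the formatted CDP extension text.
function Get-CrlUrlFromCdpText {
    param([Parameter(Mandatory)][string]$CdpText)
    [regex]::Matches($CdpText, 'URL=(https?://\S+)') |
        ForEach-Object { $_.Groups[1].Value }
}

# Sample text shaped like the formatted extension (illustration only).
$sampleCdp = @'
[1]CRL Distribution Point
     Distribution Point Name:
          Full Name:
               URL=http://ca.contoso.com/Contoso%20Issuing%20CA1.crl
'@
$urls = @(Get-CrlUrlFromCdpText -CdpText $sampleCdp)
$urls

# On a real server (untested assumption; replace the subject filter with your own):
# $cert = Get-ChildItem Cert:\LocalMachine\My | Where-Object { $_.Subject -like '*<pool FQDN>*' }
# $cdp  = $cert.Extensions | Where-Object { $_.Oid.Value -eq '2.5.29.31' }
# Get-CrlUrlFromCdpText -CdpText $cdp.Format($true)
```

The returned URLs are what a scheduled task would download before adding the CRL to the store.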
Option C (The ‘tweak’ one) – manipulate the CrlCheckingFlag
Difficulty level: moderate, you just need to know what you are doing
Advantages: It’s the easiest way to stop seeing the Fabric events
Disadvantages: It’s an undocumented S4B parameter customization (although I don’t believe MS will refuse support if they find out). It can also be overwritten by an update.
This ‘quick fix’ option is about manipulating the file
‘<S4B installation folder>\Server\Core\ClusterManifest.Xml.Template’
Instructions:
1. Change the parameter in ClusterManifest.Xml.Template
<Parameter Name="CrlCheckingFlag" Value="0" />
2. Stop the RTCSRV and FabricHostSvc services
3. Start the RTCSRV service – it will create new Fabric configuration files based on the template file and also start the FabricHostSvc
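The three steps can be sketched in PowerShell as follows. Treat it as an untested outline under assumptions: the helper name Set-CrlCheckingFlagValue is hypothetical, the edit is demonstrated on a single sample line rather than on the real template, and the file and service commands are left commented out so nothing changes until you have taken a backup.

```powershell
# Hypothetical helper: rewrites the Value attribute of the CrlCheckingFlag parameter in a text.
function Set-CrlCheckingFlagValue {
    param(
        [Parameter(Mandatory)][string]$TemplateText,
        [Parameter(Mandatory)][uint32]$Value
    )
    $pattern = '(<Parameter\s+Name="CrlCheckingFlag"\s+Value=")[^"]*(")'
    # ${1} and ${2} keep everything around the value untouched
    [regex]::Replace($TemplateText, $pattern, ('${1}' + $Value + '${2}'))
}

$line = '<Parameter Name="CrlCheckingFlag" Value="%CRLCHECKINGFLAG%" />'
$newLine = Set-CrlCheckingFlagValue -TemplateText $line -Value 0
$newLine

# On a real server (not executed here; check the file encoding before writing back):
# $path = '<S4B installation folder>\Server\Core\ClusterManifest.Xml.Template'
# Copy-Item $path "$path.bak"
# Set-CrlCheckingFlagValue -TemplateText (Get-Content $path -Raw) -Value 0 | Set-Content $path
# Stop-Service RTCSRV, FabricHostSvc
# Start-Service RTCSRV
```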
Actually, according to this blog, this parameter value is required if you issue certificates without any CRL Distribution Point information… or your S4B pool will not start at all (!)
– Newly installed Skype for Business Front-End Pool refuses to start
Final notes:
- If you want to know more about Windows Fabric and S4B we have a great and detailed explanation on this blog
Did you try changing this behavior to check the CRL online, by excluding the CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY flag?
Hello Oleg
As I stated: “…Basically, S4B services (including the Fabric) are configured to check certificates revocation (CRL) using local cache only”
The flag 0 should “force” that (unless there is a specific flag for that which I could not find), but the fact is that the Fabric engine doesn’t try that.
Luis, thank you for the reply.
I suggest not zeroing this value but just subtracting the CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY flag, which results in 1073741828 (0x40000004). That leaves CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL (0x00000004) & CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT (0x40000000), as the chain (root & issuing CA certificates) is already distributed by AD & I don’t need to check the root certificate’s revocation status: it’s offline & fully trusted.
My initial issue is the script I tried to implement for importing the CRL cache. It fails to remove previously imported, already expired CRLs, so the Intermediate CA CRL store keeps growing. Maybe you’ve already solved this issue…
$servers = @("s4bfe-01", "s4bfe-02", "s4bfe-03")
$script = {
    $workdir = "c:\temp\"
    if (Test-Path ($workdir + "ca1p.crl")) {
        &certutil -delstore -enterprise CA ($workdir + "ca1p.crl")
        Remove-Item ($workdir + "ca1p.crl") -Force
    }
    if (Test-Path ($workdir + "ca1.crl")) {
        &certutil -delstore -enterprise CA ($workdir + "ca1.crl")
        Remove-Item ($workdir + "ca1.crl") -Force
    }
    Invoke-WebRequest -Uri 'http://ca.contoso.com/Contoso%20Issuing%20CA1.crl' -OutFile ($workdir + "ca1.crl")
    Invoke-WebRequest -Uri 'http://ca.contoso.com/Contoso%20Issuing%20CA1+.crl' -OutFile ($workdir + "ca1p.crl")
    &certutil -addstore -enterprise -f CA ($workdir + "ca1.crl")
    &certutil -addstore -enterprise -f CA ($workdir + "ca1p.crl")
}
$servers | % {Invoke-Command -ComputerName $_ -ScriptBlock $script}
I didn’t go in that direction of importing the cache, so there is no quick answer for you
The certutil -delstore will not work because it requires the CertId
https://technet.microsoft.com/library/cc732443.aspx#BKMK_delstore
CertId — Certificate or CRL match token. This can be a serial number,
an SHA-1 certificate, CRL, CTL or public key hash,
a numeric cert index (0, 1, etc.),
a numeric CRL index (.0, .1, etc.),
a numeric CTL index (..0, ..1, etc.),
a public key, signature or extension ObjectId,
a certificate subject Common Name,
an e-mail address, UPN or DNS name,
a key container name or CSP name,
a template name or ObjectId,
an EKU or Application Policies ObjectId,
or a CRL issuer Common Name.
Many of the above may result in multiple matches.
And as a 3rd option you can go ‘pure powershell’:
Set-Location Cert:\LocalMachine\My
Get-ChildItem
it’s very limited in commands at this point.
get-command -module PKI
And I didn’t see a way to get to the CRL from here
Thanks for your suggestion Oleg,
but I chose 0 for a reason: I tried all the possible combinations of the 3 flags (and also some other flags).
Believe me, it took me several days and some reverse engineering to understand the logic of the programming.
Ex: If you keep the CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL flag you will get the same flood of warnings, just with a different error code/text.
There might be a way, but not direct:
– use certutil -store -enterprise CA
– look for the CRL on the list and check for CRL Hash(sha1)
– use certutil -delstore -enterprise CA "<CRL Hash(sha1) value from the previous step>"
You can also get more fields from the crl file: certutil -dump ca1p.crl
It made me curious, so I just tried it on one of my FEs & I don’t get any warnings at all with the 1073741828 (0x40000004) value. I will stick with this solution for now; I hope it won’t load the servers too much. Thanks for the article thoroughly describing the cause of this problem.
Thanks for testing on your side.
Curious that you get different behaviour from mine with those configurations
Unfortunately, the KHI metrics show me bad results for the “LS:WEB – Auth Provider related calls\WEB – Failed validate cert calls to the cert auth provider” counter, so I had to work out my own way with scripting. Here’s the result, feel free to use it for your own purposes:
Clear-Host
$certlist = &certutil -store -enterprise CA
$now = Get-Date
$marker = "================"
# certutil prints dates in the current culture's short date/time format
$pattern = (Get-Culture).DateTimeFormat.ShortDatePattern + " " + (Get-Culture).DateTimeFormat.ShortTimePattern
$CRLList = @()
$CRL = @{}
$is_CRL = $false
$workdir = "c:\temp\"
# Parse for CRL entries
for ($i=0; $i -lt $certlist.Count; $i++) {
    switch -Regex ($certlist[$i]) {
        "$marker CRL (\d{1,}) $marker" {if ($is_CRL) {$CRLList += , $CRL; $CRL=@{}; } $is_CRL = $true}
        "$marker Certificate (\d{1,}) $marker" {if ($is_CRL) {$CRLList += , $CRL; $CRL=@{}; } $is_CRL = $false}
        # Flush the last entry only if it was a CRL, so no empty entry reaches the removal loop
        "CertUtil: -store command completed successfully." {if ($is_CRL) {$CRLList += , $CRL; $CRL=@{}; } $is_CRL = $false}
        default {
            if ($is_CRL) {
                $string = $certlist[$i] -split ': '
                # Trim the field name because some certutil lines are indented
                switch ($string[0].Trim()) {
                    "Issuer" {$CRL.add("Issuer", $string[1])}
                    "ThisUpdate" {$CRL.add("ThisUpdate", [datetime]::ParseExact($string[1], $pattern, $null))}
                    "NextUpdate" {$CRL.add("NextUpdate", [datetime]::ParseExact($string[1], $pattern, $null))}
                    "CRL Entries" {$CRL.add("CRLEntries", $string[1])}
                    "CA Version" {$CRL.add("CAVersion", $string[1])}
                    "CRL Number" {$CRL.add("CRLNumber", ($string[1] -split "=")[1] -replace " ")}
                    "Delta CRL Indicator" {$CRL.add("DeltaCRLNumber", ($string[1] -split "=")[1] -replace " ")}
                    "CRL Hash(sha1)" {$CRL.add("CRLHash", $string[1] -replace " ")}
                }
            }
        }
    }
}
# Remove expired CRLs
$CRLList | % {
    if ($_.'NextUpdate' -lt $now) {& certutil -delstore -enterprise CA $_.'CRLHash'}
}
# Install fresh CRLs
Invoke-WebRequest -Uri 'http://ca.contoso.com/Contoso%20Issuing%20CA1.crl' -OutFile ($workdir + "ca1.crl")
Invoke-WebRequest -Uri 'http://ca.contoso.com/Contoso%20Issuing%20CA1+.crl' -OutFile ($workdir + "ca1p.crl")
&certutil -addstore -enterprise -f CA ($workdir + "ca1.crl")
&certutil -addstore -enterprise -f CA ($workdir + "ca1p.crl")
Remove-Item ($workdir + "ca1.crl") -Force
Remove-Item ($workdir + "ca1p.crl") -Force
Also, it may be a good idea to add another check on the CRL’s Issuer. Change this:
if ($_.'NextUpdate' -lt $now) {& certutil -delstore -enterprise CA $_.'CRLHash'}
to this:
if (($_.'NextUpdate' -lt $now) -and ($_.'Issuer' -like '*Contoso*')) {& certutil -delstore -enterprise CA $_.'CRLHash'}
Thanks Oleg for providing an answer for the clean solution 😉
Great research, very interesting article, thanks a lot.
Anybody here who could solve this issue?
If I try to import the CRL into local cert store, it works but the warning doesn’t go away and I can’t see the CRL anywhere in certificate MMC
Hello! I imported the CRL into Intermediate CA. Fabric found it in the registry, but error 4097 still appears in the event log. Should I restart the SFB services after importing the CRL? Any ideas?
Should not be required at all or it would defeat the purpose of CRL publishing over time.
Happy to know my content could help you