S4B Front-end servers event 4097 flooding

After several Skype for Business 2015 (S4B) Server installations and upgrades, a colleague of mine drew my attention to a large amount of event ID 4097 warnings in the Administrative Events view, all related to Windows Fabric. The rate of this flood can be 3-5 events every 5-15 minutes:
– “cert chain trust status is in error: 0x1000040”
– “ignore error 0x80092013:certificate revocation list offline”
You can find all of these events in the dedicated Windows Fabric event log.

Question: Why do these events occur, and where do they come from?

S4B installs Windows Fabric 3.0 and adds, among other things, additional security settings for the Fabric intra-cluster communications. You can find these settings in several configuration files:
%PROGRAMDATA%\Windows Fabric\FabricHostSettings.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\ClusterManifest.current.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\Fabric.Data\InfrastructureManifest.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\Fabric.Config.<version>\Settings.xml

The settings relevant to these events are in the ‘Security’ section:

  <Section Name="Security">
    ….
    <Parameter Name="SessionExpiration" Value="28800" />
    <Parameter Name="IgnoreCrlOfflineError" Value="true" />
    <Parameter Name="CrlCheckingFlag" Value="3221225476" />
    ….
  </Section>
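
To check these values quickly on a server, the ‘Security’ section can be read with a few lines of script. The following is only a minimal sketch in Python: the `<Settings>` root element, the `SAMPLE` text and the function names are assumptions made for illustration, and only the three parameters come from the snippet above. Element names are matched without their XML namespace, since the real files may declare one.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the three parameters are taken from the article;
# the <Settings> root element is an assumption made for this illustration.
SAMPLE = """<Settings>
  <Section Name="Security">
    <Parameter Name="SessionExpiration" Value="28800" />
    <Parameter Name="IgnoreCrlOfflineError" Value="true" />
    <Parameter Name="CrlCheckingFlag" Value="3221225476" />
  </Section>
</Settings>"""


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{uri}Section' -> 'Section'."""
    return tag.rsplit("}", 1)[-1]


def security_parameters(xml_text):
    """Return the Name/Value pairs found in the 'Security' section."""
    root = ET.fromstring(xml_text)
    params = {}
    for element in root.iter():
        is_section = local_name(element.tag) == "Section"
        if is_section and element.get("Name") == "Security":
            for child in element:
                if local_name(child.tag) == "Parameter":
                    params[child.get("Name")] = child.get("Value")
    return params


if __name__ == "__main__":
    for name, value in security_parameters(SAMPLE).items():
        print(f"{name} = {value}")
```

To use it against a real file, read the file content yourself and pass the text to `security_parameters`; the paths are the ones listed earlier.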

IgnoreCrlOfflineError is self-explanatory: if Windows Fabric encounters this error, it ignores it and continues operating. For the CrlCheckingFlag value we need to look a little ‘deeper’. The Windows Fabric configuration files are generated by the Front-End server (RTCSRV / RtcHost module) at service startup, based on this template file:
<S4B installation folder>\Server\Core\ClusterManifest.Xml.Template

In this file you will find a little more about the flag value:
      <!--
        CrlCheckingFlag setting follows the rest of the Lync Server components (sipstack, web) which  set the following flags:
               CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL=0x00000004 |  // do not go on the wire for cert retrieval
               CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY=0x80000000 |  // do not go on the wire for cert revocation check
               CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT=0x40000000
                                                              0xC0000004=3221225476 (unsigned int)
      -->
      <Parameter Name="CrlCheckingFlag" Value="%CRLCHECKINGFLAG%" />

More information regarding the flags can be found in the MSDN documentation for the CertGetCertificateChain function.
Basically, S4B services (including the Fabric) are configured to check certificate revocation (CRL) using the local cache only. As the comments in the template say: do not go on the network to retrieve and check the CRL.
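
The decimal value in the configuration files is simply the three flags combined with a bitwise OR. As a small illustration (a sketch in Python, where the function name `decode_crl_checking_flag` is made up for this example and the flag values are the ones listed in the template comment), a value can be broken back down into its flags like this:

```python
# Flag values exactly as listed in the template comment quoted above.
FLAGS = {
    "CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL": 0x00000004,
    "CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT": 0x40000000,
    "CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY": 0x80000000,
}


def decode_crl_checking_flag(value):
    """Split a CrlCheckingFlag value into known flag names and leftover bits."""
    names = [name for name, bit in FLAGS.items() if value & bit]
    known_bits = sum(bit for bit in FLAGS.values() if value & bit)
    leftover = value & ~known_bits  # bits not covered by the three flags
    return names, leftover


if __name__ == "__main__":
    value = 3221225476
    names, leftover = decode_crl_checking_flag(value)
    print(hex(value))  # 0xc0000004
    for name in names:
        print(name)
    print("unrecognised bits:", hex(leftover))
```

A non-zero leftover would mean the value contains bits other than the three flags named in the template.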

So… how does it check for a certificate revocation?
Running a trace on the Fabric process, we can see that it never tries to connect to a CRL distribution server but, a little before the events are logged, it reads the registry keys related to certificates.

Since there is no cached or local CRL available, it reports a CRL offline error, ignores it, but still writes it to the event log.

Do the other Skype for Business services report this error? No, it’s hidden. They hit the same issue but, to see it, you need to enable CAPI2 logging.

Question: What can we do about this?

I can see at least 3 options:

Option A (The ‘no worries’ one) – Ignore it!
Difficulty level: noob
Advantages: Nothing to be done. It’s normal S4B operation and nothing is affected
Disadvantages: Your administrative events view will be full of these events, which makes it difficult to find important warnings

Option B (The ‘clean’ one) – provide the CRL to the local server
Difficulty level: professional services
Advantages: You will not just stop seeing the Fabric events, it will also optimize all the other S4B services
Disadvantages: Requires some knowledge and scripting, since you will want a scheduled task that gets the CRL and puts it in the local machine certificate store.

This option also proves how the CRL caching / IgnoreCrlOfflineError logic works.
Instructions:
1. Find the name of the certificate used by Windows Fabric. You can get this from the Fabric configuration files (spoiler alert: it’s the same one used by the Lync internal services)
2. Find the CRL Distribution Points on the certificate, which tell you where to get the CRL

3. Download the CRL and install it in the Trusted Root Certification Authorities or Intermediate Certification Authorities store, depending on whether the certificate was issued by a root or a subordinate CA

As soon as you import the CRL into the local computer certificate store, you will see the Fabric process read the CRL from the store.

… and no more 4097 events!

As you may have noticed, CRLs are updated regularly (daily or weekly), so they expire and need to be retrieved again. Doing this manually is an error-prone, time-consuming task, which you can solve with a scheduled script (not included on this blog… yet!)

Option C (The ‘tweak’ one) – manipulate the CrlCheckingFlag
Difficulty level: moderate, you just need to know what you are doing
Advantages: It’s the easiest way to stop seeing the Fabric events
Disadvantages: It’s an undocumented S4B parameter customization (although I don’t believe MS will refuse support if they find out). It can also be overwritten by an update.

This ‘quick fix’ option is about manipulating the file
<S4B installation folder>\Server\Core\ClusterManifest.Xml.Template

Instructions:
1. Change the parameter in ClusterManifest.Xml.Template
<Parameter Name="CrlCheckingFlag" Value="0" />
2. Stop the RTCSRV and FabricHostSvc services
3. Start the RTCSRV service – it will create new Fabric configuration files based on the template file and also start the FabricHostSvc
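
If you would rather script the edit in step 1 than do it by hand, the replacement itself is a one-line substitution. The following is only a sketch in Python, working on the template text as a string: the function name `set_crl_checking_flag` is made up for this example, and it assumes the parameter is written with `Name` before `Value`, as in the template excerpt shown earlier. Reading the file, keeping a backup copy and writing the result back are left to you.

```python
import re

# Matches the CrlCheckingFlag parameter and captures everything around its value.
# Assumes the attribute order Name="..." Value="..." seen in the template excerpt.
PARAM_RE = re.compile(r'(<Parameter\s+Name="CrlCheckingFlag"\s+Value=")[^"]*(")')


def set_crl_checking_flag(template_text, new_value):
    """Return the template text with the CrlCheckingFlag value replaced."""
    patched, count = PARAM_RE.subn(
        lambda m: m.group(1) + str(new_value) + m.group(2),
        template_text,
    )
    if count != 1:
        # Refuse to guess when the parameter is missing or appears several times.
        raise ValueError(f"expected exactly one CrlCheckingFlag parameter, found {count}")
    return patched


if __name__ == "__main__":
    line = '<Parameter Name="CrlCheckingFlag" Value="%CRLCHECKINGFLAG%" />'
    print(set_crl_checking_flag(line, 0))
```

The function raises an error instead of silently doing nothing, so a template whose layout differs from the excerpt is not left half-edited.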

Actually, according to the blog post below, this parameter value is required if you issue certificates without any CRL Distribution Point information… or your S4B pool will not start at all (!)
“Newly installed Skype for Business Front-End Pool refuses to start”

 Final notes:

  • If you want to know more about Windows Fabric and S4B we have a great and detailed explanation on this blog

18 thoughts on “S4B Front-end servers event 4097 flooding”

  1. Oleg Glushko 05/04/2017 / 04:53

    Did you try changing this behavior so that the CRL is checked online, by excluding the CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY flag?

    • LuisR 05/04/2017 / 09:41

      Hello Oleg
      As I stated: “…Basically, S4B services (including the Fabric) are configured to check certificates revocation (CRL) using local cache only”
      The flag 0 should “force” that (unless there is a specific flag for that which I could not find), but the fact is that the Fabric engine doesn’t try that.

  2. Oleg Glushko 06/04/2017 / 01:26

    Luis, thank you for the reply.
    I suggest not zeroing this value but just subtracting the CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY flag, which results in 1073741828 (0x40000004). That leaves CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL (0x00000004) and CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT (0x40000000), as the chain (root and issuing CA certificates) is already distributed by AD and I don’t need to check the root certificate’s revocation status: it’s offline and fully trusted.

    • Oleg Glushko 06/04/2017 / 01:37

      My initial issue is with the script I tried to implement for importing the CRL into the cache. It fails to remove previously imported, already expired CRLs, so the Intermediate CA CRL store keeps growing. Maybe you’ve already solved this issue…

      $servers = @("s4bfe-01", "s4bfe-02", "s4bfe-03")

      $script = {
        $workdir = "c:\temp\"
        if (Test-Path ($workdir + "ca1p.crl")) {
          &certutil -delstore -enterprise CA ($workdir + "ca1p.crl")
          Remove-Item ($workdir + "ca1p.crl") -Force
        }
        if (Test-Path ($workdir + "ca1.crl")) {
          &certutil -delstore -enterprise CA ($workdir + "ca1.crl")
          Remove-Item ($workdir + "ca1.crl") -Force
        }
        Invoke-WebRequest -Uri 'http://ca.contoso.com/Contoso%20Issuing%20CA1.crl' -OutFile ($workdir + "ca1.crl")
        Invoke-WebRequest -Uri 'http://ca.contoso.com/Contoso%20Issuing%20CA1+.crl' -OutFile ($workdir + "ca1p.crl")

        &certutil -addstore -enterprise -f CA ($workdir + "ca1.crl")
        &certutil -addstore -enterprise -f CA ($workdir + "ca1p.crl")
      }

      $servers | % {Invoke-Command -ComputerName $_ -ScriptBlock $script}

      • LuisR 06/04/2017 / 07:05

        I didn’t go in the direction of importing the cache, so there is no quick answer for you.
        The certutil -delstore will not work with a file name because it requires the CertId:
        https://technet.microsoft.com/library/cc732443.aspx#BKMK_delstore
        CertId — Certificate or CRL match token. This can be a serial number,
        an SHA-1 certificate, CRL, CTL or public key hash,
        a numeric cert index (0, 1, etc.),
        a numeric CRL index (.0, .1, etc.),
        a numeric CTL index (..0, ..1, etc.),
        a public key, signature or extension ObjectId,
        a certificate subject Common Name,
        an e-mail address, UPN or DNS name,
        a key container name or CSP name,
        a template name or ObjectId,
        an EKU or Application Policies ObjectId,
        or a CRL issuer Common Name.
        Many of the above may result in multiple matches.

      • LuisR 06/04/2017 / 08:00

        And as a 3rd option you can go ‘pure powershell’:
        Set-Location Cert:\LocalMachine\My
        Get-ChildItem

        it’s very limited in commands at this point.
        get-command -module PKI

        And I didn’t see a way to get to the CRL from here

    • LuisR 06/04/2017 / 06:35

      Thanks for your suggestion Oleg,

      but the value 0 was chosen for a reason: I tried all the possible combinations of the 3 flags (and also some other flags).
      Believe me, it took me several days and some reverse engineering to understand the logic of the programming.
      Ex: If you keep CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL you will get the same flood of warnings, just with a different error code/text.

      • LuisR 06/04/2017 / 07:46

        There might be a way, but not direct:
        – use certutil -store -enterprise CA
        – look for the CRL on the list and check for CRL Hash(sha1)
        – use certutil -delstore -enterprise CA "<CRL Hash(sha1)>"
        You can also get more fields from the crl file: certutil -dump ca1p.crl

      • Oleg Glushko 06/04/2017 / 08:01

        It made me curious, so I just tried it on one of my FEs and I don’t get any warnings at all with the 1073741828 (0x40000004) value. I will stick with this solution for now; I hope it won’t load the servers too much. Thanks for the article thoroughly describing the cause of this problem.

      • LuisR 06/04/2017 / 11:30

        Thanks for testing on your side.

        Curious that you have a different behaviour from me with those configurations

  3. Oleg Glushko 07/04/2017 / 04:44

    Unfortunately, the KHI metrics show me bad results for the “LS:WEB – Auth Provider related calls\WEB – Failed validate cert calls to the cert auth provider” counter, so I had to go the scripting way after all. Here’s the result, feel free to use it for your own purposes:

    Clear-Host

    $certlist = &certutil -store -enterprise CA
    $now = Get-Date
    $marker = "================"
    # Dates in the certutil output are parsed with the current culture's short date/time format
    $pattern = (Get-Culture).DateTimeFormat.ShortDatePattern + " " + (Get-Culture).DateTimeFormat.ShortTimePattern
    $CRLList = @()
    $CRL = @{}
    $is_CRL = $false
    $workdir = "c:\temp\"

    # Parse the certutil output for CRL entries
    for ($i=0; $i -lt $certlist.Count; $i++) {
      switch -Regex ($certlist[$i]) {
        "$marker CRL (\d{1,}) $marker" {if ($is_CRL) {$CRLList += , $CRL; $CRL=@{}; } $is_CRL = $true}
        "$marker Certificate (\d{1,}) $marker" {if ($is_CRL) {$CRLList += , $CRL; $CRL=@{}; } $is_CRL = $false}
        # Only keep the last entry if it was a CRL, so no empty record ends up in the list
        "CertUtil: -store command completed successfully." {if ($is_CRL) {$CRLList += , $CRL; $CRL=@{}; } "eof"}
        default {
          if ($is_CRL) {
            $string = $certlist[$i] -split ': '
            switch ($string[0]) {
              "Issuer" {$CRL.add("Issuer", $string[1])}
              " ThisUpdate" {$CRL.add("ThisUpdate", [datetime]::ParseExact($string[1], $pattern, $null))}
              " NextUpdate" {$CRL.add("NextUpdate", [datetime]::ParseExact($string[1], $pattern, $null))}
              "CRL Entries" {$CRL.add("CRLEntries", $string[1])}
              "CA Version" {$CRL.add("CAVersion", $string[1])}
              "CRL Number" {$CRL.add("CRLNumber", ($string[1] -split "=")[1] -replace " ")}
              "Delta CRL Indicator" {$CRL.add("DeltaCRLNumber", ($string[1] -split "=")[1] -replace " ")}
              "CRL Hash(sha1)" {$CRL.add("CRLHash", $string[1] -replace " ")}
            }
          }
        }
      }
    }

    # Remove expired CRLs, identified by their SHA-1 hash
    $CRLList | % {
      if ($_.'NextUpdate' -lt $now) {& certutil -delstore -enterprise CA $_.'CRLHash'}
    }

    # Install fresh CRLs
    Invoke-WebRequest -Uri 'http://ca.contoso.com/Contoso%20Issuing%20CA1.crl' -OutFile ($workdir + "ca1.crl")
    Invoke-WebRequest -Uri 'http://ca.contoso.com/Contoso%20Issuing%20CA1+.crl' -OutFile ($workdir + "ca1p.crl")
    &certutil -addstore -enterprise -f CA ($workdir + "ca1.crl")
    &certutil -addstore -enterprise -f CA ($workdir + "ca1p.crl")

    Remove-Item ($workdir + "ca1.crl") -Force
    Remove-Item ($workdir + "ca1p.crl") -Force

    • Oleg Glushko 07/04/2017 / 05:07

      Also, it may be a good idea to add another check for the CRL’s Issuer. Change this:
      if ($_.'NextUpdate' -lt $now) {& certutil -delstore -enterprise CA $_.'CRLHash'}
      to this:
      if (($_.'NextUpdate' -lt $now) -and ($_.'Issuer' -like '*Contoso*')) {& certutil -delstore -enterprise CA $_.'CRLHash'}

      • LuisR 10/04/2017 / 11:26

        Thanks Oleg for providing an answer for the clean solution 😉

  4. Andreas B. 15/06/2017 / 10:36

    Anybody here who could solve this issue?
    If I try to import the CRL into local cert store, it works but the warning doesn’t go away and I can’t see the CRL anywhere in certificate MMC

    • Sergey A. 13/10/2017 / 14:07

      Hello! I imported the CRL into the Intermediate CA store. Fabric found it in the registry, but error 4097 still appears in the event log. Should I restart the SFB services after importing the CRL? Any ideas?

      • LuisR 13/10/2017 / 14:11

        Should not be required at all or it would defeat the purpose of CRL publishing over time.
