Logging scenarios that crash and cripple the RTCCLSAGT

Another quest, this time into an issue that appears to have existed for some time in Skype for Business Server 2015 (if not already in Lync Server), so I decided to investigate and document it.

ISSUE

Consider the following scenario:
1) Start the Skype for Business logging tool.
2) Pick one of the built-in scenarios CmdletDebug or IISLog.
3) As soon as you start either scenario on the selected servers, the logging tool output window shows the following error message: “ResponseMessage: Error code – 20000, Message – Unknown error – Error calling agent <FE-FQDN>; Could not connect to net.tcp://<FE-FQDN>:50002/. The connection attempt lasted for a time span of 00:00:02.0071248. TCP error code 10061: No connection could be made because the target machine actively refused it 10.101.128.20:50002. . Please refer CLS logs for details.”
[Screenshot: LoggingIIS-crash]
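TCP error 10061 means nothing was listening on port 50002 of the target server when the tool tried to reach the agent. To see quickly which servers are affected, here is a minimal Python sketch (assuming Python 3 and network access from a management machine to the Front End servers) that reports whether each server accepts TCP connections on that port. The function names are hypothetical helpers, not part of any Skype for Business tooling.

```python
import socket

CLS_AGENT_PORT = 50002  # port taken from the error message above


def port_is_open(host: str, port: int = CLS_AGENT_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections (10061 / WSAECONNREFUSED on Windows),
        # timeouts and name-resolution failures.
        return False


def check_agents(hosts, port: int = CLS_AGENT_PORT) -> dict:
    """Map each host name to whether its CLS agent port accepted a connection."""
    return {host: port_is_open(host, port) for host in hosts}
```

For example, `check_agents(["fe01.contoso.local", "fe02.contoso.local"])` (hypothetical FQDNs) should map every server whose RTCCLSAGT service has crashed to `False`.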

[Screenshot: RTCCLSAGT-eventerror]
By this time, the RTCCLSAGT service (Skype for Business Server Centralized Logging Service Agent) has crashed on every server where you started the trace, logging event ID 33040:
Centralized Logging Service Agent Error starting background thread to process traces.
Log Type – IISLogManager, Error – Object reference not set to an instance of an object.
Cause: Internal error
Resolution: Examine error details to determine resolution.
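If you collect event 33040 descriptions from several servers (for example, from exported Application logs), a small parser makes it easy to confirm that the failing component is always the IISLogManager. This is a minimal Python sketch that assumes the description follows the layout quoted above; it accepts either an en dash or a plain hyphen as the separator, and the function name is a hypothetical helper.

```python
import re

# Matches "Log Type – <name>, Error – <message>" with an en dash or a hyphen.
_EVENT_33040 = re.compile(
    r"Log Type\s*[\u2013-]\s*(?P<log_type>[^,]+),\s*Error\s*[\u2013-]\s*(?P<error>.+)"
)


def parse_cls_agent_error(description: str):
    """Extract the failing log manager and its error text from an event 33040 description.

    Returns a (log_type, error) tuple, or None if the text does not match.
    """
    match = _EVENT_33040.search(description)
    if match is None:
        return None
    # `.+` stops at the end of the line, so the Cause/Resolution lines are excluded.
    return match.group("log_type").strip(), match.group("error").strip()
```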

Worse, the service crashes with the same error every time you try to start it, leaving you without any logging capability on those servers!

CAUSE

Both scenarios contain invalid trace components named ‘Internal’ and ‘External’. Somehow these trigger an internal failure in the IISLogManager component and leave it damaged.
When you edit those scenarios, the logging tool shows a warning about these components:
[Screenshot: Unknown-traces]

SOLUTION(s)

This is how you can fix the two problems:

  • Repair the service so it starts again (no server downtime required):
    Run a repair of Skype for Business Server 2015, Core Components.
    [Screenshot: S4B-needNetFramework]
    Note that two services must be set to Automatic (Delayed Start), otherwise the repair may fail on Windows Server 2012 (as in the screenshot above). Follow the instructions in my previous post.
  • Fix the scenarios so this does not happen again:
    1) Note down the valid components and their flags.
    2) Delete the scenario.
    3) Create a scenario with the same name and all the original components except ‘External’ and ‘Internal’.
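The scenario fix amounts to a filter over the scenario's component list: keep every valid component with its flags and drop ‘Internal’ and ‘External’. The Python sketch below models a scenario as a plain dictionary of component name to flags. That data model, the `known` set of accepted component names and the function names are hypothetical illustrations rather than the CLS object model, so fill them in from what the logging tool shows for your deployment.

```python
from typing import Dict, Iterable, List, Tuple

# Components the logging tool reports as unknown in CmdletDebug and IISLog.
INVALID_COMPONENTS = {"Internal", "External"}


def split_components(
    components: Dict[str, str], known: Iterable[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Separate a scenario's components into valid ones (with flags) and rejected names.

    `components` maps component name -> flags, as noted down in step 1.
    `known` holds the component names the logging tool accepts (hypothetical input).
    """
    known_set = set(known)
    valid: Dict[str, str] = {}
    rejected: List[str] = []
    for name, flags in components.items():
        # Always reject Internal/External, plus anything the tool does not recognise.
        if name in INVALID_COMPONENTS or name not in known_set:
            rejected.append(name)
        else:
            valid[name] = flags
    return valid, sorted(rejected)


def rebuild_scenario(name: str, components: Dict[str, str], known: Iterable[str]) -> dict:
    """Describe the replacement scenario: same name, only the valid components (step 3)."""
    valid, rejected = split_components(components, known)
    return {"name": name, "components": valid, "dropped": rejected}
```

Feed it the components and flags noted in step 1. If nothing else is wrong with the scenario, the `dropped` list should contain only ‘External’ and ‘Internal’.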