Call Quality Dashboard – Part 2: The CUBE

After describing the Call Quality Dashboard (CQD) QoE Archive Database in part 1, I will now show how to install the CUBE component and how it works within the solution.

The CUBE is “where data from QoE Archive database is aggregated for optimized and fast access” by the Portal component: this is the ‘data crusher’.

The CUBE is a SQL Server Analysis Services (SSAS) database, more generically known as an online analytical processing (OLAP) cube.

Installing CQD – QoE CUBE

Before performing the installation, the following prerequisites must be in place:

  • You need a server with SQL Server Analysis Services (SSAS) installed. The following picture (all-in-one example) shows the required SQL components for CQD installations.
  • It’s recommended to create a dedicated domain service account and grant it only the least required privileges. This account is used to trigger the cube processing.
  • The QoE Archive Database needs to be already deployed.
  • You need to run the installation on the SQL server where the QoE Archive Database was installed, because some files are installed locally and used by the SQL Agent.

The installation package is the same for all CQD components, so: (a) if you are installing all components at once, you can go straight to step 2; (b) if you already installed the QoE Archive on the same server, go to ‘Programs and Features’, choose ‘Change’ on the package and proceed to step 2:

  1. Proceed through the welcome screen and licence agreement, and choose the binaries install location.

  2. For this part I will select the QoE CUBE and proceed to the configuration screen.

    Configuration options:
    • QoE Archive SQL Server Instance: SQL Server instance name where the QoE Archive DB is located. To specify a default SQL Server instance, leave this field blank. To specify a named SQL Server instance, enter the instance name.
    • Cube Analysis Server: SSAS server and instance name where the cube is to be created. This can be a different machine, but the installing user must be a member of the Server administrators role of the target SSAS instance.
    • Use Multiple Partitions: ‘Multiple partitions’ requires Business Intelligence edition or Enterprise edition of SQL Server. ‘Single Partition’ only requires Standard edition, but cube processing performance may be impacted.
    • Cube User – User Name & Password: Domain service account that will trigger the cube processing.

  3. After the validations, the installer asks you to proceed and runs until completion, hopefully without any errors :)

Behind the CQD QoE CUBE

What happened and was configured after the previous installation steps?
This component setup installed some specific files, created an SSAS database and made some changes to the QoE Archive database:

  • QoECube database was created
  • ‘Cube User’ login created and assigned db_datareader and db_datawriter on the QoEArchive
  • A credential created with the ‘Cube User’. This is used to impersonate the connection to the QoECube on the SSAS server
  • A linked server source, mapping all the databases on the source SQL server
  • A second step in the SQL Agent job (created by the QoE Archive setup) and a proxy. This is the ‘brain’ that triggers the cube processing
  • The files used by the agent to trigger the cube processing

Known ‘caveats’ regarding the installation and architecture process:

  • The command script ‘process.bat’ that triggers the cube processing overwrites its log file ‘process.log’ at every execution. Since the Agent job runs every 15 minutes, you might lose the cause/history of past errors.
    As a quick workaround, you can change the command in the script to append (>>) the output to the existing log file instead:
    "%~1QoECubeService.exe" "%~1cubeModel.xml" >> "%~1process.log"
  • Don’t use a domain user account password starting with ‘+’. The setup SQL procedure will ignore it, the stored credential will be wrong, and the SQL job will fail with the following error (the cube processing will not start):
    “Unable to start execution of step 1 (reason: Error authenticating proxy LAB\service.cube, system error: The user name or password is incorrect.).  The step failed.”

How to manage and monitor the CQD QoE CUBE process?

The main CUBE processing is triggered by the same SQL Agent job created by the QoE Archive setup. A second step is added to the job: whenever new data is synchronized from the QoEMetrics to the QoEArchive, the job launches a command script (process.bat).
Errors are logged in the SQL Agent log, and details can be found in the file ‘process.log’ generated in the same folder as the command script.
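Once ‘process.log’ keeps the output of every run (see the append workaround above), it becomes worth scanning. The Python sketch below is a hypothetical helper, not part of CQD: it assumes the log is plain text and that failures contain the ‘Error while Processing’ prefix seen in the deserialization error quoted in this post; other failures may be worded differently.

```python
import sys
from pathlib import Path

# Prefix taken from the error quoted in this post; other failures may differ.
ERROR_MARKER = "Error while Processing"


def find_errors(log_text: str, marker: str = ERROR_MARKER) -> list[tuple[int, str]]:
    """Return (line number, line) pairs for every line containing the marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if marker in line:
            hits.append((number, line.strip()))
    return hits


def main() -> None:
    if len(sys.argv) != 2:
        sys.exit("usage: scan_cube_log.py <path-to-process.log>")
    # errors="replace" tolerates the ANSI/OEM output of a batch redirection.
    text = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
    errors = find_errors(text)
    for number, line in errors:
        print(f"{number}: {line}")
    print(f"{len(errors)} error line(s) found")


if __name__ == "__main__":
    main()
```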

Now you have a replica of your QoE data and a tool to process and analyse it. You still need an interface to visualize and model the data, described in part 3.

And finally…

There is a way to script the previous installation in one single command line (just replace the values with your own settings):

Msiexec /i "CallQualityDashboard.msi" ADDLOCAL=QoECube REBOOT=ReallySuppress CQD_INSTALLDIR="D:\Skype4B\CQD" CUBE_ARCHIVE_SERVER="LYNC-CQD.my.lab\CUBE" DISABLE_CUBE_MULTIPLE_PARTITION="true" CUBE_ANALYSIS_SERVER="LYNC-CQD.my.lab\CUBE" CUBE_USER="LAB\service.cube" CUBE_PASSWORD="WhoKnows?" /qb!

  • You still need to run this on the server holding the QoE Archive database (it needs to install the agent script files).
  • Be sure to use lowercase ‘true’ or ‘false’ for the DISABLE_CUBE_MULTIPLE_PARTITION parameter.
    The setup writes this value as-is into the cubeModel.xml file; with ‘True’, the Agent job will fail and you will see this error in ‘process.log’:
    Error while Processing: There was an error deserializing the object of type Microsoft.Rtc.Qoe.Cqd.QoECubeService.CubeProcessModel. The value ‘True’ cannot be parsed as the type ‘Boolean’.
    You can fix this by lowercasing the value of the <DisablePartitioning> element in cubeModel.xml.
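Both pitfalls in this post (lowercase booleans and passwords starting with ‘+’) can be caught before the setup runs. The Python sketch below is a hypothetical helper, not part of the CQD package: it builds the command line from the property names used in the command above, always writes the boolean in lowercase and refuses passwords starting with ‘+’. It also lowercases an existing <DisablePartitioning> value with a regular expression, assuming the element holds a plain True/False text value; the rest of cubeModel.xml is left untouched.

```python
import re


def build_cube_install_command(install_dir: str, archive_server: str,
                               analysis_server: str, cube_user: str,
                               cube_password: str,
                               multiple_partitions: bool) -> str:
    """Build the msiexec command line for a QoE CUBE-only installation."""
    if cube_password.startswith("+"):
        # Caveat from this post: the setup SQL procedure mishandles a leading '+'.
        raise ValueError("the cube password must not start with '+'")
    values = (install_dir, archive_server, analysis_server, cube_user, cube_password)
    if any('"' in value for value in values):
        raise ValueError("values must not contain double quotes")
    # cubeModel.xml only accepts lowercase booleans, so never use str(bool) here.
    disable = "false" if multiple_partitions else "true"
    parts = [
        'msiexec /i "CallQualityDashboard.msi"',
        "ADDLOCAL=QoECube",
        "REBOOT=ReallySuppress",
        f'CQD_INSTALLDIR="{install_dir}"',
        f'CUBE_ARCHIVE_SERVER="{archive_server}"',
        f'DISABLE_CUBE_MULTIPLE_PARTITION="{disable}"',
        f'CUBE_ANALYSIS_SERVER="{analysis_server}"',
        f'CUBE_USER="{cube_user}"',
        f'CUBE_PASSWORD="{cube_password}"',
        "/qb!",
    ]
    return " ".join(parts)


# Matches <DisablePartitioning>True</DisablePartitioning> in any letter case.
_DISABLE_RE = re.compile(
    r"(<DisablePartitioning>\s*)(true|false)(\s*</DisablePartitioning>)",
    re.IGNORECASE,
)


def lowercase_disable_partitioning(xml_text: str) -> str:
    """Lowercase the <DisablePartitioning> value and leave everything else as-is."""
    return _DISABLE_RE.sub(
        lambda m: m.group(1) + m.group(2).lower() + m.group(3), xml_text)


if __name__ == "__main__":
    # Example values taken from the command above.
    print(build_cube_install_command(
        r"D:\Skype4B\CQD", r"LYNC-CQD.my.lab\CUBE", r"LYNC-CQD.my.lab\CUBE",
        r"LAB\service.cube", "WhoKnows?", multiple_partitions=False))
```

Run the printed command from an elevated prompt on the QoE Archive server. For an existing installation, you could read cubeModel.xml, pass its text through lowercase_disable_partitioning and write it back, keeping a copy of the original file first.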

Call Quality Dashboard – Part 1: The QoE Archive Database

Overview of Call Quality Dashboard (CQD)

The QoE database is replicated to another SQL database (named ‘QoEArchive’), which is then processed by a SQL Server Analysis Services cube (CUBE) and explored through a user web portal. CQD is composed of 3 components: the QoE Archive DB, the Cube and the Portal.

You can read more in the TechNet article ‘Plan for Call Quality Dashboard for Skype for Business Server 2015’.
The article also explains that these components can be installed on one single server or distributed across up to 3 servers (I’d say it can go to 4).

Inspired by this, I decided to split the subject into three posts, covering how to install each component and how it works. Besides being easier to read, this helps you understand how to deploy on multiple servers or on a single server.

Installing CQD – QoE Archiving Database

As seen in the picture above, the QoE Archive is a database with some procedures that replicate the data from a Lync/Skype4B ‘QoEMetrics’ database.
What you need as prerequisites to install it:

  • A SQL Server database service (a dedicated one is recommended)
    You need the Enterprise or Business Intelligence edition if you want to use ‘multiple partitions’, which allow better CUBE processing performance for large amounts of data
  • The SQL Agent service must be running (automatic startup) on that SQL server. An agent job will run periodically to replicate data.
  • An account with db_datareader role/permissions on the QoEMetrics database
    This account will also be granted db_owner on the QoEArchive, and it will be impersonated (proxy) to connect to the QoEMetrics.
  • You must run the install package on the SQL server where you want to install the Archive database. The setup reads this info from the local system and doesn’t allow you to change it (using the GUI 😉)

After downloading the CQD package, the setup process is the following:

  1. Proceed through the welcome screen and choose the binaries install location.
  2. For this part I will just select the QoE Archive (deselect the others).

    Configuration options:
    • QoEMetrics SQL Server: SQL Server and instance name where the QoE Metrics database is located.
    • QoE Archive SQL Server Instance: the local SQL Server instance name where the Archive DB is to be created. Leave this field blank for a default SQL Server instance.
    • QoE Archive Database: create a new database or use an existing one (useful for recovery, migration or new-source scenarios; it will rebuild the ACLs, connectors and jobs).
    • Database File Directory: location where the new database files are to be created. A separate disk volume is recommended.
    • Use Multiple Partitions: ‘Multiple partitions’ requires Business Intelligence edition or Enterprise edition of SQL Server. ‘Single Partition’ only requires Standard edition, but cube processing performance may be impacted.
    • Partition File Directory: (if using ‘Multiple partitions’) path to where the partitions for the QoE Archive database should be placed.
    • SQL Agent Job User – User Name & Password: domain service account used to connect to the QoEMetrics database and replicate it to the QoEArchive.

  3. After the database, instance and account access validations, the installer asks you to proceed until completion, hopefully without any errors 🙂

Behind the CQD QoE Archive Database

What happened and was configured after the previous installation steps?
This component setup didn’t install any specific binaries. The installation was in fact a series of configurations on the SQL server used for the CQD Archive database:

  • QoEArchive database was created
  • ‘SQL Agent Job User’ login created and assigned db_owner of the QoEArchive
  • a credential created with the ‘SQL Agent Job User’. This will be used to impersonate the connection to the QoEMetrics on the source SQL server
  • A linked server source, mapping all the databases on the source SQL server
  • A SQL Agent job and proxy. This is the ‘heart’ that will synchronize the QoEMetrics and the QoEArchive

Known ‘caveats’ regarding the installation and architecture process:

  • Both the database and transaction log files are created in the same folder. You can only change this afterwards, using SQL tools and procedures.
  • Not 100% sure about this (I need to investigate this one), but I couldn’t find documented support for a mirrored QoEMetrics database.
    If the database fails over to the other node, the synchronization process fails.
  • Don’t use a domain user account password starting with ‘+’. The setup SQL procedure will ignore it, the stored credential will be wrong, and the SQL job will fail with the following error (the data will not get replicated):
    “Unable to start execution of step 1 (reason: Error authenticating proxy LAB\service.CQD, system error: The user name or password is incorrect.).  The step failed.”
    You can solve this by manually setting the correct password on the ‘QoEArchiveCredential’

How to manage and monitor the CQD QoE Archive process?

As mentioned before, the QoE Archive is a data synchronization process between the Lync/Skype4B QoEMetrics database and the QoEArchive.
This is done using a SQL Agent job that runs every 15 minutes by default.

This ‘simple’ job triggers a series of stored procedures that sync the database tables.
You can see the sync job status and errors in a specific table. If you open the tables in QoEMetrics and QoEArchive, you will confirm this (the latter has some additional tables used to control the sync process).

I used the words ‘DB synchronization/replication’ to simplify the idea. In fact, it does what the name says: it ‘collects data and adds it to the existing archive’. “CQD’s QoE Archive database provides a second copy of the QoE Metrics data with much longer retention capabilities”.

If you have multiple Skype4B pools, each with its own Monitoring Server, “CQD does not merge data from multiple QoEMetrics databases!”. “Each CQD instance must point to one QoEMetrics database!”.(*)
“However, because CQD will move much of the reporting workload off of the Monitoring Server, large organizations that deployed one Monitoring Server per Skype4B Pool topology should consider using one Monitoring Server for all topologies”.
But this can compromise using the Monitoring Reports tool to analyse (older) data in a different way, and it doesn’t address the other, bigger and heavier monitoring database: the LcsCDR. This is an open topic for a future blog post 🙂

You can monitor the replication process not only through the Agent job logs but also through a table that keeps the history. For example, the Agent job reports an error when there is no new data to replicate, and you can only see the reason in that table.

Now you have a replica of your QoE data to analyse with the tools described in part 2.

Wait! Easter egg!

For those who managed to read this far without falling asleep, here’s a ‘goodie’.
Here’s how you can automate the previous setup from the command line (just replace the values with your own settings):

Msiexec /i CallQualityDashboard.msi ADDLOCAL=QoEArchive REBOOT=ReallySuppress CQD_INSTALLDIR="D:\Skype4B\CQD" QOE_METRICS_SQL_SERVER="LYNC-BE.my.lab\INST1" ARCHIVE_SQL_SERVER="LYNC-CQD.my.lab\CUBE" INSTALL_NEW_ARCHIVE=True ARCHIVE_FILE_DIRECTORY="E:\Databases\CQD" DISABLE_ARCHIVE_MULTIPLE_PARTITION=True ARCHIVE_SQL_AGENT_USER="mydomain\cqdserviceaccount" ARCHIVE_SQL_AGENT_PASSWORD="itsAsecret" /qb!

The interesting part is that (for SQL Standard/single partition deployments) you can run this setup command from a server other than the CQD SQL database server, as long as the SQL client tools are installed on the server where you run it.

 

Call Quality Dashboard: built-in reports

The Call Quality Dashboard (CQD) is a new feature released at the same time as Skype for Business Server 2015 that also works with Lync Server 2013. In simple words, it gives you a visual overview of your QoE/monitoring data.

It doesn’t come close to the feature set of paid products like EventZero (which Microsoft bought in January 2016) or IR Prognosis, but it’s free and can certainly add insight beyond the standard monitoring reports.

It’s a powerful application that lets you create your own reports, and it contains 44 built-in reports. This post shares the hierarchical listing of the reports you will find right after you finish the installation (starting tomorrow, I will post about the CQD architecture, components and installation).

For now it’s just a dump of the headers and descriptions, but as soon as I get some nice graphics, I will update this post.

1. Audio Streams Monthly Trend (Managed vs Unmanaged Audio Streams)

This Report shows the monthly audio streams count, poor count, and poor ratio for the last 7 months. There are no filters applied so the data is what is contained in the QoE Database. Audio calls made over wireless and external networks can cause poor call rates to go up. To find the root cause of the poor calls, drill into the data by clicking on the title of the report!

1.1. Managed Audio Streams Monthly Trend

The Managed bucket contains audio streams made by servers and clients on wired corporate network connections. Any poor streams seen here need investigation. Click the report title to drill down!

1.1.1. Server-Server

The Server-to-server Audio Streams Report provides a good baseline for your Managed network environment. The percentage of poor calls using the ClassifiedPoorCall measure is expected to be below 0.5%.
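The poor ratio in these reports is the share of poor streams among all streams. As a minimal sketch, assuming you export monthly stream counts yourself (the MonthlyCounts layout below is hypothetical) and using the 0.5% Server-Server expectation above as the default threshold, you could flag the months that need investigation; pass a different threshold for other scenarios.

```python
from dataclasses import dataclass


@dataclass
class MonthlyCounts:
    month: str          # label only, e.g. "2016-01"
    total_streams: int  # all classified audio streams in the month
    poor_streams: int   # streams counted as poor (ClassifiedPoorCall)


def poor_ratio(counts: MonthlyCounts) -> float:
    """Poor streams as a fraction of total streams (0.0 when there is no traffic)."""
    if counts.total_streams == 0:
        return 0.0
    return counts.poor_streams / counts.total_streams


def months_above(rows: list[MonthlyCounts], threshold: float = 0.005) -> list[str]:
    """Return the months whose poor ratio exceeds the threshold (0.005 = 0.5%)."""
    return [row.month for row in rows if poor_ratio(row) > threshold]


if __name__ == "__main__":
    # Illustrative numbers only, not real CQD data.
    data = [
        MonthlyCounts("2016-01", 20000, 60),   # 0.30%
        MonthlyCounts("2016-02", 18000, 135),  # 0.75%
    ]
    for row in data:
        print(f"{row.month}: {poor_ratio(row):.2%}")
    print("needs investigation:", months_above(data))
```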

1.1.1.1. Server-Server Monthly Trend

This Report is a copy of the Parent Report and is included here as a reference. The Y-axis scale is fitted to the call volume for Server-Server calls so month-to-month changes are more visible here than in the Parent Report.

1.1.1.2. Server-Server Daily Trend

This Report shows the server-to-server audio streams by day. It has the same filter condition as the Monthly Trend Report.

1.1.1.3. Server-Server by Transport Type

Audio streams between servers should only use UDP. Any TCP streams are not expected and should be investigated. If there is a high percentage of poor TCP streams, it could explain the poor streams in the Server-Server scenario.

1.1.1.4. Server-Server by Server Type Pairs

This Report shows the Poor call distribution among the server user agent type combinations. Each combination represents a specific network path and server endpoint health. The Gateway server type can include SBC providers. Click the title to see a breakdown by GW endpoint names!

1.1.1.4.1. Mediation Server-Gateway Audio Streams

This Report is a copy of the Parent Report except it also includes a filter for just the Mediation Server-Gateway calls. It is included here as a reference.

1.1.1.5. Server-Server by Server Location City Pairs

If the servers are generally located in different cities, this Report can show potential network issues in the network path between different locations. The City column requires IT-supplied subnet IP-to-Network-to-City mapping data to be entered into the QoEArchive database.

1.1.2. Server-Wired-Inside

The Server-to-client-wired-inside Report is used to monitor the health of the network paths between the clients and servers.

1.1.2.1. Server-Wired-Inside Monthly Trend

This Report is a copy of the Parent Report and is included here as a reference. The Y-axis scale is fitted to the call volume for Server-Wired-Inside calls so month-to-month changes are more visible here than in the Parent Report.

1.1.2.2. Server-Wired-Inside by Client Transport Type

Audio streams on the corporate intranet should only use UDP. Any TCP streams are not expected and should be investigated. If there is a high percentage of poor TCP streams, it could explain the poor streams in the Server-Wired-Inside scenario. Click the title of the report to drill down!

1.1.2.2.1. Server-Wired-Inside by Client Transport

This Report is a copy of the Parent Report and is included here as a reference.

1.1.2.2.2. Server-Wired-Inside (TCP) by Client Endpoint

This Report shows all the client endpoints that have reported TCP streams. The rows are sorted by Count of Good streams descending.

1.1.2.3. Server-Wired-Inside by Server Type  

This Report shows the server-to-client-wired-inside calls by Server Type. It can show problems due to server config that are not captured by the Server-Server Reports. Investigate servers that have higher poor call rates than others as well as servers that show sudden increase in poor call rates.

1.1.2.4. Server-Wired-Inside by Client Connectivity ICE 

Audio streams on the corporate intranet should only use UDP. Any TCP streams are not expected and should be investigated. If there is a high percentage of poor TCP streams, it could explain the poor streams in the Server-Wired-Inside scenario. Click the title of the report to drill down!

1.1.2.4.1. Server-Wired-Inside by Client Transport 

This Report is a copy of the Parent Report and is included here as a reference.

1.1.2.4.2. Server-Wired-Inside (TCP) by Client Endpoint 

This Report shows all the client endpoints that have reported TCP streams. The rows are sorted by Count of Good streams descending.

1.1.2.5. Server-Wired-Inside by Client Building  

If Subnet IP-to-Network and Building mappings are populated in the QoEArchive database, this Report will light up with the server-to-client-wired-inside call data broken down by the client endpoint’s Building Name. This is a very powerful way to compare Poor Call Rates for all buildings.

1.1.2.6. Server-Wired-Inside by Client Type  

This Report shows the server-to-client-wired-inside calls by Client User Agent Type. It can show problems due to QoS configuration since that can be applied based on client executable name.

1.1.2.7. Server-Wired-Inside by Client Network Type  

The Network Type is another IT-supplied data set that allows the network subnets to be tagged with IT-specific context. For example: “LabNet”, “Wifi”, “Wired”, “DataCenter”, and “Vendor” are all possible classification values. This allows the IT-supplied values to be cross-checked against the client-OS-observed values for the Network Connection Detail.

1.1.3. Wired-Wired-Inside

The Wired-Inside-Client-to-Wired-Inside-Client Report is used to monitor the health of point-to-point calls that do not involve server endpoints. The network path that these calls take is usually different from that of server-client calls.

1.1.3.1. Wired-Wired-Inside Monthly Trend 

This Report is a copy of the Parent Report and is included here as a reference. The Y-axis scale is fitted to the call volume for Wired-Wired-Inside calls so month-to-month changes are more visible here than in the Parent Report.

1.1.3.2. Wired-Wired-Inside Daily Trend  

This Report shows the daily trend of the count and poor call rate measures for the current month.

1.1.3.3. Wired-Wired-Inside (OCPhone-OCPhone) Daily Trend  

This Report shows just the subset of client-wired-inside-to-client-wired-inside calls where both endpoints are IP Phones. This should represent the best possible scenario for wired and inside calls. Poor call rates < 0.1% are not unexpected.

1.2. Unmanaged Audio Streams

The Unmanaged bucket contains audio streams made by clients on wireless networks, public networks, or home networks. Some amount of poor streams are expected. However, a worsening trend of poor call rates warrants investigation. Click the report title to drill down!

1.2.1. Server-Wifi-Inside

The Server-to-client-wifi-inside Report is used to monitor the health of the corporate wifi network.

1.2.1.1. Server-Wifi-Inside Monthly Trend

This Report is a copy of the Parent Report. It is included here as a reference.

1.2.1.2. Server-Wifi-Inside – Best Subnets

This Report shows call quality over the enterprise wifi network for each client subnet IP address. If subnet IP address-to-network name mapping is entered in the QoEArchive database, then this report can be changed to group by client building name instead of subnet IP address.

1.2.1.3. Server-Wifi-Inside – Worst Subnets 

This Report is similar to the previous Report except it is sorted from worst Poor Call Percentage to best.

1.2.1.4. Server-Wifi-Inside by Client Wifi Chipset  

wifi chipset

1.2.2. Server-Wired-Outside  

The Server-to-client-wired-outside Report is used to monitor the health of the network path from the servers to the internet edge. Changes in Poor Call Rates month-to-month should be investigated.

1.2.3. Server-Wifi-Outside 

This Report is used as comparison to the Server-Wired-Outside Report.

1.2.4. Wired-Wired-Outside-DIRECT 

This Report shows poor call quality when 2 client endpoints are connected directly. It is used in conjunction with the Wired-Wired-Outside-RELAY report to identify any potential Media Relay Edge or datacenter edge issues.

1.2.5. Wired-Wired-Outside-RELAY 

This Report shows poor call quality when 2 client endpoints are connected through one or more Media Relay Edge servers. An increase in poor call percentage should be investigated.

1.2.5.1. Wired-Wired-Outside-RELAY 

This Report is a copy of the Parent Report. It is included here for reference.

1.2.5.2. Wired-Wired-Outside-Relay By Relay IP Address 

This Report shows the client-outside-wired-to-client-outside-wired calls that used one or more Media Relay Edge Servers. The data is broken down by one client’s Relay Server IP Address. There could be more than one Relay in the call but pivoting on just one can give a sampling of the relative call quality across the Relay servers. This Report also demonstrates the use of browser-side filtering of the results to remove any rows that do not contain more than one good stream.

1.2.6. Wired-Wired-Outside-Other 

This Report shows poor call quality when 2 client endpoints are connected neither directly nor through a relay. It is used in conjunction with the Wired-Wired-Outside-RELAY and Wired-Wired-Outside-DIRECT reports to identify any potential Media Relay Edge or datacenter edge issues.

1.2.7. Other Unmanaged Calls 

This Report captures the Unmanaged audio streams that do not belong to any of the other Unmanaged Scenarios. For example, Wifi-Wifi calls would be represented in this Report.

1.3. Other (Invalid Report)

The Other bucket contains audio streams that cannot be classified as Managed or Unmanaged. Classification of streams into Managed or Unmanaged requires the network connection type and access location and the data must be reliable. Endpoints that do not send QoE reports will be classified into the Other bucket. The StreamType.StreamType dimension has a value of ‘false’ if the stream cannot be classified.

1.3.1. Other (Invalid Report)  

This Report is a copy of the Parent Report.

1.3.2. Other (Invalid Report) by User Agent Types 

This Report contains Server-to-client calls grouped by the client User Agent Type.

2. User-reported Call Quality Rating Histogram

This Report shows the count of each possible user-collected rating. The possible values are 1 – 5, with 5 being the best and 1 being the worst. The rating values are only collected via Skype for Business clients.

2.1. User-reported Call Quality Rating Monthly Trend 

This Report shows a monthly trend of the count of each possible user-collected rating. The possible values are 1 – 5, with 5 being the best and 1 being the worst. The rating values are only collected via Skype for Business clients.
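The histogram itself is just a count per rating value. A minimal Python sketch, assuming you already have the individual 1-5 ratings as a list of integers (exporting them from CQD is not covered here), that keeps empty buckets visible and prints a text bar chart:

```python
from collections import Counter

VALID_RATINGS = range(1, 6)  # 1 = worst, 5 = best


def rating_histogram(ratings: list[int]) -> dict[int, int]:
    """Count each rating value 1-5, keeping zero counts and skipping invalid values."""
    counts = Counter(r for r in ratings if r in VALID_RATINGS)
    return {value: counts.get(value, 0) for value in VALID_RATINGS}


def print_histogram(histogram: dict[int, int], width: int = 40) -> None:
    """Render a text bar chart scaled to the largest bucket."""
    peak = max(histogram.values(), default=0) or 1
    for value, count in histogram.items():
        bar = "#" * round(count / peak * width)
        print(f"{value} | {bar} {count}")


if __name__ == "__main__":
    # Illustrative ratings only.
    sample = [5, 4, 5, 3, 1, 5, 4, 2, 5, 4]
    print_histogram(rating_histogram(sample))
```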

 

Taking control of the rtcReplicaRoot folder

When you use the setup (or migration) assistant, you know that you cannot control several installation locations, like the databases and especially the xds-replica folder.

Since Lync 2010, I have learned how to control the install location by performing a manual setup of part of the components (see this post at step 10).
But if you are performing a Skype4B in-place upgrade, the assistant will remove the previous version of the replica service and install the new one using the default logic.
If you have (like me) multiple volumes on your Windows server, you might end up with this folder where you don’t want it (for example, on a dedicated pagefile or SQL data volume).

If you couldn’t figure out the logic behind this install location, here’s the only Microsoft documentation note about it:
During the upgrade process the xds-replica is placed in the local shared folder on the disk drive with the most free space. If that disk is later removed then you can run into issues such as services not starting.
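Based only on the documented ‘most free space’ rule quoted above, you can check in advance which volume an upgrade would probably pick. A minimal Python sketch for Windows, assuming the setup compares plain free space across local drive letters (the real setup may apply criteria Microsoft does not document):

```python
import os
import shutil
import string
from typing import Optional


def free_space_by_drive() -> dict[str, int]:
    """Return free bytes for every drive letter that exists (Windows only)."""
    result = {}
    for letter in string.ascii_uppercase:
        root = f"{letter}:\\"
        if not os.path.exists(root):
            continue
        try:
            result[root] = shutil.disk_usage(root).free
        except OSError:
            # Drive not ready (e.g. empty card reader) or access denied.
            continue
    return result


def most_free(space: dict[str, int]) -> Optional[str]:
    """Return the drive with the most free bytes, or None if the dict is empty."""
    if not space:
        return None
    return max(space, key=space.get)


if __name__ == "__main__":
    drives = free_space_by_drive()
    # Mapped network drives are listed too; ignore them when reading the result.
    for root, free in sorted(drives.items()):
        print(f"{root} {free / 2**30:.1f} GiB free")
    print("Most free space:", most_free(drives))
```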

Let’s skip the discussion of why you need the emptiest volume for such a small directory structure and concentrate on the main issue:

How can I move the rtcReplicaRoot folder?

With some Google-fu you can find references (here and here) on how to manually tweak the folder, shares and ACLs and make some registry changes.
But this has some drawbacks: uninstalling the component will probably fail or generate errors. This complicates the upgrade/patching process and requires you, once again, to fix it manually.

Using the ocscore.msi setup package is also a big challenge:
• the REPLICA agent service is inside the ‘Lync/Skype4B core components’. If you use ‘Programs and Features’ to uninstall them (if it even allows it), it will break all the other components;
• if you manage to find the specific uninstall switch, it will, by default, drop the local XDS database (and lose the local topology reference and the local certificates in use);
• also, a new installation can overwrite the existing XDS database with an empty one.

By using the undocumented setup switches, you can effectively remove the rtcReplicaRoot and reinstall it in a specific folder. This procedure has 3 great advantages:
• it’s a standard, supported MSI installation: no disruption for patching or upgrades;
• it doesn’t require applying the latest patches, since it uses the local server MSI cache;
• it can be done without stopping the main Lync/Skype4B services 🙂

Skype for Business Server 2015

The process is greatly simplified because this version’s setup package includes two tiny switches (KEEPDB and SKIP_DB) that allow in-place upgrades, unlike previous versions of Lync:

  1. Stop the related services (via PowerShell)
    Stop-CsWindowsService REPLICA
    Stop-CsWindowsService RTCCLSAGT
  2. Uninstall the related component services
    MsiExec.exe /i {DE39F60A-D57F-48F5-A2BD-8BA3FE794E1F} KEEPDB=1 REMOVE=Feature_LocalMgmtStore REBOOT=ReallySuppress /qb!
    This will remove all the related service components, rtcreplicaroot folder, share and ACL’s
  3. Install the component services
    Msiexec /i {DE39F60A-D57F-48F5-A2BD-8BA3FE794E1F} ADDLOCAL=Feature_LocalMgmtStore SKIP_DB=1 REPLICA_ROOT_DIR="[fullpathto_rtcreplica_folder]" REBOOT=ReallySuppress /qb!
    This will install all the related service components, create the rtcreplicaroot folder in the desired location, create the share and set the ACLs and registry entries.
  4. Enable the local replica service (via powershell)
    Enable-CsReplica
  5. Start the related services (via PowerShell)
    Start-CsWindowsService REPLICA
    Start-CsWindowsService RTCCLSAGT
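To avoid typos when adapting these steps, you can generate them with your target folder filled in. The Python sketch below only prints the commands and runs nothing; the product code is the one used in the steps above, and the target folder in the example is hypothetical.

```python
# ocscore.msi product code for Skype for Business Server 2015, as used in the steps above.
OCSCORE_2015 = "{DE39F60A-D57F-48F5-A2BD-8BA3FE794E1F}"


def replica_move_steps_2015(replica_root: str) -> list[str]:
    """Return, in order, the commands that relocate the rtcReplicaRoot folder.

    Stop/Enable/Start lines are PowerShell cmdlets; the MsiExec lines are meant
    for an elevated cmd prompt (in PowerShell, quote the {GUID} so it is not
    parsed as a script block).
    """
    if '"' in replica_root:
        raise ValueError("the folder path must not contain double quotes")
    return [
        "Stop-CsWindowsService REPLICA",
        "Stop-CsWindowsService RTCCLSAGT",
        # KEEPDB=1 keeps the local XDS database while removing the feature.
        f"MsiExec.exe /i {OCSCORE_2015} KEEPDB=1 "
        "REMOVE=Feature_LocalMgmtStore REBOOT=ReallySuppress /qb!",
        # SKIP_DB=1 keeps the existing XDS; REPLICA_ROOT_DIR sets the new location.
        f"Msiexec /i {OCSCORE_2015} ADDLOCAL=Feature_LocalMgmtStore SKIP_DB=1 "
        f'REPLICA_ROOT_DIR="{replica_root}" REBOOT=ReallySuppress /qb!',
        "Enable-CsReplica",
        "Start-CsWindowsService REPLICA",
        "Start-CsWindowsService RTCCLSAGT",
    ]


if __name__ == "__main__":
    # Hypothetical target folder; replace with your own path.
    for step in replica_move_steps_2015(r"D:\Skype4B\Replica"):
        print(step)
```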

Lync Server 2013

The setup package was not designed for this particular task:
• The install will overwrite any existing XDS with a new/empty one
• The uninstall will drop/delete existing XDS

In fact, the Skype for Business in-place upgrade assistant was designed to handle exactly this situation, using the existing utility (InstallCsDatabase) that manages the local databases. Here is the manual equivalent:

  1. Stop the related services (via PowerShell)
    Stop-CsWindowsService REPLICA
    Stop-CsWindowsService RTCCLSAGT
  2. Detach the XDS database (to avoid the uninstall from deleting it)
    "%CommonProgramFiles%\Microsoft Lync Server 2013\DbSetup\InstallCsDatabase.exe" /Detach /Feature:CentralMgmtStore
  3. Copy the database files (xds.mdf and xds.ldf) to a safe location
  4. Uninstall the related component services (elevated command prompt rights)
    MsiExec.exe /i {8901ADFC-435C-4E37-9045-9E2E7A613285}  REMOVE=Feature_LocalMgmtStore REBOOT=ReallySuppress /qb!
    This will remove all the related service components, rtcreplicaroot folder, share and ACL’s
  5. Install the component services  (elevated command prompt rights)
    Msiexec /i {8901ADFC-435C-4E37-9045-9E2E7A613285} ADDLOCAL=Feature_LocalMgmtStore REPLICA_ROOT_DIR="[fullpathto_rtcreplica_folder]" REBOOT=ReallySuppress /qb!
    This will install all the related service components, create the rtcreplicaroot folder in the desired location, create the share and set the ACLs and registry entries.
  6. Drop the empty XDS database (created on step 5)
    "%CommonProgramFiles%\Microsoft Lync Server 2013\DbSetup\InstallCsDatabase.exe" /Drop /Feature:CentralMgmtStore
  7. Copy back the database files (xds.mdf and xds.ldf) from step 3
  8. Attach the XDS database
    "%CommonProgramFiles%\Microsoft Lync Server 2013\DbSetup\InstallCsDatabase.exe" /Attach /Feature:CentralMgmtStore
  9. Enable the local replica service (via powershell)
    Enable-CsReplica
  10. Start the related services (via powershell)
    Start-CsWindowsService REPLICA
    Start-CsWindowsService RTCCLSAGT
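Steps 3 and 7 are where a mistake costs you the local topology reference and certificates, so it is worth verifying the copies. A minimal Python sketch that copies xds.mdf and xds.ldf and compares SHA-256 hashes; the folder arguments are yours to supply (check where your XDS files actually live), and it assumes the database was detached in step 2 so the files are not locked.

```python
import hashlib
import shutil
import sys
from pathlib import Path

XDS_FILES = ("xds.mdf", "xds.ldf")


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large database files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(source_dir: Path, target_dir: Path) -> dict[str, str]:
    """Copy the XDS files and return {file name: sha256}; raise if a copy differs."""
    target_dir.mkdir(parents=True, exist_ok=True)
    hashes = {}
    for name in XDS_FILES:
        src, dst = source_dir / name, target_dir / name
        shutil.copy2(src, dst)
        src_hash = sha256_of(src)
        if sha256_of(dst) != src_hash:
            raise OSError(f"copy of {name} does not match the source")
        hashes[name] = src_hash
    return hashes


if __name__ == "__main__":
    if len(sys.argv) != 3:
        sys.exit("usage: copy_xds.py <source-folder> <target-folder>")
    for name, digest in copy_verified(Path(sys.argv[1]), Path(sys.argv[2])).items():
        print(f"{name} {digest}")
```

For step 3, run it with the XDS data folder as source and a safe folder as target; for step 7, swap the two arguments.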

Notes about these commands and procedures

  • The uninstall will prompt you with a warning regarding active core component services. You can safely confirm this action, as the main core components are kept.
  • You need to run Msiexec and InstallCsDatabase from an elevated command prompt
  • InstallCsDatabase is case sensitive on some of the parameters (/Feature:)
  • Feature_LocalMgmtStore is the feature name identifier inside the ocscore.msi package
  • KEEPDB=1 prevents the uninstall from dropping the XDS database
  • SKIP_DB=1 prevents the setup from overwriting the existing XDS database, so the existing one is used
  • REPLICA_ROOT_DIR tells the setup to create the ‘xds-replica’ folder inside the defined path (I usually use a subdirectory inside the Skype4B installation folder)
  • You can use the following PowerShell commands to check whether the local replica service is working properly (UpToDate = True)
    Invoke-CsManagementStoreReplication -ReplicaFqdn <your FE server FQDN>
    Get-CsManagementStoreReplicationStatus -ReplicaFqdn <your FE server FQDN>

Congratulations!

You now have control of your xds-replica rtcReplicaroot folder 🙂


S4B Front-end servers event 4097 flooding

After several installations and Skype for Business 2015 (S4B) Server upgrades, a colleague of mine drew my attention to a large number of event ID 4097 warnings related to Windows Fabric in the Administrative Events view. The rate of this flood can be 3-5 events every 5-15 minutes:
– “cert chain trust status is in error: 0x1000040”
– “ignore error 0x80092013:certificate revocation list offline”
You can find all these events in the dedicated Windows Fabric log, shown in the picture below.

Question: Why does this event occur, and where does it come from?

S4B installs Windows Fabric 3.0 and adds, among other things, additional security settings for the Fabric intra-cluster communications. You can find these settings in several configuration files:
%PROGRAMDATA%\Windows Fabric\FabricHostSettings.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\ClusterManifest.current.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\Fabric.Data\InfrastructureManifest.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\Fabric.Config.<version>\Settings.xml

The settings relevant to these events can be found in the ‘Security’ section:

  <Section Name="Security">
...
    <Parameter Name="SessionExpiration" Value="28800" />
    <Parameter Name="IgnoreCrlOfflineError" Value="true" />
    <Parameter Name="CrlCheckingFlag" Value="3221225476" />
   ...
  </Section>
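To check these values on a Front-End server without opening each file by hand, a small parser is enough. This is a minimal sketch that assumes the Security section looks like the excerpt above. The wrapping `<Settings>` root element and the function name `security_settings` are hypothetical. The real Fabric files may also declare a default XML namespace; in that case, the tag names passed to `iter()` need a `{namespace}` prefix.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt mirroring the Security section above. The <Settings>
# root is an assumption; real Fabric files have more sections and may use a
# default XML namespace (then tag names need a "{namespace}" prefix).
SAMPLE = """<Settings>
  <Section Name="Security">
    <Parameter Name="SessionExpiration" Value="28800" />
    <Parameter Name="IgnoreCrlOfflineError" Value="true" />
    <Parameter Name="CrlCheckingFlag" Value="3221225476" />
  </Section>
</Settings>"""


def security_settings(xml_text: str) -> dict:
    """Return the Name -> Value pairs found in every 'Security' section."""
    root = ET.fromstring(xml_text)
    settings = {}
    # iter() searches the whole tree, so the section is found at any depth
    for section in root.iter("Section"):
        if section.get("Name") != "Security":
            continue
        for param in section.iter("Parameter"):
            settings[param.get("Name")] = param.get("Value")
    return settings


if __name__ == "__main__":
    values = security_settings(SAMPLE)
    print(values["IgnoreCrlOfflineError"])      # true
    print(hex(int(values["CrlCheckingFlag"])))  # 0xc0000004
```

Printing the flag in hexadecimal makes it easier to compare with the bit values discussed next.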

IgnoreCrlOfflineError is self-explanatory: if Windows Fabric encounters this error, it ignores it and continues operating. For the CrlCheckingFlag value we need to look a little ‘deeper’. The Windows Fabric configuration files are generated by the Front-End server (RTCSRV / RtcHost module) at service startup, based on this template file:
<S4B installation folder>\Server\Core\ClusterManifest.Xml.Template

In this file you will find a little more about the flag value:
      <!--
        CrlCheckingFlag setting follows the rest of the Lync Server components (sipstack, web) which set the following flags:
               CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL=0x00000004 |  // do not go on the wire for cert retrieval
               CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY=0x80000000 |  // do not go on the wire for cert revocation check
               CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT=0x40000000
                                                              0xC0000004=3221225476 (unsigned int)
      -->
      <Parameter Name="CrlCheckingFlag" Value="%CRLCHECKINGFLAG%" />
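You can check the arithmetic in that comment by OR-ing the three flag values, and go the other way by decoding a CrlCheckingFlag value into flag names. This is a minimal sketch. The flag values are the ones quoted in the template, and the helper names `combine` and `decode` are illustrative only.

```python
# Flag values as quoted in the ClusterManifest.Xml.Template comment
CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL = 0x00000004
CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY = 0x80000000
CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT = 0x40000000

FLAG_NAMES = {
    CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL: "CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL",
    CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY: "CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY",
    CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT: "CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT",
}


def combine(*flags: int) -> int:
    """Bitwise-OR the given flags, as the template comment does."""
    value = 0
    for flag in flags:
        value |= flag
    return value


def decode(value: int) -> list:
    """Names of the known flags set in value, lowest bit first (other bits ignored)."""
    return [name for bit, name in sorted(FLAG_NAMES.items()) if value & bit]


if __name__ == "__main__":
    v = combine(*FLAG_NAMES)  # iterating a dict yields its keys (the bit values)
    print(hex(v), v)          # 0xc0000004 3221225476
    print(decode(v))
    print(decode(0))          # [] : a value of 0 sets none of these flags
```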

More information about these flags can be found in the MSDN documentation for the CertGetCertificateChain function.
Basically, the S4B services (including the Fabric) are configured to check certificate revocation (CRL) using the local cache only. As the template comments say: do not go on the wire to retrieve and check the CRL.

So... how does it check for certificate revocation?
Running a trace on the Fabric process, we can see that it never tries to connect to a CRL distribution point. Instead, shortly before the events are logged, it reads the registry keys related to certificates.

Since no cached or local CRL is available, it reports a CRL offline error and ignores it, but it still writes the error to the event log.

Do the other Skype for Business services report this error? Not visibly. They hit the same condition, but to see it you need to enable CAPI2 logging.

Question: What can we do about this?

I can see at least 3 options:

Option A (The ‘no worries’ one) – Ignore it!
Difficulty level: noob
Advantages: Nothing to be done. It's normal S4B operation and nothing is affected
Disadvantages: Your Administrative Events view will be full of these events, which makes it difficult to find important warnings

Option B (The ‘clean’ one) – provide the CRL to the local server
Difficulty level: professional services
Advantages: Not only do you stop seeing the Fabric events, but all the other S4B services are optimized as well
Disadvantages: Requires knowledge and scripting, since you want a scheduled task to download the CRL and put it in the local machine certificate store.

This option also shows how CRL caching and IgnoreCrlOfflineError work.
Instructions:
1. Find the name of the certificate used by Windows Fabric. You can get it from the Fabric configuration files (spoiler alert: it's the same one used by the Lync internal services)
2. Find the CRL Distribution Points on the certificate (the paths from which the CRL can be downloaded)

3. Download the CRL and install it in the Trusted Root Certification Authorities or Intermediate Certification Authorities store, depending on whether the certificate was issued by a Root or a Subordinate CA

As soon as you import the CRL into the local computer certificate store, you will see the Fabric process read the CRL from the store.

... and no more 4097 events!

As you may have noticed, CRLs are republished regularly (daily or weekly), so they expire and need to be retrieved again. Doing this manually is an error-prone, time-consuming task, which you can solve with a scheduled script (not included in this blog... yet!)
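The core of such a scheduled script is deciding when to fetch a fresh CRL. This is a minimal sketch of that decision only. It assumes the script has already read the CRL's Next Update time. The function name `crl_needs_refresh` and the 12-hour safety margin are arbitrary assumptions. The download and the import into the certificate store are not shown.

```python
from datetime import datetime, timedelta, timezone


def crl_needs_refresh(next_update: datetime, now: datetime,
                      margin: timedelta = timedelta(hours=12)) -> bool:
    """True if the cached CRL has expired or will expire within `margin`.

    `next_update` is assumed to be the CRL's Next Update field, already read
    by the (not shown) part of the script that downloads and parses the CRL.
    """
    return now >= next_update - margin


if __name__ == "__main__":
    now = datetime(2016, 6, 1, 8, 0, tzinfo=timezone.utc)
    weekly_crl = now + timedelta(days=5)   # hypothetical Next Update values
    daily_crl = now + timedelta(hours=6)
    print(crl_needs_refresh(weekly_crl, now))  # False
    print(crl_needs_refresh(daily_crl, now))   # True
```

Refreshing a little before expiry, rather than exactly at Next Update, leaves room for a failed download to be retried before the cached CRL goes stale.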

Option C (The ‘tweak’ one) – manipulate the CrlCheckingFlag
Difficulty level: moderate (you just need to know what you are doing)
Advantages: It's the easiest way to stop seeing the Fabric events
Disadvantages: It's an undocumented S4B parameter customization (although I don't believe Microsoft would refuse support if they found out). It can also be overwritten by an update.

This ‘quick fix’ option is about manipulating the file
<S4B installation folder>\Server\Core\ClusterManifest.Xml.Template

Instructions:
1. Change the parameter in ClusterManifest.Xml.Template to
<Parameter Name="CrlCheckingFlag" Value="0" />
2. Stop the RTCSRV and FabricHostSvc services
3. Start the RTCSRV service. It will create new Fabric configuration files based on the template file and also start the FabricHostSvc
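Because an update can overwrite the template, you may want to re-apply step 1 with a script rather than by hand. This is a hypothetical sketch of the text substitution only; the function name and the regular expression are assumptions. It returns the number of replacements so the caller can verify that exactly one parameter matched. Reading and writing the file, keeping a backup, and restarting the services (steps 2 and 3) are not shown.

```python
import re

# Matches the CrlCheckingFlag parameter whatever its current value is
# (e.g. the %CRLCHECKINGFLAG% placeholder shipped in the template).
_CRL_FLAG = re.compile(r'(<Parameter\s+Name="CrlCheckingFlag"\s+Value=")[^"]*(")')


def set_crl_checking_flag(template_text: str, value: str = "0") -> tuple:
    """Return (new_text, number_of_replacements)."""
    # A function replacement avoids any backslash/group-reference surprises
    return _CRL_FLAG.subn(lambda m: m.group(1) + value + m.group(2), template_text)


if __name__ == "__main__":
    line = '<Parameter Name="CrlCheckingFlag" Value="%CRLCHECKINGFLAG%" />'
    new_text, count = set_crl_checking_flag(line)
    print(new_text)  # <Parameter Name="CrlCheckingFlag" Value="0" />
    print(count)     # 1
```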

Actually, according to this blog, this parameter value is required if you issue certificates without any CRL Distribution Point information... or your S4B pool will not start at all (!)
Newly installed Skype for Business Front-End Pool refuses to start

 Final notes:

  • If you want to know more about Windows Fabric and S4B, there is a great, detailed explanation on this blog

iOS S4B clients don’t show some Distribution (expansion) Groups

This is a kind of ‘sequel’ to the previous post, ‘Windows S4B clients don’t show some Distribution Groups’, but for iOS-based clients.

ISSUE #1 (and #2)

Again, we have the same Distribution Groups/Lists (DLs) configured on our Desktop client (picture on the left), but when you sign in with the iOS S4B client (picture on the right), some DLs might not appear.

CAUSE #1

You could conclude that the cause is the same as in the previous WP issue. But as you can see, the SMTP-named list (which has no DisplayName in AD) is displayed, while the missing one is the DL that does have a DisplayName.
In this case you might not even consider it an issue. The reason is simple: if a DL doesn’t contain any members, the iOS S4B client does not show it.


There is no explicit entry in the client log file. The only evidence is the response to GET https://<webservicesexternalurl>/…/expand, which returns the members of the DL (resource rel="contact"):

HTTP/1.1 200 OK
Cache-Control: no-cache
Via: 1.1 dc1.mylab.local RtcExt
Content-Length: 1557
Content-Type: application/vnd.microsoft.com.ucwa+xml; charset=utf-8

<?xml version="1.0" encoding="utf-8"?>
<resource rel="distributionGroup" href="/ucwa/v1/applications/212440241480/people/groups/Test.DL2@mylab.local/expand" xmlns="http://schemas.microsoft.com/rtc/2012/03/ucwa">
  <property name="uri">Test.DL2@mylab.local</property>
  <property name="id">Test.DL2@mylab.local</property>
  <property name="name"></property>
  <resource rel="contact" href="/ucwa/v1/applications/212440241480/people/test.user1@mylab.local">
    <link rel="contactPhoto" href="/ucwa/v1/applications/212440241480/photos/test.user1@mylab.local" />
    <link rel="contactPresence" href="/ucwa/v1/applications/212440241480/people/test.user1@mylab.local/presence" />
    <property name="mobilePhoneNumber">+11111111</property>
    <property name="type">User</property>
    <property name="name">Test User1</property>
    <property name="etag">465437542</property>
  </resource>
</resource>
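To check programmatically whether a DL came back with members, you can collect the rel="contact" resources in the expand response. This is a minimal sketch that uses the namespace from the response above. The trimmed sample strings and the function name `dl_member_names` are illustrative. The empty-DL response is an assumed shape, since no such capture is shown in this post.

```python
import xml.etree.ElementTree as ET

UCWA = "{http://schemas.microsoft.com/rtc/2012/03/ucwa}"

# Trimmed version of the expand response above (one member)
WITH_MEMBER = """<resource rel="distributionGroup" href="/ucwa/v1/applications/212440241480/people/groups/Test.DL2@mylab.local/expand" xmlns="http://schemas.microsoft.com/rtc/2012/03/ucwa">
  <property name="uri">Test.DL2@mylab.local</property>
  <property name="name"></property>
  <resource rel="contact" href="/ucwa/v1/applications/212440241480/people/test.user1@mylab.local">
    <property name="type">User</property>
    <property name="name">Test User1</property>
  </resource>
</resource>"""

# Assumed shape of the response for a DL without members (not captured here)
EMPTY_DL = """<resource rel="distributionGroup" href="/ucwa/v1/applications/212440241480/people/groups/Test.DL1@mylab.local/expand" xmlns="http://schemas.microsoft.com/rtc/2012/03/ucwa">
  <property name="uri">Test.DL1@mylab.local</property>
</resource>"""


def dl_member_names(expand_xml: str) -> list:
    """'name' property of each direct rel="contact" child resource."""
    root = ET.fromstring(expand_xml)
    names = []
    for res in root.findall(UCWA + "resource"):
        if res.get("rel") != "contact":
            continue
        for prop in res.findall(UCWA + "property"):
            if prop.get("name") == "name":
                names.append(prop.text or "")
    return names


if __name__ == "__main__":
    print(dl_member_names(WITH_MEMBER))  # ['Test User1']
    print(dl_member_names(EMPTY_DL))     # []
```

An empty list corresponds to the case in which the iOS client hides the group.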

SOLUTION #1 (and #2)

If you want to see the DLs on the iOS client, you just need to:

  1. Populate the AD distribution group with one or more members;
  2. Wait for AD replication;
  3. Exit and reopen the Desktop client.
    This forces the client to refresh the DL membership (picture on the right)
  4. And the iOS client will start showing the group, right?...
    NOPE! Even if you restart the entire device, or remove and re-add the DL,
    the S4B client will still not show it!

This takes us to issue #2 (a new post?): client user caching.
For now, here are two quick workarounds to solve it:

  • Option A: Sign in with another user, then sign back in with your own user
  • Option B: Remove and reinstall the client on the iOS device

Finally, I have ‘TEST DL1’ on my iOS client.

FINAL NOTES

This behaviour was observed:

  • With the iOS/S4B versions current at the time:
    Skype for Business v6.5.0.177
    running on iOS 9.3.2 (13F69)
  • It happens with both Lync 2013 and Skype for Business 2015 server infrastructure
    (so it’s not an ‘upgrade caveat’)

Windows S4B clients don’t show some Distribution (expansion) Groups

ISSUE

After migrating your Lync 2013 infrastructure to Skype for Business 2015, some of your users might complain that some Distribution Groups (also known as Distribution Lists, or DLs) do not appear on their Windows Phone (WP) mobile clients.
DLs are Active Directory (AD) group objects designed for email applications.

You can only add them to your Lync personal contact list using the Desktop Client:

After you add them, the contact list is automatically updated on all devices the user is signed in to. But on a Skype for Business infrastructure, some groups on your personal list might not appear on the WP client (the same client connected to the Lync Server on the left and to the S4B server on the right).

CAUSE

The root cause is that when you create groups with the AD snap-in in the standard way, the ‘DisplayName’ attribute is left empty. The Desktop client handles this as follows:
– If there is a DisplayName in AD, the client saves it and displays it on the contact list
– If not, it uses (and saves) the SMTP address
This happens when a group is added, and the name is refreshed every time you sign in with the client. You can see this in the desktop client picture above with the groups TEST.DL1 and TEST.DL2.

How does this affect Skype for Business Server and the WP client?

(A) It looks like the external web service behaviour changed between Lync and Skype for Business, and this affects the way the WP client handles group names.
Here’s the traffic capture comparison:
>> The Lync server sends all the group information data

>> But the Skype for Business server sends one particular field (‘name’) empty

(B) With the ‘name’ property empty, the WP client does not include the group from the contact list in its display list, as captured in the WP client logs.
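The observed behaviour can be summarized as a toy model with three rules. First, the Desktop client falls back to the SMTP address. Second, S4B returns an empty ‘name’ when the AD group has no DisplayName. Third, the WP client drops groups whose name is empty. This is not the clients' actual code; the class, the function names and the sample groups are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DistributionGroup:
    smtp: str
    display_name: Optional[str] = None  # empty when created with AD snap-in defaults


def desktop_label(group: DistributionGroup) -> str:
    # Desktop client: DisplayName if present, otherwise the SMTP address
    return group.display_name or group.smtp


def s4b_name_property(group: DistributionGroup) -> str:
    # As observed with S4B: the 'name' property is empty without a DisplayName
    return group.display_name or ""


def wp_visible_groups(groups: List[DistributionGroup]) -> List[str]:
    # WP client: groups whose 'name' property is empty are not displayed
    names = (s4b_name_property(g) for g in groups)
    return [name for name in names if name]


if __name__ == "__main__":
    groups = [
        DistributionGroup("Test.DL1@mylab.local"),              # no DisplayName
        DistributionGroup("Test.DL2@mylab.local", "Test DL2"),  # hypothetical DisplayName
    ]
    print([desktop_label(g) for g in groups])  # ['Test.DL1@mylab.local', 'Test DL2']
    print(wp_visible_groups(groups))           # ['Test DL2']
```

In this model, setting the DisplayName in AD amounts to filling in `display_name`, after which the group passes the WP filter.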

SOLUTION

As described before, the AD snap-in doesn’t show you the DisplayName field to fill in right away (as it does for a user account). You need to enable ‘Advanced Features’ in the console’s View menu to see the ‘Attribute Editor’ tab, and then set the DisplayName value there.

After that, sign in with the Desktop client. It updates the group name, and a few seconds later the group appears on the Windows Phone client.

FINAL NOTES

  • This is more likely to happen when you don’t have an Exchange/email server, or at least not one in the same AD as Skype for Business (distribution lists created in the Exchange console get a DisplayName that you define).
    For S4B on-premises scenarios that use Exchange Online, the Office 365 AD connector most probably takes care of this.
  • This was found in a Lync/S4B upgrade scenario, but the issue applies whenever you have an S4B infrastructure and create DLs manually without setting the group ‘DisplayName’.
  • This behaviour was observed with the latest WP client version at the time:
    version 6.3.1558.0
  • The iOS (iPhone, iPad and iPod) client doesn’t behave like this: if the ‘name’ property is missing, it displays the group using the SMTP address.
    But the iOS clients have their own quirks:
    – they also have a ‘not displaying DLs’ issue, discussed in my next post 😉
    – if you add or refresh some other DLs, you might see that the SMTP-like display name is empty!