IT based Communications

a different Unified Communications site

Call Quality Dashboard – Part 3: The Portal

After describing the Call Quality Dashboard (CQD) QoE Archiving Database and the QoE CUBE, I will now show how to install the Portal component and how it fits into the solution.

The CQD Portal is “where users can easily query and visualize QoE data” synchronized by the Archive and processed by the CUBE.
The CQD Portal is an IIS-based web application that allows you not just to visualize reports but also to create new reports and views and assign permissions to them. As the above picture shows, it relies on a SQL database to keep all the information.

Installing CQD – Portal

Before performing the installation, the following pre-requisites need to be in place:

  • You need a SQL Server database service (dedicated or existing) where the setup can install the Portal support database.
  • On the server that will host the Portal you need to install IIS. The following PowerShell command installs all the required components:
    Add-WindowsFeature Web-Server, Web-Static-Content, Web-Default-Doc, Web-Asp-Net, Web-Asp-Net45, Web-Net-Ext, Web-Net-Ext45, Web-ISAPI-Ext, Web-ISAPI-Filter, Web-Http-Logging, Web-Url-Auth, Web-Windows-Auth, Web-Mgmt-Console -Verbose
  • A dedicated domain service account is recommended so that you can grant it the least required privileges. If you installed all the components on the same server you can use a local built-in service account but, if you have the SQL Database/Analysis Services (CUBE) deployed on different servers, the domain account is required.
  • The QoE Archiving Database and the CUBE need to be already deployed.

The installation package is the same for all CQD components so: (a) if you are installing all components, you can go to step 2; (b) if you already installed the QoE Archiving and/or the CUBE on the same server, go to ‘Programs and Features’, ‘Change’ the package and proceed to step 2:

  1. Proceed through the welcome screen and the licence agreement, and choose the binaries install location.
  2. For this part, I will select the Portal and proceed to the configuration screen.

    Configuration options:
    • QoE Archive SQL Server: SQL Server instance name where the QoE Archive database is located.
    • Cube Analysis Server: SQL Server Analysis Services instance name where the cube is located.
    • Repository SQL Server: SQL Server instance name where the Repository database is to be created.
    • IIS App Pool User – User Name & Password: The account that the IIS application pool should run under and use to access the other components. You can choose one of the local server service accounts; otherwise choose ‘Other’ and provide domain service account credentials (see the pre-requisites above).

  3. After the validations the installation will ask to proceed until completion, hopefully without any error :)

Behind the CQD Portal

What happened and what was configured during the previous installation steps?
The setup of this component installed some specific files, created a support database and made some updates on the QoE CUBE database:
• The QoERepositoryDb database was created. This database holds all the portal configurations, customized reports, …
• An ‘IIS App Pool User’ login was created and assigned db_owner on the QoERepositoryDb
• The ‘IIS App Pool User’ login was assigned db_datareader on the QoEArchive database
• The ‘IIS App Pool User’ was added to the QoERole on the CUBE database
• The IIS default web site was configured with 3 folders that match the directories and files installed.

Known ‘caveats’ regarding the installation and architecture process:

  • In rare cases, the installer fails to create the correct settings in IIS. A manual change is required to allow users to log into the CQD. If users are having trouble logging in, please follow the steps described in the ‘known issues’ section of the TechNet article.
  • Cube sync fails: QoEMetrics may contain some invalid records caused by wrong end-user clocks. If the time skew is greater than 60 years, the cube import will fail. Check the Min and Max StartTime/EndTime using the queries below. Look for and delete records in the far past and very distant future; they can be disregarded, and they will break the sync process.
    SELECT MIN(StartTime) FROM CqdPartitionedStreamView
    SELECT MAX(StartTime) FROM CqdPartitionedStreamView
    SELECT MIN(EndTime) FROM CqdPartitionedStreamView
    SELECT MAX(EndTime) FROM CqdPartitionedStreamView
  • After deploying the CQD on a new server, you can run into a problem where the Portal does not show any data and returns an error saying:
    We couldn’t perform the query while running it on the Cube. Use the Query Editor to modify the query and fix any issues. Also make sure that the Cube is accessible
    To solve it, process the CUBE object and make sure it’s accessible as described here.
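If you export the timestamps returned by those queries, the "far past / distant future" check can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the helper name find_skewed is hypothetical, the skew is measured against a reference date that you choose, and a year is approximated as 365.25 days.

```python
from datetime import datetime, timedelta

# Assumption: 60 years is the limit mentioned in the caveat above,
# with a year approximated as 365.25 days.
MAX_SKEW = timedelta(days=60 * 365.25)


def find_skewed(timestamps, reference, max_skew=MAX_SKEW):
    """Return the timestamps that are further than max_skew from reference."""
    return [ts for ts in timestamps if abs(ts - reference) > max_skew]


if __name__ == "__main__":
    reference = datetime(2016, 3, 1)
    sample = [
        datetime(2016, 2, 28, 9, 30),  # plausible record
        datetime(1900, 1, 1),          # far past
        datetime(2099, 12, 31),        # distant future
    ]
    for ts in find_skewed(sample, reference):
        print("suspicious record time:", ts.isoformat())
```

Anything this flags is a candidate for the clean-up described above; the delete itself still has to be done on the SQL side.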

How to manage and monitor the CQD Portal process

The main portal page is accessible via http://<portalserverFQDN>/CQD.

You probably will not see any data because “when the installer is done, most likely the SQL Server Agent job will be in progress, doing the initial load of the QoE data and the cube processing. Depending on the amount of data in QoE, the portal will not have data available for viewing yet.” To check on the status of the data load and cube processing, go to http://<portalserverFQDN>/CQD/#/Health.
Or (like in my lab) you don’t have any monitoring data to display :). Once the processing completes, the Health page shows the status of the last successful and failed updates.

Other configurations that you can perform on the Portal are described in the Deploy CQD TechNet article:

  • Post-install tasks are required to have reporting data regarding locations (buildings, network names, subnets, BSSID).
  • By default, any authenticated user has access. This can be changed by using IIS Authorization rules to restrict access to specific users or groups.
  • Detailed log messages will be shown if debug mode is enabled. To enable debug mode, go to [CQD installed Dir]\QoEDataService\web.config, and update the following line so the value is set to True:
    <add key="QoEDataLib.DebugMode" value="True" />
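If you prefer to script the debug switch instead of editing the file by hand, here is a minimal Python sketch. It assumes the key lives under configuration/appSettings (the usual place for `<add key=... />` entries in a web.config, but I did not verify it for this file), and the function name enable_debug_mode is my own. Note that ElementTree drops comments when it rewrites XML, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET

DEBUG_KEY = "QoEDataLib.DebugMode"


def enable_debug_mode(config_xml):
    """Return the config text with QoEDataLib.DebugMode set to True."""
    root = ET.fromstring(config_xml)
    # Assumption: the setting is stored under <configuration><appSettings>.
    settings = root.find("appSettings")
    if settings is None:
        settings = ET.SubElement(root, "appSettings")
    for entry in settings.findall("add"):
        if entry.get("key") == DEBUG_KEY:
            entry.set("value", "True")
            break
    else:
        # Key not present yet: add it.
        ET.SubElement(settings, "add", {"key": DEBUG_KEY, "value": "True"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = (
        "<configuration><appSettings>"
        '<add key="QoEDataLib.DebugMode" value="False" />'
        "</appSettings></configuration>"
    )
    print(enable_debug_mode(sample))
```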

And that’s it! You now have CQD fully deployed!
You can now see how Lync/Skype4B is performing, and even build your own reports. Creating them is tricky, but you can learn some basics here.

<Am I missing something? maybe some more posts about it. provide me some feedback suggestions/requests ;) >

Call Quality Dashboard – Part 2: The CUBE

After describing the Call Quality Dashboard (CQD) QoE Archiving Database in part 1, I will now show how to install the CUBE component and how it fits into the solution.

The CUBE is “where data from QoE Archive database is aggregated for optimized and fast access” by the Portal component: this is the ‘data crusher’.

The CUBE is a SQL Server Analysis Services (SSAS) database, generically known as an online analytical processing (OLAP) cube.

Installing CQD – QoE CUBE

Before performing the installation the following pre-requisites need to be in place:

  • You need a server with SQL Server Analysis Services (SSAS) installed. The following picture (all-in-one example) shows the required SQL components for CQD installations.
  • It’s recommended to create a dedicated domain service account and grant it the least required privileges. This account is used to trigger the cube processing.
  • The QoE Archiving Database needs to be already deployed.
  • You need to run the installation on the SQL server where the QoE Archive Database was installed. This is because some files will be installed and used by the SQL Agent.

The installation package is the same for all CQD components so: (a) if you are installing all components, you can go to step 2; (b) if you already installed the QoE Archiving on the same server, go to ‘Programs and Features’, ‘Change’ the package and proceed to step 2:

  1. Proceed through the welcome screen and the licence agreement, and choose the binaries install location

  2. For this part I will select the QoE CUBE and proceed to the configuration screen

    Configuration options:
    • QoE Archive SQL Server Instance: SQL Server instance name where the QoE Archive DB is located. To specify a default SQL Server instance, leave this field blank. To specify a named SQL Server instance, enter the instance name.
    • Cube Analysis Server: SSAS server and instance name where the cube is to be created. This can be a different machine, but the installing user has to be a member of the Server administrators of the target SSAS instance.
    • Use Multiple Partitions: ‘Multiple partitions’ requires Business Intelligence edition or Enterprise edition of SQL Server. ‘Single Partition’ only requires Standard edition, but cube processing performance may be impacted.
    • Cube User – User Name & Password: Domain service account that will trigger the cube processing.

  3. After the validations the installation will ask to proceed until completion, hopefully without any error:)

Behind the CQD QoE CUBE

What happened and what was configured during the previous installation steps?
The setup of this component installed some specific files, created an SSAS database and made some updates on the QoE Archiving Database:
• The QoECube database was created
• A ‘Cube User’ login was created and assigned db_datareader and db_datawriter on the QoEArchive
• A credential was created with the ‘Cube User’. This will be used to impersonate the connection to the QoECube on the SSAS server.
• A linked server source, mapping all the databases on the source SQL server
• A 2nd step on the SQL Agent Job (created by the QoE Archive) and a proxy. This is the ‘brain’ that will trigger the cube processing.
• The files used by the agent to trigger the cube processing

Known ‘caveats’ regarding the installation and architecture process:

  • The ‘process.bat’ script that triggers the cube processing overwrites the error log ‘process.log’ at every execution. Since the Agent job runs every 15 minutes, you might not catch the cause/history of past errors.
    As a quick workaround, you can change the script command to append (>>) the output to the existing log file:
    "%~1QoECubeService.exe" "%~1cubeModel.xml" >> "%~1process.log"
  • Don’t use a domain user account password starting with ‘+’. The setup SQL procedure will ignore it, the cube trigger will not start, and you will get the following error on the SQL job:
    “Unable to start execution of step 1 (reason: Error authenticating proxy LAB\service.cube, system error: The user name or password is incorrect.).  The step failed.”
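If you script your deployments, a tiny pre-flight check can catch the ‘+’ password caveat before the setup runs. This is just a sketch of the rule described above; the function name is hypothetical and it checks nothing else about the password.

```python
def password_is_safe_for_setup(password):
    """Return False when the password starts with '+', the case the setup mishandles."""
    return not password.startswith("+")


if __name__ == "__main__":
    for candidate in ("+Secret1", "Sec+ret1"):
        verdict = "ok" if password_is_safe_for_setup(candidate) else "change it"
        print(candidate, "->", verdict)
```

Only the leading character matters here; a ‘+’ elsewhere in the password is not part of the caveat.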

How to manage and monitor the CQD QoE CUBE process?

The main CUBE processing is triggered using the same SQL Agent job created by the QoE Archiving. A second step is added to the job and, whenever there is new data synchronized from the QoEMetrics to the QoEArchive, the job will launch a command script.
Execution errors will be logged in the SQL Agent log and details can be found in the file ‘process.log’ generated in the same folder as the command script.

Now you have a replica of your QoE data and a tool to process and analyse it. You now need an interface to visualize and manipulate it, described in part 3 (to be published tomorrow).

And finally…

There is a way to script the previous installation in one single command line (you just need to replace the values with your settings):

msiexec /i "CallQualityDashboard.msi" ADDLOCAL=QoECube REBOOT=ReallySuppress CQD_INSTALLDIR="D:\Skype4B\CQD" CUBE_ARCHIVE_SERVER="LYNC-CQD.my.lab\CUBE" DISABLE_CUBE_MULTIPLE_PARTITION="true" CUBE_ANALYSIS_SERVER="LYNC-CQD.my.lab\CUBE" CUBE_USER="LAB\service.cube" CUBE_PASSWORD="WhoKnows?" /qb!

  • You still need to run it on the server holding the QoE Archiving database (it needs to install the agent script files)
  • Be sure to use lowercase ‘true’ or ‘false’ for the boolean parameter (DISABLE_CUBE_MULTIPLE_PARTITION).
    The setup writes this value ‘as is’ to the cubeModel.xml file; with any other casing the Agent job will fail and you will see an error in ‘process.log’:
    Error while Processing: There was an error deserializing the object of type Microsoft.Rtc.Qoe.Cqd.QoECubeService.CubeProcessModel. The value ‘True’ cannot be parsed as the type ‘Boolean’.
    You can fix this by lowercasing the value of the <DisablePartitioning> element in cubeModel.xml.
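Both halves of that caveat can be sketched in Python: rendering the boolean in lowercase before it goes on the msiexec line, and repairing a cubeModel.xml that already contains the wrong casing. The helper names are my own, and the regular expression assumes the value sits in a plain `<DisablePartitioning>…</DisablePartitioning>` element; I have not checked the rest of the file layout, so treat it as an illustration.

```python
import re


def msi_bool(value):
    """Render a Python bool as the lowercase text the setup expects."""
    return "true" if value else "false"


def cube_partition_property(disable):
    """Build the DISABLE_CUBE_MULTIPLE_PARTITION property for the msiexec line."""
    return 'DISABLE_CUBE_MULTIPLE_PARTITION="{}"'.format(msi_bool(disable))


# Assumption: the value appears as <DisablePartitioning>True</DisablePartitioning>.
_ELEMENT = re.compile(
    r"(<DisablePartitioning>)\s*(true|false)\s*(</DisablePartitioning>)",
    re.IGNORECASE,
)


def fix_cube_model(xml_text):
    """Lowercase the boolean inside <DisablePartitioning>; leave the rest untouched."""
    return _ELEMENT.sub(
        lambda m: m.group(1) + m.group(2).lower() + m.group(3), xml_text
    )


if __name__ == "__main__":
    print(cube_partition_property(True))
    print(fix_cube_model("<DisablePartitioning>True</DisablePartitioning>"))
```

Working on the text rather than parsing the XML keeps everything else in the file byte-for-byte identical, which felt safer for a file I don't own.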

Call Quality Dashboard – Part 1: The QoE Archive Database

Overview of Call Quality Dashboard (CQD)

The QoE database is replicated to another SQL database (named the ‘QoEArchive’) and is then manipulated through a user web portal using SQL Server Analysis Services (CUBE). The CQD is composed of 3 components: the QoE Archive DB, the Cube and the Portal.

You can read more information on the TechNet article: ‘Plan for Call Quality Dashboard for Skype for Business Server 2015’.
The article also says that these components can be installed on a single server, or distributed across up to 3 (I say it can go up to 4) servers.

Inspired by this, I decided to split the subject into three posts covering how to install each element and how it works. Besides being easier to read, this lets you understand how to deploy on a single server or on multiple servers.

Installing CQD – QoE Archiving Database

As seen in the above picture, the QoE Archive is a database with some procedures that replicate the data from a Lync/Skype4B ‘QoEMetrics’ database.
What do you need as pre-requisites to install it?

  • A SQL database service (a dedicated one is recommended)
    You need the Enterprise or Business Intelligence edition if you want to use ‘multiple partitions’, which allow better CUBE processing performance for large amounts of data
  • The SQL Agent service must be running (automatic startup) on that SQL server. An agent job will run periodically to replicate data.
  • An account with db_datareader role/permissions on the QoEMetrics database
    This account will also be granted db_owner on the QoEArchive and it will be impersonated (proxy) to connect to the QoEMetrics.
  • You must run the install package on the SQL server where you want to install the Archive database. The setup reads this info from the local system and doesn’t allow you to change it (using the GUI 😉)

After downloading the CQD package, the setup process is the following:

  1. Proceed through the welcome screen, and choose the binaries install location
  2. For this part I will just select the QoE Archive (deselect the others)

    Configuration options:
    • QoEMetrics SQL Server: SQL Server and instance name where the QoE Metrics database is located.
    • QoE Archive SQL Server Instance: the local SQL Server instance name where the Archive DB is to be created. Leave this field blank for a default SQL instance.
    • QoE Archive Database: create a new one or use an existing one (useful for recovery/migration/connect-new-source scenarios; it will rebuild the ACLs, connectors and jobs)
    • Database File Directory: location where the new database files are to be created. A separate disk volume is recommended.
    • Use Multiple Partitions: ‘Multiple partitions’ requires Business Intelligence edition or Enterprise edition of SQL Server. ‘Single Partition’ only requires Standard edition, but cube processing performance may be impacted.
    • Partition File Directory: (if using ‘Multiple partitions’) path where the partitions for the QoE Archive database should be placed.
    • SQL Agent Job User – User Name & Password: Domain service account used to connect to the QoEMetrics database and replicate it to the QoEArchive

  3. After the databases, instances and account access validation the installation will ask to proceed until completion, hopefully without any error :)

Behind the CQD QoE Archive Database

What happened and what was configured during the previous installation steps?
The setup of this component didn’t install any specific binaries. The installation was in fact a series of configurations on the SQL server used for the CQD Archive database:

  • The QoEArchive database was created
  • A ‘SQL Agent Job User’ login was created and assigned db_owner of the QoEArchive
  • A credential was created with the ‘SQL Agent Job User’. This will be used to impersonate the connection to the QoEMetrics on the source SQL server
  • A linked server source, mapping all the databases on the source SQL server
  • A SQL Agent Job and proxy. This is the ‘heart’ that will synchronize the QoEMetrics and the QoEArchive

Known ‘caveats’ regarding the installation and architecture process:

  • Both the database and transaction log files are installed in the same folder. You can only change this afterwards, using SQL tools and procedures.
  • Not 100% sure about this (I need to investigate it), but I couldn’t find documented support for a mirrored QoEMetrics database.
    If the database fails over to the other node, the synchronization process fails.
  • Don’t use a domain user account password starting with ‘+’. The setup SQL procedure will ignore it, the data will not get replicated, and you will get the following error on the SQL job:
    “Unable to start execution of step 1 (reason: Error authenticating proxy LAB\service.CQD, system error: The user name or password is incorrect.).  The step failed.”
    You can solve this by manually setting the correct password on the ‘QoEArchiveCredential’

How to manage and monitor the CQD QoE Archive process?

As I said before, the QoE Archive is a data synchronization process between the Lync/Skype4B QoEMetrics database and the QoEArchive.
This is done using a SQL Agent job that runs, by default, every 15 minutes.

This ‘simple’ job triggers a series of stored procedures that sync the database tables.
You can see the sync job status and errors in a particular table. If you open the tables of the QoEMetrics and QoEArchive databases, you can confirm that the second one has some more tables that are used to control the sync process.

I used the words ‘DB synchronization/replication’ to simplify the idea. In fact, it does what the name means: it ‘collects data and adds it to the existing archive’. “CQD’s QoE Archive database provides a second copy of the QoE Metrics data with much longer retention capabilities”.

If you have multiple Skype4B pools, each with its own Monitoring Server, “CQD does not merge data from multiple QoEMetrics databases!”. “Each CQD instance must point to one QoEMetrics database!”.(*)
“However, because CQD will move much of the reporting workload off of the Monitoring Server, large organizations that deployed one Monitoring Server per Skype4B Pool topology should consider using one Monitoring Server for all topologies”.
But this can compromise using the Monitoring Reports tool to analyse (older) data in a different way, and it doesn’t handle the other bigger and heavier monitoring database: the LcsCDR. This is an open topic for a future blog :)

You can monitor the replication process not just through the Agent job logs; there is also a table that contains the history. For example, the Agent job will report an error if there is no new data to replicate, and you can only see that detail in this table.

Now you have a replica of your QoE data to analyse with the tools described in part 2 (to be published in two days).

Wait! Easter egg!

For those who managed to read this far without falling asleep, here’s a ‘goodie’.
Here’s how you can automate the previous setup from the command line (you just need to replace the values with your settings):

msiexec /i CallQualityDashboard.msi ADDLOCAL=QoEArchive REBOOT=ReallySuppress CQD_INSTALLDIR="D:\Skype4B\CQD" QOE_METRICS_SQL_SERVER="LYNC-BE.my.lab\INST1" ARCHIVE_SQL_SERVER="LYNC-CQD.my.lab\CUBE" INSTALL_NEW_ARCHIVE=True ARCHIVE_FILE_DIRECTORY="E:\Databases\CQD" DISABLE_ARCHIVE_MULTIPLE_PARTITION=True ARCHIVE_SQL_AGENT_USER="mydomain\cqdserviceaccount" ARCHIVE_SQL_AGENT_PASSWORD="itsAsecret" /qb!

The interesting part is that (for SQL standard/single partition deployments) you can run this setup command from another server that is not the CQD SQL database server (as long as you have the SQL client tools installed on the one you run it from).

 

Call Quality Dashboard: built-in reports

The Call Quality Dashboard (CQD) is a new feature released at the same time as Skype for Business 2015 that also works with Lync 2013. In simple words, it gives you a visual overview of your QoE/monitoring data.

It doesn’t have nearly the feature set of paid products like EventZero (which Microsoft bought in January 2016) or IR Prognosis but it’s free and certainly adds insight beyond the standard monitoring reports.

It’s a powerful application that allows you to create your own reports and it contains 44 built-in reports. This post shares the hierarchical listing of those reports, which you will find right after you finish the installation (tomorrow I will start posting about the CQD architecture, components and installation).

For now it’s just a dump of the headers and description but, as soon as I start getting some nice graphics, I will update this post.

1. Audio Streams Monthly Trend (Managed vs Unmanaged Audio Streams)

This Report shows the monthly audio streams count, poor count, and poor ratio for the last 7 months. There are no filters applied so the data is what is contained in the QoE Database. Audio calls made over wireless and external networks can cause poor call rates to go up. To find the root cause of the poor calls, drill into the data by clicking on the title of the report!

1.1. Managed Audio Streams Monthly Trend

The Managed bucket contains audio streams made by servers and clients on wired corporate network connections. Any poor streams seen here need investigation. Click the report title to drill down!

1.1.1. Server-Server

The Server-to-server Audio Streams Report provides a good baseline for your Managed network environment. The percentage of poor calls using the ClassifiedPoorCall measure is expected to be below 0.5%.

1.1.1.1. Server-Server Monthly Trend

This Report is a copy of the Parent Report and is included here as a reference. The Y-axis scale is fitted to the call volume for Wired-Wired-Inside calls so month-to-month changes are more visible here than in the Parent Report.

1.1.1.2. Server-Server Daily Trend

This Report shows the server-to-server audio streams by day. It has the same filter condition as the Monthly Trend Report.

1.1.1.3. Server-Server by Transport Type

Audio streams between servers should only use UDP. Any TCP streams are not expected and should be investigated. If there is a high percentage of poor TCP streams, it could explain the poor streams in the Server-Server scenario.

1.1.1.4. Server-Server by Server Type Pairs

This Report shows the Poor call distribution among the server user agent type combinations. Each combination represents a specific network path and server endpoint health. The Gateway server type can include SBC providers. Click the title to see a breakdown by GW endpoint names!

1.1.1.4.1. Mediation Server-Gateway Audio Streams

This Report is a copy of the Parent Report except it also includes a filter for just the Mediation Server-Gateway calls. It is included here as a reference.

1.1.1.5. Server-Server by Server Location City Pairs

If the servers are generally located in different cities, this Report can show potential network issues in the network path between different locations. The City column requires IT-supplied subnet IP-to-Network-to-City mapping data to be entered into the QoEArchive database.

1.1.2. Server-Wired-Inside

The Server-to-client-wired-inside Report is used to monitor the health of the network paths between the clients and servers.

1.1.2.1. Server-Wired-Inside Monthly Trend

This Report is a copy of the Parent Report and is included here as a reference. The Y-axis scale is fitted to the call volume for Wired-Wired-Inside calls so month-to-month changes are more visible here than in the Parent Report.

1.1.2.2. Server-Wired-Inside by Client Transport Type

Audio streams on the corporate intranet should only use UDP. Any TCP streams are not expected and should be investigated. If there is a high percentage of poor TCP streams, it could explain the poor streams in the Server-Wired-Inside scenario. Click the title of the report to drill down!

1.1.2.2.1. Server-Wired-Inside by Client Transport

This Report is a copy of the Parent Report and is included here as a reference.

1.1.2.2.2. Server-Wired-Inside (TCP) by Client Endpoint

This Report shows all the client endpoints that have reported TCP streams. The rows are sorted by Count of Good streams descending.

1.1.2.3. Server-Wired-Inside by Server Type  

This Report shows the server-to-client-wired-inside calls by Server Type. It can show problems due to server config that are not captured by the Server-Server Reports. Investigate servers that have higher poor call rates than others as well as servers that show sudden increase in poor call rates.

1.1.2.4. Server-Wired-Inside by Client Connectivity ICE 

Audio streams on the corporate intranet should only use UDP. Any TCP streams are not expected and should be investigated. If there is a high percentage of poor TCP streams, it could explain the poor streams in the Server-Wired-Inside scenario. Click the title of the report to drill down!

1.1.2.4.1. Server-Wired-Inside by Client Transport 

This Report is a copy of the Parent Report and is included here as a reference.

1.1.2.4.2. Server-Wired-Inside (TCP) by Client Endpoint 

This Report shows all the client endpoints that have reported TCP streams. The rows are sorted by Count of Good streams descending.

1.1.2.5. Server-Wired-Inside by Client Building  

If Subnet IP-to-Network and Building mappings are populated in the QoEArchive database, this Report will light up with the server-to-client-wired-inside call data broken down by the client endpoint’s Building Name. This is a very powerful way to compare Poor Call Rates for all buildings.

1.1.2.6. Server-Wired-Inside by Client Type  

This Report shows the server-to-client-wired-inside calls by Client User Agent Type. It can show problems due to QoS configuration since that can be applied based on client executable name.

1.1.2.7. Server-Wired-Inside by Client Network Type  

The Network Type is another IT-supplied data set that allows the network subnets to be tagged with IT-specific context. For example: “LabNet”, “Wifi”, “Wired”, “DataCenter”, and “Vendor” are all possible classification values. This allows the IT-supplied values to be cross-checked against the client OS observed values for the Network Connection Detail.

1.1.3. Wired-Wired-Inside

The Wired-Inside-Client-to-Wired-Inside-Client Report is used to monitor the health of point-to-point calls that do not involve server endpoints. The network paths that these calls take are usually different from those of server-client calls.

1.1.3.1. Wired-Wired-Inside Monthly Trend 

This Report is a copy of the Parent Report and is included here as a reference. The Y-axis scale is fitted to the call volume for Wired-Wired-Inside calls so month-to-month changes are more visible here than in the Parent Report.

1.1.3.2. Wired-Wired-Inside Daily Trend  

This Report shows the daily trend of the count and poor call rate measures for the current month.

1.1.3.3. Wired-Wired-Inside (OCPhone-OCPhone) Daily Trend  

This Report shows just the subset of client-wired-inside-to-client-wired-inside calls where both endpoints are IP Phones. This should represent the best possible scenario for wired and inside calls. Poor call rates < 0.1% are not unexpected.

1.2. Unmanaged Audio Streams

The Unmanaged bucket contains audio streams made by clients on wireless networks, public networks, or home networks. Some amount of poor streams are expected. However, a worsening trend of poor call rates warrants investigation. Click the report title to drill down!

1.2.1. Server-Wifi-Inside

The Server-to-client-wifi-inside Report is used to monitor the health of the corporate wifi network.

1.2.1.1. Server-Wifi-Inside Monthly Trend

This Report is a copy of the Parent Report. It is included here as a reference.

1.2.1.2. Server-Wifi-Inside – Best Subnets

This Report shows call quality over enterprise wifi network for each client subnet IP address. If subnet ip address-to-network name mapping is entered in the QoEArchive database, then this report can be changed to group by client building name instead of subnet IP address.

1.2.1.3. Server-Wifi-Inside – Worst Subnets 

This Report is similar to the previous Report except it is sorted from worst Poor Call Percentage to best.

1.2.1.4. Server-Wifi-Inside by Client Wifi Chipset  

wifi chipset

1.2.2. Server-Wired-Outside  

The Server-to-client-wired-outside Report is used to monitor the health of the network path from the servers to the internet edge. Changes in Poor Call Rates month-to-month should be investigated.

1.2.3. Server-Wifi-Outside 

This Report is used as comparison to the Server-Wired-Outside Report.

1.2.4. Wired-Wired-Outside-DIRECT 

This Report shows poor call quality when 2 client endpoints are connected directly. It is used in conjunction with the Wired-Wired-Outside-RELAY report to identify any potential Media Relay Edge or datacenter edge issues.

1.2.5. Wired-Wired-Outside-RELAY 

This Report shows poor call quality when 2 client endpoints are connected through one or more Media Relay Edge servers. An increase in poor call percentage should be investigated.

1.2.5.1. Wired-Wired-Outside-RELAY 

This Report is a copy of the Parent Report. It is included here for reference.

1.2.5.2. Wired-Wired-Outside-Relay By Relay IP Address 

This Report shows the client-outside-wired-to-client-outside-wired calls that used one or more Media Relay Edge Servers. The data is broken down by one client’s Relay Server IP Address. There could be more than one Relay in the call but pivoting on just one can give a sampling of the relative call quality across the Relay servers. This Report also demonstrates the use of browser-side filtering of the results to remove any rows that do not contain more than one good stream.

1.2.6. Wired-Wired-Outside-Other 

This Report shows poor call quality when 2 client endpoints are connected not directly or by a relay. It is used in conjunction with the Wired-Wired-Outside-RELAY, Wired-Wired-Outside-DIRECT reports to identify any potential Media Relay Edge or datacenter edge issues.

1.2.7. Other Unmanaged Calls 

This Report captures the Unmanaged audio streams that do not belong to any of the other Unmanaged Scenarios. For example, Wifi-Wifi calls would be represented in the Report.

1.3. Other (Invalid Report)

The Other bucket contains audio streams that cannot be classified as Managed or Unmanaged. Classification of streams into Managed or Unmanaged requires the network connection type and access location and the data must be reliable. Endpoints that do not send QoE reports will be classified into the Other bucket. The StreamType.StreamType dimension has a value of ‘false’ if the stream cannot be classified.

1.3.1. Other (Invalid Report)  

This Report is a copy of the Parent Report.

1.3.2. Other (Invalid Report) by User Agent Types 

This Report contains Server-to-client calls grouped by the client User Agent Type.

2. User-reported Call Quality Rating Histogram

This Report shows the count of each of the possible User-collected ratings. The possible values are 1 – 5 with 5 being the best and 1 being the worst. The rating values are only collected via Skype for Business Clients.

2.1. User-reported Call Quality Rating Monthly Trend 

This Report shows a monthly trend of the count of each of the possible User-collected ratings. The possible values are 1 – 5 with 5 being the best and 1 being the worst. The rating values are only collected via Skype for Business Clients.

 

Taking control of the rtcReplicaRoot folder

When you use the setup (or migration) assistant, you cannot control several installation locations, like the databases and especially the xds-replica folder.

Since Lync 2010, I have learned how to control the install location by performing a manual setup of part of the components (see this post at step 10).
But if you are performing a Skype4B in-place upgrade, the assistant will remove the previous version of the replica service and install the new one using the default logic.
If you have (like me) multiple volumes on your Windows server, you might end up with this folder where you don’t want it (like on a dedicated pagefile or SQL data volume).

If you couldn’t work out the logic behind this install location, here’s the only note about it in the MS documentation:
During the upgrade process the xds-replica is placed in the local shared folder on the disk drive with the most free space. If that disk is later removed then you can run into issues such as services not starting.

Let’s skip the discussion of why you need the emptiest volume for a small size directory structure and concentrate on the main issue:

How can I move the rtcReplicaRoot folder?

You can use your google-fu and find some references (here and here) on how to manually tweak the folder, shares and ACLs and make some registry changes.
But this has some drawbacks: the uninstallation of the component will probably fail or generate errors. This will complicate an upgrade/patching process and require you, again, to fix it manually.

Using the ocscore.msi setup package is also a big challenge:
• the REPLICA agent service is inside the ‘Lync/Skype4B core components’. If you use ‘programs/features’ to uninstall them (if it allows you to), it will break all the other components;
• if you manage to find out the specific uninstall switch, it will, by default, drop the local XDS database (and lose the local topology reference and the local certificates in use);
• also, a new installation can overwrite the existing XDS database with an empty one.

By using the undocumented setup switches, you can effectively remove the rtcReplicaRoot and control its setup on a specific folder. This procedure has 3 great advantages:
•  it’s a standard, MSI-supported installation – no disruption for patching or upgrades;
•  it doesn’t require applying the latest patches, since it uses the local server MSI cache;
•  it can be done without stopping the main Lync/Skype4B services:)

Skype for Business Server 2015

The process was greatly simplified by including two tiny switches to allow future upgrades (unlike previous versions of Lync):

  1. stop the related services (via powershell)
    Stop-CsWindowsService REPLICA
    Stop-CsWindowsService RTCCLSAGT
  2. Uninstall the related component services
    MsiExec.exe /i {DE39F60A-D57F-48F5-A2BD-8BA3FE794E1F} KEEPDB=1 REMOVE=Feature_LocalMgmtStore REBOOT=ReallySuppress /qb!
    This will remove all the related service components, rtcreplicaroot folder, share and ACL’s
  3. Install the component services
    Msiexec /i {DE39F60A-D57F-48F5-A2BD-8BA3FE794E1F} ADDLOCAL=Feature_LocalMgmtStore SKIP_DB=1 REPLICA_ROOT_DIR="[fullpathto_rtcreplica_folder]" REBOOT=ReallySuppress /qb!
    This will install all the related service components, the rtcreplicaroot folder on the desired location, create the share and set ACL’s and registry entries.
  4. Enable the local replica service (via powershell)
    Enable-CsReplica
  5. start the related services (via powershell)
    Start-CsWindowsService REPLICA
    Start-CsWindowsService RTCCLSAGT
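
If you need to repeat this on several servers, the two msiexec calls from steps 2 and 3 can be assembled by a small helper. This is only a sketch in Python: the product code and the properties are the ones shown above, while the function names and the example path are hypothetical, and it only prints the command lines (it does not run them).

```python
# Sketch only: builds the command lines for steps 2 and 3 (S4B Server 2015).
# Product code and MSI properties are copied from the procedure above;
# function names and the example path are hypothetical.
S4B_2015_CORE = "{DE39F60A-D57F-48F5-A2BD-8BA3FE794E1F}"
FEATURE = "Feature_LocalMgmtStore"


def remove_replica_cmd(product_code=S4B_2015_CORE):
    """Step 2: remove the replica feature but keep the local XDS database."""
    return (f"MsiExec.exe /i {product_code} KEEPDB=1 REMOVE={FEATURE} "
            "REBOOT=ReallySuppress /qb!")


def install_replica_cmd(replica_root, product_code=S4B_2015_CORE):
    """Step 3: re-add the feature under replica_root, reusing the existing XDS."""
    if '"' in replica_root:
        raise ValueError("path must not contain double quotes")
    return (f"MsiExec.exe /i {product_code} ADDLOCAL={FEATURE} SKIP_DB=1 "
            f'REPLICA_ROOT_DIR="{replica_root}" REBOOT=ReallySuppress /qb!')


if __name__ == "__main__":
    print(remove_replica_cmd())
    print(install_replica_cmd(r"D:\Skype4B\Replica"))
```

Printing the commands first makes it easy to review them before pasting them into an elevated command prompt.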

Lync Server 2013

The setup package was not designed for this particular task:
• The install will overwrite any existing XDS with a new/empty one
• The uninstall will drop/delete existing XDS

In fact, the Skype for Business in-place upgrade assistant was designed to handle exactly this situation, by using the existing utility (InstallCsDatabase) that manages the local databases:

  1. stop the related services (via powershell)
    Stop-CsWindowsService REPLICA
    Stop-CsWindowsService RTCCLSAGT
  2. Detach the XDS database (to avoid the uninstall from deleting it)
    "%CommonProgramFiles%\Microsoft Lync Server 2013\DbSetup\InstallCsDatabase.exe" /Detach /Feature:CentralMgmtStore
  3. Copy the database files (xds.mdf and xds.ldf) to a safe location
  4. Uninstall the related component services (elevated command prompt rights)
    MsiExec.exe /i {8901ADFC-435C-4E37-9045-9E2E7A613285} REMOVE=Feature_LocalMgmtStore REBOOT=ReallySuppress /qb!
    This will remove all the related service components, rtcreplicaroot folder, share and ACL’s
  5. Install the component services (elevated command prompt rights)
    Msiexec /i {8901ADFC-435C-4E37-9045-9E2E7A613285} ADDLOCAL=Feature_LocalMgmtStore REPLICA_ROOT_DIR="[fullpathto_rtcreplica_folder]" REBOOT=ReallySuppress /qb!
    This will install all the related service components, the rtcreplicaroot folder on the desired location, create the share and set ACL’s and registry entries.
  6. Drop the empty XDS database (created on step 5)
    "%CommonProgramFiles%\Microsoft Lync Server 2013\DbSetup\InstallCsDatabase.exe" /Drop /Feature:CentralMgmtStore
  7. Copy back the database files (xds.mdf and xds.ldf) from step 3
  8. Attach the XDS database
    "%CommonProgramFiles%\Microsoft Lync Server 2013\DbSetup\InstallCsDatabase.exe" /Attach /Feature:CentralMgmtStore
  9. Enable the local replica service (via powershell)
    Enable-CsReplica
  10. Start the related services (via powershell)
    Start-CsWindowsService REPLICA
    Start-CsWindowsService RTCCLSAGT

Notes about these commands and procedures

  • The uninstall will prompt you with a warning regarding active core component services. You can safely confirm this action, as the main core components are kept.
  • You need to run Msiexec and InstallCsDatabase from an elevated command prompt
  • InstallCsDatabase is case sensitive on some of the parameters (/Feature:)
  • Feature_LocalMgmtStore – is the feature name identifier inside the ocscore.msi package
  • KEEPDB=1 will prevent the uninstall from dropping the XDS database
  • SKIP_DB=1 will prevent the setup from overwriting any existing XDS database, so the existing one is used
  • REPLICA_ROOT_DIR tells the setup to create the ‘xds-replica’ folder inside the defined path (I usually use a subdirectory inside the Skype4B installation folder)
  • You can use the PS commands to check if the local replica service is working properly (UpToDate=True)
    Invoke-CsManagementStoreReplication -ReplicaFqdn <your FE server FQDN>
    Get-CsManagementStoreReplicationStatus -ReplicaFqdn <your FE server FQDN>

Congratulations !

You now have control of your xds-replica rtcReplicaroot folder:)


S4B Front-end servers event 4097 flooding

After several installations and Skype for Business 2015 (S4B) Server upgrades, a colleague of mine drew my attention to a large amount of event ID 4097 warnings on the Administrative Events view related to Windows Fabric. The rate of this flood could be 3-5 events every 5-15 minutes:
– “cert chain trust status is in error: 0x1000040”
– “ignore error 0x80092013:certificate revocation list offline”
You can check all those events on the specific Windows Fabric log shown in the picture below.

Question: Why, and where, do these events come from?

S4B installs Windows Fabric 3.0 and added, among others, additional security settings in the fabric intra-cluster communications. You can find these settings in several configuration files:
%PROGRAMDATA%\Windows Fabric\FabricHostSettings.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\ClusterManifest.current.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\Fabric.Data\InfrastructureManifest.xml
%PROGRAMDATA%\Windows Fabric\<FEserverFQDN>\Fabric\Fabric.Config.<version>\Settings.xml

The relevant settings related to the events can be found on the ‘Security’ section:

  <Section Name="Security">
….
    <Parameter Name="SessionExpiration" Value="28800" />
    <Parameter Name="IgnoreCrlOfflineError" Value="true" />
    <Parameter Name="CrlCheckingFlag" Value="3221225476" />
   ….
  </Section>

The IgnoreCrlOfflineError is self-explanatory: If Windows Fabric encounters this error, ignore it and continue operations. For the CrlCheckingFlag values we need to look a little ‘deeper’. The Windows Fabric configuration files are generated by the Front-End server (RTCSRV / RtcHost module) at service startup based on this template file:
<S4B installation folder>\Server\Core\ClusterManifest.Xml.Template

On this file you will find a little more about the flag values:
      <!--
        CrlCheckingFlag setting follows the rest of the Lync Server components (sipstack, web) which  set the following flags:
               CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL=0x00000004 |  // do not go on the wire for cert retrieval
               CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY=0x80000000 |  // do not go on the wire for cert revocation check
               CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT=0x40000000
                                                              0xC0000004=3221225476 (unsigned int)
      -->
      <Parameter Name="CrlCheckingFlag" Value="%CRLCHECKINGFLAG%" />

More information regarding the flags can be found on MSDN, in the CertGetCertificateChain function documentation.
Basically, S4B services (including the Fabric) are configured to check certificate revocation (CRL) using the local cache only. As the comments on the template say: do not go on the network to retrieve and check the CRL.
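
To check the arithmetic in that template comment, here is a minimal Python sketch that ORs the three flags together and decodes a value back into flag names. The constant names and values are the ones quoted in the comment; the helper names are mine.

```python
# Flag values as quoted in the ClusterManifest.Xml.Template comment.
FLAGS = {
    "CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL": 0x00000004,
    "CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY": 0x80000000,
    "CERT_CHAIN_REVOCATION_CHECK_CHAIN_EXCLUDE_ROOT": 0x40000000,
}


def combine(names):
    """OR together the named flags into one unsigned integer."""
    value = 0
    for name in names:
        value |= FLAGS[name]
    return value


def decode(value):
    """Return the names of the known flags that are set in value."""
    return [name for name, bit in FLAGS.items() if value & bit]


crl_checking_flag = combine(FLAGS)  # all three flags

if __name__ == "__main__":
    print(crl_checking_flag, hex(crl_checking_flag))
    print(decode(crl_checking_flag))
```

The combined value is the same decimal number that appears in the ‘Security’ section of the generated configuration files.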

So… how does it check for a certificate revocation?
Running a trace on the Fabric process, we can see that it never tries to connect to a CRL distribution server but, a little before the events are logged, it reads the registry keys related to certificates.

Since there is no cached or local CRL available, it will report a CRL offline error, ignore it, but still write it to the event log.

Do the other Skype for Business services report this error? No, it’s hidden. They run into the same issue but, to see it, you need to enable CAPI2 logging:

Question: What can we do about this ?

I can see at least 3 options:

Option A (The ‘no worries’ one) – Ignore it!
Difficulty level: noob
Advantages: Nothing to be done. It’s normal S4B operation and nothing is affected
Disadvantages: Your administrative events view will be full of these events, which makes it difficult to find important warnings

Option B (The ‘clean’ one) – provide the CRL to the local server
Difficulty level: professional services
Advantages: You will not just stop seeing the Fabric events, it will also optimize all the other S4B services
Disadvantages: Requires knowledge and script programming, since you will want to schedule a task to get the CRL and put it in the local machine certificate store.

This option also proves how the CRL caching / IgnoreCrlOfflineError works.
Instructions:
1. Find the name of the certificate used by Windows Fabric. You can get this from the Fabric configuration files (spoiler alert: it’s the same one used by the Lync internal services)
2. Find the CRL Distribution Points on the certificate, which tell you where to get the CRL

3. Download the CRL and install it on the Trusted Root CA or Intermediate CA store, depending on whether the certificate was issued by a Root or a Subordinate CA

As soon as you upload the CRL to the local computer certificate store, you will see that the Fabric process reads the CRL from the store.

… and no more 4097 events !

As you noticed, CRLs are updated regularly (daily or weekly), so they expire and need to be retrieved again. This means that doing this manually is an error-prone, time-consuming task, which you can solve with a scheduled script (not included on this blog… yet!)
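
The decision part of such a scheduled script could look like the sketch below. It assumes you have already read the CRL’s ‘Next Update’ timestamp by some other means; the function name and the 12-hour safety margin are hypothetical choices, not something taken from S4B or Windows Fabric.

```python
from datetime import datetime, timedelta, timezone


def crl_needs_refresh(next_update, now=None, margin=timedelta(hours=12)):
    """Return True when the cached CRL is expired or about to expire.

    next_update: timezone-aware 'Next Update' time of the cached CRL.
    margin: hypothetical safety window before the expiry.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= next_update - margin


if __name__ == "__main__":
    expiry = datetime(2016, 6, 10, 8, 0, tzinfo=timezone.utc)
    check_time = datetime(2016, 6, 9, 23, 0, tzinfo=timezone.utc)
    print(crl_needs_refresh(expiry, now=check_time))
```

A scheduled task would run this check and only download and re-import the CRL when it returns True.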

Option C (The ‘tweak’ one) – manipulate the CrlCheckingFlag
Difficulty level: moderate: you just need to know what you are doing
Advantages: It’s the easiest way to stop seeing the Fabric events
Disadvantages: It’s an undocumented S4B parameter customization (although I don’t believe MS would refuse support if they found out). It can also be overwritten by an update.

This ‘quick fix’ option is about manipulating the file
<S4B installation folder>\Server\Core\ClusterManifest.Xml.Template

Instructions:
1. Change the parameter in ClusterManifest.Xml.Template
<Parameter Name="CrlCheckingFlag" Value="0" />
2. Stop the RTCSRV and FabricHostSvc services
3. Start the RTCSRV service – it will create new Fabric configuration files based on the template file and also start the FabricHostSvc

Actually, according to this blog, this parameter value is required if you issue certificates without any CRL Distribution Point information… or your S4B pool will not start at all (!)
Newly installed Skype for Business Front-End Pool refuses to start

 Final notes:

  • If you want to know more about Windows Fabric and S4B we have a great and detailed explanation on this blog

iOS S4B clients don’t show some Distribution (expansion) Groups

This is somewhat of a ‘sequel’ to the previous post, ‘Windows S4B clients don’t show some Distribution Groups’, but for iOS-based clients.

ISSUE #1 (and #2)

We have again the same Distribution Groups/Lists (DLs) configured on our Desktop client (picture on the left), but when you sign in with the iOS S4B client (picture on the right) some DLs might not appear.

CAUSE #1

We could just conclude that it is the same cause as the previous WP issue, but as you can see, the SMTP list (the one without a DisplayName on AD) is being displayed, while the DL with the DisplayName is the one that is missing.
In this case you might not even consider it an issue. The reason is simple: if the DL doesn’t contain any members, the S4B client will not show it.

There is really no explicit entry on the client log file. The only evidence is that the GET https://<webservicesexternalurl>/…/expand will return the members of the DL (resource rel="contact"):

HTTP/1.1 200 OK
Cache-Control: no-cache
Via: 1.1 dc1.mylab.local RtcExt
Content-Length: 1557
Content-Type: application/vnd.microsoft.com.ucwa+xml; charset=utf-8

<?xml version="1.0" encoding="utf-8"?>
<resource rel="distributionGroup" href="/ucwa/v1/applications/212440241480/people/groups/Test.DL2@mylab.local/expand" xmlns="http://schemas.microsoft.com/rtc/2012/03/ucwa">
  <property name="uri">Test.DL2@mylab.local</property>
  <property name="id">Test.DL2@mylab.local</property>
  <property name="name"></property>
  <resource rel="contact" href="/ucwa/v1/applications/212440241480/people/test.user1@mylab.local">
      <link rel="contactPhoto" href="/ucwa/v1/applications/212440241480/photos/test.user1@mylab.local" />
      <link rel="contactPresence" href="/ucwa/v1/applications/212440241480/people/test.user1@mylab.local/presence" />
      <property name="mobilePhoneNumber">+11111111</property>
      <property name="type">User</property>
      <property name="name">Test User1</property>
      <property name="etag">465437542</property>
   </resource>
</resource>

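A quick way to verify this from a captured trace is to count the contact resources in the /expand response. The sketch below uses only the Python standard library; the function name is hypothetical and the sample is a trimmed copy of the response above.

```python
import xml.etree.ElementTree as ET

UCWA_NS = "http://schemas.microsoft.com/rtc/2012/03/ucwa"


def group_members(expand_xml):
    """Return the member names found in a UCWA distributionGroup/expand body."""
    root = ET.fromstring(expand_xml)
    members = []
    for res in root.findall(f"{{{UCWA_NS}}}resource"):
        if res.get("rel") != "contact":
            continue
        name = ""
        for prop in res.findall(f"{{{UCWA_NS}}}property"):
            if prop.get("name") == "name":
                name = prop.text or ""
        # Fall back to the href when the contact has no name property.
        members.append(name or res.get("href", ""))
    return members


SAMPLE = """<resource rel="distributionGroup" xmlns="http://schemas.microsoft.com/rtc/2012/03/ucwa">
  <property name="uri">Test.DL2@mylab.local</property>
  <property name="name"></property>
  <resource rel="contact" href="/ucwa/v1/applications/212440241480/people/test.user1@mylab.local">
    <property name="type">User</property>
    <property name="name">Test User1</property>
  </resource>
</resource>"""

if __name__ == "__main__":
    members = group_members(SAMPLE)
    print(len(members), members)
```

An empty list for a DL is the case where, as observed above, the iOS client does not show the group.
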
SOLUTION #1 (and #2)

If you want to see the DLs on the iOS client, you just need to:

  1. Populate the AD distribution group with one or more members;
  2. Wait for AD replication;
  3. Exit and reopen the Desktop client.
    This will force the DL membership to refresh (picture on the right)
  4. And the iOS client will start showing the group, right?…
    NOPE! Even if you restart the entire device,
    or remove and re-add the DL
    … the S4B client will not show it!

This takes us to issue #2 (a new post?): client user caching.
For now, here’s a preview of two quick ‘workarounds’ to solve it:

  • Option A: Sign in with another user and then sign back in with your user
  • Option B: Remove and reinstall the client on the iOS device

Finally, I have the ‘TEST DL1’ on my iOS client.

FINAL NOTES

This behaviour was observed:

  • With the current iOS/S4B version at the time:
    Skype for Business v6.5.0.177
    running on iOS 9.3.2 (13F69)
  • It happens with both Lync 2013 and Skype for Business 2015 server infrastructure
    (so it’s not an ‘upgrade caveat’)

Windows S4B clients don’t show some Distribution (expansion) Groups

ISSUE

After migrating your Lync 2013 infrastructure to Skype for Business 2015, some of your users might complain that some Distribution Groups (Distribution Lists, DLs) do not appear on their Windows Phone (WP) mobile clients.
The DLs are Active Directory (AD) group objects designed for email applications.

You can only add them to your Lync personal contact list using the Desktop Client.

After you add them, the contact list is automatically updated on any device the user is logged on to. But on a Skype for Business infrastructure some groups on your personal list might not appear on the WP client (the same client connected to the Lync Server on the left and to the S4B server on the right).

CAUSE

The root cause is that, when you use the AD snap-in to create the groups in the standard way, the ‘DisplayName’ field is left empty. The Desktop client behaves as follows:
– If there is a DisplayName on AD, the client will save it and display it on the contact list
– If not, it will use (and save) the SMTP address
This happens when adding a Group, and it’s updated every time you sign in with the client. You can see that on the desktop client picture above with the groups TEST.DL1 and TEST.DL2.

How does this impact the Skype for Business Server and the WP client?

(A) It looks like there’s a change between Lync and Skype for Business in the external web service behaviour and in the way the WP handles the Group names.
Here’s the traffic capture comparison:
>> The Lync server sends all the group information data

>> But the Skype for Business Server sends one particular field empty

(B) With the ‘name’ property empty, the WP will not add the Group from the contact list to the display list, as captured on the WP client logs.
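
Putting (A) and (B) together with the Desktop client behaviour, the logic observed in the traces can be modelled as below. This is only an illustration of what was seen in the captures, not the actual client code, and both function names are made up.

```python
def desktop_label(display_name, smtp_address):
    """Desktop client: prefer the AD DisplayName, else fall back to the SMTP address."""
    return display_name or smtp_address


def wp_shows_group(name_property):
    """WP client as observed: a group whose 'name' property is empty is skipped."""
    return bool(name_property and name_property.strip())


if __name__ == "__main__":
    # A group created with the AD snap-in and no DisplayName.
    print(desktop_label("", "Test.DL2@mylab.local"))
    print(wp_shows_group(""))
```

The mismatch is visible right away: the Desktop client still has something to display, while the WP client receives an empty name and drops the group.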

SOLUTION

As described before, the AD snap-in doesn’t show you the DisplayName to fill in immediately (like it does for a user account). You need to enable ‘show advanced options’ on the main console menu to view the ‘attribute editor’ tab and set the DisplayName value.

After that, sign in with the Desktop client. It will update the Group name and a few seconds later it will appear on the Windows Phone client.

FINAL NOTES

  • This is more likely to happen when you don’t have an Exchange/Email Server, or at least not on the same AD as Skype for Business (since distribution lists are normally created on the Exchange console, where you define the DisplayName).
    Most probably the Office 365 AD connector takes care of this for those who have S4B on-premises and use Exchange Online
  • This was found on a Lync/S4B upgrade scenario, but the issue also applies if you have an S4B infrastructure, create DLs manually and don’t fill in the Group ‘DisplayName’
  • This behaviour was observed with the latest WP client version at the time
    version 6.3.1558.0
  • The iOS (iPhone, iPad and iPod) client doesn’t behave like this. If the name property is missing, it will display the group with the SMTP address
    But for these clients:
    – they also have a ‘not displaying DLs’ issue condition – discussed on my next post😉
    – if you add/refresh some other DLs you might see that the ‘SMTP like’ DisplayName will be empty!

SkypeforBusinessUpdater CU1 (and later) installation can perform an unconditional OS restart

ISSUE

During a testing phase of a Lync 2013/Skype for Business 2015 upgrade, I was caught by surprise when the servers restarted at the (final) cumulative update (CU2) phase. Since all the logs are, by default, written on the %TEMP% folder, after the server reboot they are…. gone!😦

I assume that anyone just applying CU1 on a Skype for Business 2015 server will have the same experience.

22/May/2016: updated content with the cause and final solution

CAUSE

The cause is that when you execute Stop-CsWindowsService, the Web Server is stopped but there is an IIS Worker Process that stays in memory (leak issue).
This information can be found on the setup log of the IIS URL Rewrite module:
“IIS URL Rewrite Module 2. The file C:\Windows\system32\inetsrv\rewrite.dll is being used by the following process: Name: w3wp”

CU1 (Nov 2015) introduced a new install file (rewrite_2.0_rtw_x64.msi) that is an updated version of the ‘IIS URL Rewrite Module 2’. It appears on the SkypeforBusinessUpdater GUI, but it’s not listed on the official list of Updates for Skype for Business Server 2015.
I could not find the reason why this version, released on the 1st of May 2015, is only included now (the S4B initial binaries are one month older…).

Fortunately, one particular server didn’t restart, which allowed me to ‘peek’ at the installation logs. It was a quick catch:
* While all updates are executed with the parameters REBOOTPROMPT=S REBOOT=ReallySuppress
* The ‘IIS URL Rewrite Module 2’ is executed with REBOOTPROMPT=S IACCEPTEULA=yes

This ‘tiny’ difference makes all the difference:
* REBOOTPROMPT=S means that no reboot prompt will be shown to the end user
* REBOOT=ReallySuppress means that, even if required, a reboot will not be performed.

So the result is that, if the ‘IIS URL Rewrite Module 2’ informs that a reboot is needed to complete the installation (ex: because a file is in use and will be scheduled for the next restart), the Windows installer will automatically reboot the server !!
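
The reasoning above can be written down as a tiny decision function. It is a simplified model of how these two properties interact in a quiet install, not the full Windows Installer logic, and the function name is hypothetical.

```python
def reboot_outcome(properties, reboot_needed):
    """Simplified model: what happens at the end of an MSI install.

    properties: dict of MSI public properties passed on the command line.
    reboot_needed: True when the package reports that a restart is required
                   (e.g. a file in use was scheduled for replacement).
    """
    if not reboot_needed:
        return "none"
    if properties.get("REBOOT", "").lower() == "reallysuppress":
        return "suppressed"          # changes stay pending until the next restart
    if properties.get("REBOOTPROMPT", "").upper() == "S":
        return "automatic reboot"    # no prompt is shown, the restart just happens
    return "prompt"


if __name__ == "__main__":
    other_updates = {"REBOOTPROMPT": "S", "REBOOT": "ReallySuppress"}
    url_rewrite = {"REBOOTPROMPT": "S", "IACCEPTEULA": "yes"}
    print(reboot_outcome(other_updates, reboot_needed=True))
    print(reboot_outcome(url_rewrite, reboot_needed=True))
```

With rewrite.dll locked by the leftover w3wp process, the second case is the one that applies to the URL Rewrite package.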

Consequences

Fortunately, this package is the last of the updates to run, so no other update is skipped, BUT…
… the server is going to restart and, with it, all the Skype for Business services. And you will not like it if:
(1) some of the updates were not successfully installed, and you find yourself with different CU component versions causing unexpected service errors on the event viewer.
(2) you are in the middle of a pool upgrade and you find yourself with a (rebooted) server with the RTCSRV service on ‘pending start’, because the pool might not have enough quorum.
(3, and the bad one) you are in the middle of a Lync 2013 upgrade and you now have a server already started in S4B. If you are still trying to upgrade other Lync 2013 servers, you will get a warning that a FE pool member server is running, and you need to kill the services.

The other unwanted result is that you will probably lose the log files (kept on the user TEMP directory and deleted at restart) that might help to troubleshoot some issues.

Solution

Before you install the CU1 update:

  • Stop the S4B services as instructed (powershell command: Stop-CsWindowsService)
  • Execute the command IISRESET /STOP
    This will force all IIS dependent components to stop

This way we can run the updates without an unexpected server restart.

The case of the “DatabaseInaccessibleOrMirroringNotEnabled”

Although it is not a direct Lync issue, in an Enterprise pool deployment we use SQL mirroring for database high availability. I’ve found a strange issue that affected 1/4 of the SQL servers. I wasn’t able to identify the root cause precisely, but here are some clues and a workaround.

ISSUE

The SQL mirror node might not start correctly after a server reboot (it can also fail only after a second restart).
This has been observed on Windows Server 2012.

SYMPTOMS

When you try to get the database mirror state from Lync PS it will show you that the mirror is not enabled/disconnected/DatabaseInaccessibleOrMirroringNotEnabled.

SQL Management Studio shows the databases with the status ‘In Recovery’.
If you run the SQL query to check the mirror endpoint status, it will show you that it is started:

SELECT type_desc,state_desc FROM sys.database_mirroring_endpoints

Further log investigation will show that the SQL server, at initial startup, reports that the mirror endpoint configuration is disabled for each database:
Database Mirroring Transport is disabled in the endpoint configuration

A few lines later on the log you will see the mirror endpoint being enabled, but the databases remain in the ‘In Recovery’ state.

(Potential) CAUSE

I still cannot pinpoint the exact cause, but at first it looked like it was caused by two updates installed at the same time:
* KB3147071 (this is not a security update) – “Connection to Oracle database fails when you use Microsoft ODBC or OLE DB Driver for Oracle or Microsoft DTC in Windows”
* KB3146706 (this is an important security update) – “Security update for Windows OLE”
NOTE: the other updates on the picture were also removed at the same time, but from the KB information I don’t see a relation with the main issue.

WORKAROUND(S)

(a) The only effective ‘quick fix’ is to force, on the affected SQL server, the mirror endpoint status to ‘started’ with the SQL query:
ALTER ENDPOINT mirroring_endpoint STATE = STARTED
Attention!: ‘mirroring_endpoint’ is the default name created by the Lync/Skype4B setup commands. You might need to confirm the endpoint name using the query:
SELECT name,type_desc,state_desc FROM sys.database_mirroring_endpoints

(b) Restarting the SQL services will also solve the issue

(c) rebooting the server can sometimes solve the issue (only do this if you don’t have rights to manage/restart the SQL server)

The above workarounds will not solve the issue the next time you need to reboot the server.

(not the) SOLUTION

(1) Uninstall KB3147071 and KB3146706 and restart the server.

(2) Install KB3146706 again and restart the server.

(3) (optional) install KB3147071

(4) reboot the server (at least) twice to check if the mirror starts successfully

Update 17/05: The suggested solution was not effective. After 24 hours I restarted the server and the issue is back😦

 

Phone number formats that display the click to call icon on IE browser

You might have been asked by your company’s web developers what number format they should use to display the click-to-call icon on your intranet corporate directory, or to let your customers dial with a click on your company’s main contact phone numbers. The difficult part is to ‘google-fu’ where that is documented and/or to memorize it.

Here’s how to find them.

First of all, the icon comes from an IE plugin that is installed with Lync/Skype for Business and reads the web page content while it is loading. Make sure that the plugins are running before reading this post.

By ‘analysing’ the plugin OCHelper.dll (this is the January 2016 update version; previous or future versions might differ), you will find 4 REGEX strings.

Using Jex’s Regulex, let’s analyze the first expression.

The first group of numbers below should display the click-to-call icon, while the second group will not (the reason is given in brackets):
+1-333-333-4444 (you should see the call icon)
+1 333 333 4444
1-333-333-4444
1 (333) 333 4444
333.333.4444
(333)-333-4444
+13333334444 (nope: missing separator of 4 groups of digits)
1.22.333.4444 (no 3 numbers on second group)
1.333.22.4444 (no 3 numbers on 3rd group)
1.333.333.333 (no 4 numbers on last group)

The second expression looks long, but it’s easy to understand.

These numbers should show the icon: +1-(800)-click-me, 888.1234open
But more than 9 characters after the toll-free prefix will fail: 1.900.too-long-string

The third expression looks dedicated to non-US (international prefix + format) numbers.

These numbers should show the icon: +41 58 30 1111, +41 503 011 11
But other formats might not work (e.g. when you try to omit your country code for national customers): (+41) 58 301 111, 58 301 111

The 4th and last expression is an attempt to cover other cases of digit grouping.

But if you really want to be sure every time, just embed the universal format tel:+222222222 in the web page code.
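
If the page is generated by code, a small helper can normalise whatever is stored in the directory to that tel: format. This is a sketch that assumes the stored number already includes the country code; the function names are hypothetical.

```python
import re


def tel_uri(number):
    """Turn a formatted number (country code included) into a tel: URI."""
    digits = re.sub(r"\D", "", number)  # drop spaces, dots, dashes, brackets, '+'
    if not digits:
        raise ValueError("no digits found in number")
    return "tel:+" + digits


def tel_link(number):
    """Render an HTML anchor that keeps the human-friendly text visible."""
    return f'<a href="{tel_uri(number)}">{number}</a>'


if __name__ == "__main__":
    print(tel_uri("+41 58 30 1111"))
    print(tel_link("+1-333-333-4444"))
```

The visible text can stay in whatever format your users prefer, since the link target no longer depends on the plugin’s pattern matching.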

Now you have a ‘cheat list’ on how to embed your telephone numbers on web pages:)

  

What you should know about Lync (Skype for Business) updates after Dec/2015

If you have deployed a firewall/proxy with Deep Packet Inspection (DPI) in your organization, some of your Lync (SfB) users might have asked you about a certificate popup similar to the picture on the right. By now they are probably not getting it anymore.
You will also get many more similar popups if you like to control/restrict the trusted certificates on the desktops in your environment (check the last paragraph of this post)

Here’s the ‘deconstruction’:

If you were an early adopter of KB3114351 – security update for Lync 2013 (Skype for Business): December 8, 2015 (this also applies to the security update for Skype for Business 2016: KB3114372), you could not have known at that time about one ‘functionality’ that it introduced.

KB3114351 was later updated on 23 Dec 2015 – “This security update contains the following improvements: •Adds Cloud-based Discovery •Uses SSO to autodetect SIP address and start sign in” – But no more details.

On the 12 of January, KB3114687 was published:
“”The address type is not valid” error when you sign in to Lync 2013 (Skype for Business) by using domain\username format”.
“This issue occurs because if Lync 2013 (Skype for Business) client fails to detect lyncdiscover and lyncdiscoverinternal services. In this case, the new DNS-less discovery (also known as Cloud-based discovery) redirects you to the Microsoft Lync Online Discovery services in Office 365″

The KB points to two other KBs (also initially released on the same date):

What impact did it have on your Lync on-premises environment?

The December 8, 2015 security update added a new ‘fallback/DNS-less autodiscover’. If your client doesn’t resolve any of the lyncdiscover* DNS addresses, it will then query odc.officeapps.live.com, and only after an answer (or a failure to connect) will it continue with the other SRV/DNS fallback discovery.

Besides the KB3114687 issue, if your clients don’t receive the lyncdiscover* records (either because you don’t use them or because there was some issue connecting to the DNS), they go out to the internet, and that’s where the users get the warning from the picture at the top of the post.
If you don’t get the warning (or the user presses ‘connect’), your user’s client might try to authenticate against and query the Office 365 Cloud. It will fail, and then continue through the SRV autodiscover and log in back to the on-premises Lync.

The January 12, 2016, update for Lync 2013 (Skype for Business) changed (fixed) the previous autodiscovery priority to: (1) lyncdiscover* (2) SRV records (3) sip* (4) DNS-less odc.officeapps.live.com

The probability for an on-premises user to ‘fall back’ to the cloud is now very low. But in any case, if you have an on-premises-only deployment of Lync, my advice is to disable Cloud-based discovery on your clients, either by a GPO (if it exists) or by setting the following registry key:
reg add HKLM\Software\Policies\Microsoft\Office\15.0\Lync /v DisableCloudBasedDiscovery /t REG_DWORD /d 1 /f

After setting this key, the Lync (Skype for Business) client will no longer query (and connect to) the odc.officeapps.live.com.
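
To make the two discovery orders easier to compare, here is a rough Python model. The order of the four stages comes from the text above; the individual SRV and host record names are the commonly documented Lync ones and are my assumption here, as are the function and parameter names.

```python
CLOUD_DISCOVERY = "odc.officeapps.live.com"


def discovery_order(sip_domain, january_2016_update=True, disable_cloud=False):
    """Return the names a client would try, in order (simplified model).

    january_2016_update: True for the fixed order, False for the
                         December 8, 2015 behaviour.
    disable_cloud: models DisableCloudBasedDiscovery=1.
    """
    d = sip_domain
    lyncdiscover = [f"lyncdiscoverinternal.{d}", f"lyncdiscover.{d}"]
    srv_records = [f"_sipinternaltls._tcp.{d}", f"_sip._tls.{d}"]  # assumed names
    sip_hosts = [f"sipinternal.{d}", f"sip.{d}", f"sipexternal.{d}"]  # assumed names
    cloud = [] if disable_cloud else [CLOUD_DISCOVERY]
    if january_2016_update:
        return lyncdiscover + srv_records + sip_hosts + cloud
    return lyncdiscover + cloud + srv_records + sip_hosts


if __name__ == "__main__":
    for step in discovery_order("mylab.local", january_2016_update=False):
        print(step)
```

Comparing the two lists shows why only clients that fail on every on-premises record can still reach the cloud service after the January update, and why none do once the registry key is set.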

You will still catch some Lync network traffic going to MS servers, but that’s another story:)

The other ‘Cloud-based’ functionalities are also causing another issue, referred to in my previous post “Users are being asked for credentials after applying KB3114351 or KB3114502”

____________________________________
Extra info:

If you still have the only the December 8, 2015 security update installed, here’s a ‘quick trick’ to get the certificate popups on your desktop client:
(1) delete the trusted CA installed by Lync client in both user and local machine (might need to restart your PC)
fallbacksim-cert
(it seems that Lync installs the certificate when you start it)
(2) Point your lyncdiscover* records to a dummy IP. For example, add something like this to the hosts file: 127.0.0.1 lyncdiscoverinternal.<yoursipdomain> lyncdiscover.<yoursipdomain>
(3) Delete the EndpointConfiguration.cache file from %localappdata%\Microsoft\Office\15.0\Lync\sip<yoursipuseraccount>. Alternatively, you can just press ‘delete my sign-in info’ on the main Lync login screen

When you sign in with your user again, you should be prompted about a ‘certificate trust issue’.

Users are being asked for credentials after applying KB3114351 or KB3114502

Lync-Jan2016-login

It’s ‘reverse engineering’ time, install/remove updates and work for MSFT for free… again.

ISSUE

So… after you apply KB3114351 (security update for Lync 2013 (Skype for Business): December 8, 2015) or KB3114502 (January 12, 2016, update for Lync 2013 (Skype for Business)), some (if not all) of your users will start complaining that the SfB client asks for credentials at startup.

The main clue that something is wrong is when you see the username filled with the SIP address (picture on the right), you know that this is not the user’s correct AD login, and the user was never asked for credentials before, because the client should use the Windows domain logon credentials.

CAUSE

KB3114351 and KB3114502 changed the Client Sign-in algorithm and some particular deployment scenarios were not tested (check ‘affected scenarios’ below).

Using an issue-free client connected to the internal LAN (this is important later), six new registry keys (there are more) appear when the client starts:
Lync-Jan2016-rotten
The bottom one, WindowsAccountSipUri, quickly caught my attention. “Google-fu” didn’t help much (if MS is reading this: why do KBs only refer to fixed/new issues but rarely mention changes in behavior? Why does a “security update” include new features?), and after a night of tests, here’s how the new updates changed the client logon process:
1. At startup, if the registry key ServerSipUri has an address, the client queries Active Directory for the user’s SIP address and writes it to WindowsAccountSipUri.
2. If both keys match, the client uses the Windows domain user credentials to sign in (and no one complains about it anymore).
3. If the keys don’t match, the client pops up the password prompt. If the user’s SIP address doesn’t match Active Directory, it doesn’t matter whether the user enters the wrong or the right password. The client fills the username with the SIP address (and also the ServerUsername key). At this point you are screwed and you need to help.
4. If the user is a ‘techie’, he might know how to manually set the right username and password and will not call you, BUT if he didn’t check ‘save password’ he will be prompted for credentials at the next logon, and again… and then he will call you:)
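
The decision in steps 1 to 3 can be sketched in a few lines of Python. This is just my reading of the behaviour, not the client's actual code: the function name `sign_in_mode`, the returned strings and the case-insensitive comparison are assumptions for illustration, and the sketch only covers the case where ServerSipUri already has a value.

```python
from typing import Optional


def sign_in_mode(server_sip_uri: str, ad_sip_address: Optional[str]) -> str:
    """Model of the post-update logon decision (hypothetical helper).

    server_sip_uri : value of the ServerSipUri registry key (must be set)
    ad_sip_address : SIP address the client could read from AD, or None
                     when AD is unreachable or the attribute is empty
    """
    if not server_sip_uri:
        raise ValueError("this sketch only covers a populated ServerSipUri")

    # Step 1: the client writes whatever it got from AD to WindowsAccountSipUri.
    windows_account_sip_uri = ad_sip_address or ""

    # Step 2: both keys match -> Windows domain credentials are used.
    # Assumption: the comparison ignores case; the post does not confirm this.
    if windows_account_sip_uri.lower() == server_sip_uri.lower():
        return "windows-credentials"

    # Step 3: no match -> the user gets the credentials prompt.
    return "prompt"


print(sign_in_mode("user@example.com", "user@example.com"))  # windows-credentials
print(sign_in_mode("user@example.com", None))                # prompt
```

The second call is the interesting one: the user did nothing wrong, the client simply could not read a SIP address from AD.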

Before those updates, the SfB client would attempt to sign in the SIP username with the Windows credentials. Only if that failed would it prompt for different credentials.

AFFECTED SCENARIOS

Based on point 3: when don’t the two keys match?
> When the client cannot get any SIP address for the user from AD. It then has no value for WindowsAccountSipUri (to match against ServerSipUri).

(Here’s where you can confirm if you are affected)
When can’t the client get the SIP address from AD?
(A) When the client starts (after the patching) while connecting remotely from the Internet
(B) When you have a Lync resource forest topology deployment and the msRTCSIP-PrimaryUserAddress AD attribute is missing (or has no value) in your user forest
(C) Isn’t the previous one also similar to an Office365 with ADFS deployment?

SOLUTION (or workarounds)

(all) Clear the registry key ServerUsername if it doesn’t match the user’s UPN in AD, or replace it with the NTLM format (domain\username). This might also solve another reported issue, where the SfB client prompts for Exchange credentials.

(A) If the user can connect to the office network (go onsite or use a VPN), do so and restart the client. As soon as WindowsAccountSipUri has the SIP address, you will not see the issue (I hope!)

(B) Manually configure the username and password and check the ‘save password’ box (this probably fixes it only temporarily), then wait for the official MSFT February update

(C) Manually add WindowsAccountSipUri with the SIP address of the user, if you are sure that this is your scenario.

FINAL NOTES

Although my testing was extensive, I didn’t cover every imaginable scenario. There may be (and I saw some) other conditions in the client engine that do not expose the issue. Here are some findings:

The WindowsAccountSipUri value is not validated once it is created. On a simple Lync server deployment:
1. Sign in with a user.
2. Change the user’s SIP address on the Lync server.
3. If you try to sign in on the same SfB client, you will not be able to sign in with the new SIP address (even with the correct username/password), or you will experience the issue. The workaround is to delete/fix that registry key (or delete the local Windows user profile).

Another way to reproduce the issue using scenario (B) above:
1. Log in with a user on a workstation for the first time (the idea is to have a new user profile)
2. Start SfB and fill in the SIP username; single sign-on will work (no username/credentials asked)
3. On all following sign-ins, SfB will prompt for credentials (unless you checked the ‘save password’ option)

Nov2015 client update issue: no Missed call notifications (solved)

KB3101496-issue
One customer end-user reported this issue, and after some testing here’s the case.
The Missed Call Notification is… missing in the Outlook folder and in Lync.

After applying KB3101496 (“security update for Lync 2013 (Skype for Business): November 10, 2015”), the Lync client will not save missed calls in Outlook, but:
– All the other calls are registered.
– This will not happen if you have Exchange UM integration configured.

KB3101496-issueB

Cause: component update issue

Details: the client uses Exchange Web Services (EWS) to create a missed call notification message (Itemclass IPM.Note.Microsoft.Missed.Voice, signed in the footer ‘Skype for Business’). Before KB3101496, you can see it connecting to the Exchange server:
KB3101496-issueC
After applying the update, there is no connection attempt.

Solution: To fix this issue, install the February 9, 2016 update (KB3114732) for Lync 2013 (Skype for Business).

(previous) solution: uninstall the KB3101496 update, and don’t apply any newer updates since they also have the issue.

More people are confirming this issue, and others are reporting Lync not showing any call records on the client.

A much more concerning update issue than the one I reported previously.

Updates:
(11.12.2015): December 2015 SU (KB3114351) didn’t solve the issue.
(07.01.2016): Trond Egil also confirmed the workaround
(05.02.2016): The January 12, 2016, update for Lync 2013 (Skype for Business) also didn’t fix the issue
(09.02.2016): Issue was documented in KB3136400
(10.02.2016): Issue was solved with KB3114732

(bypassing) Lync/SfB topology limitations

While I was preparing a PoC of a fully redundant solution for a customer, I again came across a well-known limitation when attempting to consolidate multiple roles on the fewest possible servers (less management, lower licensing costs,…).
NOTE: the next text also applies to Skype for Business

So for this fully redundant and automated solution, the basic idea would be:

  • Pool1 with 3 front-end servers
  • Pool2 with 3 front-end servers
  • 2 Back-end servers, each with:
    • 1 SQL instance for Pool1 (mirror with other server instance)
    • 1 SQL instance for Pool2 (mirror with other server instance)
    • 1 Lync file share storage (DFS between servers)
    • 1 Office WebApp Server (Farm with other server instance)
  • Use existing SQL express on a front-end server of each pool as witness for automatic mirror failover.

You can actually start creating SQL Server instances:
Topology-limitations1

But when you try to assign one of the instances to a mirror, it will not appear:
Topology-limitations2

The limitation is more visible if you try to create one from the ‘New…’ button. Topology Builder checks whether the server FQDN already exists in the topology and also prevents the use of instances named LYNCLOCAL or RTCLOCAL:
Topology-limitations3
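
As a rough illustration of the two checks I ran into, here is a small Python sketch. It is not how Topology Builder is implemented: the function `rejection_reasons`, its messages and the example FQDNs are hypothetical, and the only facts taken from the tool are the two rules themselves (an FQDN already in the topology is refused, and so are the instance names LYNCLOCAL and RTCLOCAL).

```python
from typing import Iterable, List

# Instance names refused by the 'New...' dialog, as seen above.
RESERVED_INSTANCE_NAMES = {"LYNCLOCAL", "RTCLOCAL"}


def rejection_reasons(fqdn: str, instance: str,
                      existing_fqdns: Iterable[str]) -> List[str]:
    """Return why a new SQL store would be refused (empty list = accepted).

    Hypothetical model of the two observed checks; real validation in
    Topology Builder certainly does more than this.
    """
    reasons = []
    if fqdn.lower() in {name.lower() for name in existing_fqdns}:
        reasons.append("FQDN already exists in the topology")
    if instance.upper() in RESERVED_INSTANCE_NAMES:
        reasons.append("instance name is reserved")
    return reasons


# Example FQDNs are placeholders for servers already in the topology.
topology = ["fe01.example.local", "be01.example.local"]
print(rejection_reasons("fe01.example.local", "RTCLOCAL", topology))
print(rejection_reasons("sql99.example.local", "POOL1", topology))  # []
```

Both rules are exactly what blocks the consolidated design: the witness I want lives on a front-end FQDN that is already in the topology, in an instance with a reserved name.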

You will also be prevented from adding any other components (Office WebApp, gateways, trunks, edge servers,…)

With these limitations you might think of some Q&A:

  • Question: There’s no technical limitation on the previous design. Is it possible to deploy this solution? Answer: Yes
  • Question: Is it possible to use any existing SQL instance for mirror witness? Answer: Yes
  • Question: Can I deploy Office WebApp components on an existing server in the topology? Answer: Yes
  • Question: In a lab, would it be possible to use only the Front-End servers for a Pool deployment? Answer: It would be radical!, but Y…

The recipe?
If you are not one of those guys who just follow the deployment wizards…
… you just need to think ‘outside of the box/window’ (and it’s easier than installing Lync on a Domain controller)😉

To be continued…
