Lync issue: “We could’t reach …”

wecouldtreachDuring the last weeks of a very large deployment project that I’m working, I faced  the following different problems but with a similar pattern.
This customer’s Lync infrastructure has 2 sip domains (let’s just call ch1.com and ch2.com) and the issue was occurring with users from different domains:

  • user@ch1.com tried to make (was delegated permissions to) call on behalf of user@ch2.com
  • user@ch1.com fail to join audio PSTN (Call me at number…) if the conference organizer was user@ch2.com
    Callme
  • User agents fail to make PSTN calls on behalf of an response group with a different SIP Domain (this one referenced in a post by Joachim Dissing)

In all cases the user received the same error message “We couldn’t reach…”.
A look on the client Lync-Uccapi-0.uccapilog file revealed an INFO message with a surprising message
callmeissue-uccapilog“User does not exist”, source = >edge server> (?!).
Time to trace the outboundrouting on the Front End Servers, which reveal also something peculiar:

TL_VERBOSE... (OutboundRouting,OutboundRoutingTransaction.constructor:outboundroutingtransaction.cs(382))Creating a OutboundRoutingTransaction object (3)
TL_VERBOSE... (OutboundRouting,OutboundRouting.OnRequest:outboundrouting.cs(289))[3719388760]Enter
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.ProcessIncomingRequestHeaders:outboundroutingdispatcher.cs(1520))[3719388760]Enter
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.ProcessIncomingRequestHeaders:outboundroutingdispatcher.cs(1552))[3719388760]Referred-by header found: <sip:test.user01b@ch3.com>;ms-identity="MIIBwgYJKoZIhvc....hg==:Wed, 24 Sep 2014 12:24:33 GMT";ms-identity-info="sip:pool01.ch2.com:5063;ms-fe=frontend08.ch2.com;transport=Tls";ms-identity-alg=rsa-sha1
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.ProcessIncomingRequestHeaders:outboundroutingdispatcher.cs(1678))[3719388760]Exit
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.OnRequest:outboundroutingdispatcher.cs(242))[3719388760]From uri: sip:test.user01a@ch2.com
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.OnRequest:outboundroutingdispatcher.cs(243))[3719388760]From User Uc Enabled: True
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.OnRequest:outboundroutingdispatcher.cs(246))[3719388760]ReferrerUri: sip:test.user01b@ch3.com
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.OnRequest:outboundroutingdispatcher.cs(247))[3719388760]Referrer Uc Enabled: True
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.OnRequest:outboundroutingdispatcher.cs(248))[3719388760]Referrer is Local: True
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.OnRequest:outboundroutingdispatcher.cs(249))[3719388760]Referrer is inCurrentDeployment: False
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.OnRequest:outboundroutingdispatcher.cs(250))[3719388760]Referrer DeploymentLocator: NULL
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.OnRequest:outboundroutingdispatcher.cs(256))[3719388760]IsAvMCUDialOut: True
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.OnRequest:outboundroutingdispatcher.cs(257))[3719388760]Alternate Tel URI: tel:+80001
TL_VERBOSE... (OutboundRouting,EmergencyCallHelper.IsEmergencyCall:emergencycallhelper.cs(38))IsEmergencyCall = False
...
TL_NOISE... (OutboundRouting,Settings.GetMatchingVacantNumberEntry:settings.cs(323))Checking for Vacant number entries for +90000
...
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.CheckAndRouteVacantNumberRange:outboundroutingdispatcher.cs(1838))[3719388760]+90000 does not match any Vacant Number range
...
TL_NOISE... (OutboundRouting,Settings.GetMatchingCPSEntry:settings.cs(291))Checking for CPS entry for +90000
...
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.CheckAndRouteCallParkService:outboundroutingdispatcher.cs(1165))[3719388760]+90000 does not match any CPS range
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.HasAvoidPSTNTollByPassPolicy:outboundroutingdispatcher.cs(2577))[3719388760]AvoidTollByPassUsage not found.
...
TL_VERBOSE... (OutboundRouting,OutboundRoutingDispatcher.IsAnonymousUser:outboundroutingdispatcher.cs(3221))[3719388760]user URI is not anonymous
TL_VERBOSE... (OutboundRouting,RoutingHelper.AddHeaderIfNotAlreadyPresent:routinghelper.cs(169))[3719388760]Adding ms-conference header with value true
TL_VERBOSE... (OutboundRouting,RoutingHelper.AddHeaderIfNotAlreadyPresent:routinghelper.cs(185))[3719388760]Adding MSReferrerPhone header with value <sip:+80002@ch3.com;user=phone>
TL_INFO(TF_PROTOCOL) (OutboundRouting,OutboundRoutingDispatcher.ProcessOutboundRequestToPstn:outboundroutingdispatcher.cs(1401))[3719388760]Target Domain and caller domain not same, sending request on its way...
...
TL_VERBOSE... (OutboundRouting,ClusterPingEngine.TimerCallback:clusterpingengine.cs(221))(0000000000D28422)Enter TimerCallback
TL_VERBOSE... (OutboundRouting,ClusterPingEngine.GetQueuedEvents:clusterpingengine.cs(286))(0000000000D28422)No events to process
...
TL_VERBOSE... (OutboundRouting,ClusterPingEngine.ScheduleTimer:clusterpingengine.cs(349))(0000000000D28422)Exit

A supposed PSTN call was not possible because the SIP Domains were diferent ?!
After some more ‘reverse engeneering’ the outboundrouting process (users with the same domain will  process the call to the mediation server) I suspect that  some code validation was not doing the right Thing.

Time to open a MS Support ticket and in less than 24 hours I got the answer:
The failure is due to new design change in 2013 Outbound Routing. Product group will publish a server side hotfix within 2 weeks to fix the following issue:
“Target Domain and caller domain not same, sending request on its way…” In this case, caller’s domain and target domain do not match. In the hotfix we will stop doing the domain check.

Less than 2 weeks, Microsoft released Lync server Cumulative Update (CU6)  but didn’t find an exact reference to my issue except the KB2995173.
Next step: download CU6, apply on the pre-production environment and check:

TL_VERBOSE (OutboundRouting,OutboundRoutingTransaction.constructor:outboundroutingtransaction.cs(382))Creating a OutboundRoutingTransaction object (3)
TL_VERBOSE (OutboundRouting,OutboundRouting.OnRequest:outboundrouting.cs(289))[2359519239]Enter
TL_VERBOSE (OutboundRouting,EmergencyCallHelper.IsEmergencyCall:emergencycallhelper.cs(38))IsEmergencyCall = False
TL_NOISE (OutboundRouting,Settings.GetMatchingVacantNumberEntry:settings.cs(323))Checking for Vacant number entries for +90000
TL_VERBOSE (OutboundRouting,NumberRangeListT<>.GetMatchingRange:numberrangelist.cs(235))(00000000026B0DA2)No matching range found.
TL_NOISE (OutboundRouting,Settings.GetMatchingVacantNumberEntry:settings.cs(363))No matching Vacant Number Range found.
TL_NOISE (OutboundRouting,Settings.GetMatchingCPSEntry:settings.cs(291))Checking for CPS entry for +90000
TL_VERBOSE (OutboundRouting,NumberRangeListT<>.GetMatchingRange:numberrangelist.cs(235))(00000000003C60F1)No matching range found.
TL_NOISE (OutboundRouting,Settings.GetMatchingCPSEntry:settings.cs(316))No matching range found
TL_VERBOSE (OutboundRouting,RoutingHelper.AddHeaderIfNotAlreadyPresent:routinghelper.cs(169))[2359519239]Adding ms-conference header with value true
TL_VERBOSE (OutboundRouting,RoutingHelper.AddHeaderIfNotAlreadyPresent:routinghelper.cs(185))[2359519239]Adding MSReferrerPhone header with value <sip:+80002@ch3.com;user=phone>
TL_VERBOSE (OutboundRouting,RoutingHelper.AddHeaderIfNotAlreadyPresent:routinghelper.cs(169))[2359519239]Adding ms-privacy header with value id
TL_VERBOSE (OutboundRouting,RoutingHelper.RemoveHeaderIfPresent:routinghelper.cs(202))[2359519239]ms-vm-escape-timer header not present request.
TL_VERBOSE (OutboundRouting,EmergencyCallHelper.IsEmergencyCall:emergencycallhelper.cs(38))IsEmergencyCall = False
TL_VERBOSE (OutboundRouting,PhoneRouter.GetRoutes:phonerouter.cs(146))(0000000001F30767)target phone number: +90000, PhoneRouteUsage: chz
TL_VERBOSE (OutboundRouting,PhoneRouter.GetRoutes:phonerouter.cs(215))(0000000001F30767)#hits: 1, route names: CHZ 
TL_VERBOSE (OutboundRouting,PhoneRouter.GetRoutes:phonerouter.cs(218))(0000000001F30767)Exit
TL_VERBOSE (OutboundRouting,OutboundRoutingTransaction.SetRouteInformation:outboundroutingtransaction.cs(770))[2359519239]Target is not LBR Restricted; If target has an associated site id then target is a PBX and we may need to enforce PSTN toll bypass rules.
TL_VERBOSE (OutboundRouting,OutboundRoutingTransaction.SetRouteInformation:outboundroutingtransaction.cs(788))[2359519239]Target does not have a site assigned to it.  No need to enforce PSTN toll bypass rules.
TL_VERBOSE (OutboundRouting,OutboundRoutingTransaction.GetRoutes:outboundroutingtransaction.cs(589))[2359519239]Forking disabled since there are no backup gateways or routes.
TL_VERBOSE (OutboundRouting,OutboundRoutingTransaction.GetNextHop:outboundroutingtransaction.cs(890))[2359519239]Attempting to find GetNextHop: Length:1; StartAt:0; Tried:1; Name/State:chze33vd-Up
TL_VERBOSE (OutboundRouting,OutboundGateway.ApplyRulesInList:gateway.cs(475))(0000000001220146)No outbound translation rules defined for Target chze33vd
TL_VERBOSE (OutboundRouting,OutboundGateway.GetTranslatedCallerId:gateway.cs(408))(0000000001220146)No outbound Calling number translation rules defined for target chze33vd
TL_VERBOSE OutboundRouting,OutboundGateway.UpdateCallerId:gateway.cs(381))[2359519239]Updated Caller Id (new PAI =<tel:+80001>)
TL_VERBOSE (OutboundRouting,OutboundRoutingTransaction.SetState:outboundroutingtransaction.cs(445))[2359519239]Set transaction 3 state to ClientTimerRunning
TL_VERBOSE (OutboundRouting,ORTransactionTimer.AddPendingTransaction:ortransactiontimer.cs(213))(000000000004DFF6)Enter
TL_VERBOSE (OutboundRouting,ORTransactionTimer.AddPendingTransaction:ortransactiontimer.cs(242))(000000000004DFF6)Global timer started.
TL_VERBOSE (OutboundRouting,ORTransactionTimer.AddPendingTransaction:ortransactiontimer.cs(249))(000000000004DFF6)Exit
TL_VERBOSE (OutboundRouting,OutboundRouting.OnRequest:outboundrouting.cs(292))[2359519239]Exit
TL_VERBOSE (OutboundRouting,OutboundRoutingTransaction.SetState:outboundroutingtransaction.cs(445))[2359519239]Set transaction 3 state to ClientTerminated
TL_VERBOSE (OutboundRouting,ORTransactionTimer.RemovePendingTransaction:ortransactiontimer.cs(255))(000000000004DFF6)Enter
TL_VERBOSE (OutboundRouting,ORTransactionTimer.RemoveListEntry:ortransactiontimer.cs(190))(000000000004DFF6)Remove list entry. Id: 3
TL_VERBOSE (OutboundRouting,ORTransactionTimer.RemovePendingTransaction:ortransactiontimer.cs(262))(000000000004DFF6)Global timer stopped.
TL_VERBOSE (OutboundRouting,ORTransactionTimer.RemovePendingTransaction:ortransactiontimer.cs(267))(000000000004DFF6)Exit
...

The routing engine looks much different now but… voila! it works.
Time to plan an CU6 upgrade to 6 pools, 18 Front End Servers, 8 Edge Servers, … 🙂

To finish this post an explanation about the suspicious ‘user not found’ message. It turns out that since the outboundrouting dispatch the ‘call’ back to the Lync engine, the next step was to forward the invite to the edge since it was dialog between different domains. The  Edge Servers will simply return back because they check that the destination domain was… internal (smart guys)… didn’t o much deeper to understand who put the message received by the Lync Client.

Comments welcome