Research This KB – Max Concurrent API Reached alert

Alert: Max Concurrent API Reached alert

Management Pack Name:  Windows Server 2008 Operating System (Monitoring), Windows Server 2012 Operating System (Monitoring)

Management Pack Version: 6.0.7026.0, 6.0.7026.0

Rule or Monitor: Monitor

Rule or Monitor Name: Windows Server 2008 Max Concurrent API Monitor, Windows Server 2012 Max Concurrent API Monitor

Rule or Monitor Notes: None

Issue: This alert indicates a problem with the MaxConcurrentApi limit (the maximum number of concurrent Netlogon API calls) on the monitored server. However, if the information in the alert and the attached KB articles is correct, this monitor is alerting on conditions that are not relevant.

According to KB 2688798, the formula for calculating a new MaxConcurrentApi registry value for this server is:

(semaphore_acquires + semaphore_time-outs) * average_semaphore_hold_time / time_collection_length <= New_MaxConcurrentApi_setting

If we log on to the server identified in the alert and open Performance Monitor, we can add all of the Netlogon performance counters required for this calculation, as shown below:

[Image: Netlogon performance counters added in Performance Monitor]

The Semaphore Acquires counter continues to grow over time, as shown below:

[Image: Semaphore Acquires counter increasing over time]

For this example, I ran the performance collection for one hour and 40 minutes. Shortly after the hour and 40 minutes had passed, we see the values below:

[Image: Netlogon counter values after about one hour and 40 minutes]

The example in the KB article reads as follows:

“The average semaphore hold time can be determined by changing the default view from Line View to Report View in Perfmon.msc. For example, consider the following scenario:

  • The semaphore acquires value is 8,286.
  • The semaphore time-outs value is 883.
  • The average semaphore hold time is .5 (that is, a half second).
  • The duration of reporting is 90 seconds.

In this scenario, the formula would be as follows:

(8,286 + 883) * .5 / 90 =< 51”
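The KB formula is plain arithmetic, so it is easy to check. The following minimal Python sketch evaluates it with the KB's sample values. The function and parameter names are illustrative and are not taken from the KB or the management pack. Because MaxConcurrentApi is stored as a whole number, the sketch also rounds the result up.

```python
import math


def max_concurrent_api_estimate(semaphore_acquires: float,
                                semaphore_timeouts: float,
                                avg_hold_time_s: float,
                                collection_length_s: float) -> float:
    """Evaluate (SA + STO) * SAVG / TIME from KB 2688798.

    The new MaxConcurrentApi setting should be at least this value.
    """
    return (semaphore_acquires + semaphore_timeouts) * avg_hold_time_s / collection_length_s


# Sample values from the KB article: 8,286 acquires, 883 time-outs,
# 0.5 s average hold time, 90 s collection window.
estimate = max_concurrent_api_estimate(8286, 883, 0.5, 90)
print(f"{estimate:.2f}")      # 50.94
print(math.ceil(estimate))    # 51 (rounded up to a whole-number setting)
```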

Using the same format as the KB example, these are the values I saw while monitoring the server:

  • The semaphore acquires value is 280 - 76 = 204 (the highest value minus the lowest value).
  • The semaphore time-outs value is 0.
  • The average semaphore hold time is 0 (that is, zero seconds).
  • The duration of reporting is one hour and 40 minutes (6,000 seconds).

In this scenario, the formula would be as follows:

(204 + 0) * 0 / 6000 <= 0
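Because Semaphore Acquires is cumulative (it keeps growing over time), the value plugged into the formula is the difference between the last and first readings. Here is a small Python sketch under that assumption, using the readings above and the one hour and 40 minute window (6,000 seconds). The helper names are hypothetical.

```python
def counter_delta(first_reading: int, last_reading: int) -> int:
    """Return the change in a cumulative counter over the collection window."""
    if last_reading < first_reading:
        # A decrease suggests the counter was reset during collection.
        raise ValueError("cumulative counter decreased during collection")
    return last_reading - first_reading


def max_concurrent_api_estimate(sa: float, sto: float, savg: float, time_s: float) -> float:
    """(SA + STO) * SAVG / TIME from KB 2688798."""
    return (sa + sto) * savg / time_s


# Readings observed on the monitored server over one hour and 40 minutes.
sa = counter_delta(76, 280)                                   # 204 acquires in the window
estimate = max_concurrent_api_estimate(sa, 0, 0.0, 100 * 60)  # SAVG is 0
print(sa, estimate)                                           # 204 0.0
```

Any zero average hold time drives the result to zero, however large the acquires count is.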

If we look at the Alert Context tab for the alert, we can see that Operations Manager collects the same values we are looking for (with the possible exception of the duration of reporting), as shown below:

[Image: Alert Context tab showing the Netlogon values collected by Operations Manager]

  • The semaphore acquires value is 15653.
  • The semaphore time-outs value is 0.
  • The average semaphore hold time is 0 (that is, zero seconds).
  • The duration of reporting is 0 seconds.

In this scenario, the formula would be as follows:

(15653 + 0) * 0 / 0 <= 0
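With a reporting duration of 0, this expression divides by zero, so it is undefined rather than 0. In Python, for example, evaluating (15653 + 0) * 0 / 0 raises ZeroDivisionError. The following sketch shows one way a guarded calculation could behave by returning None when no usable collection window exists. This is an assumption about a reasonable design, not how the management pack behaves.

```python
from typing import Optional


def safe_estimate(sa: float, sto: float, savg: float, time_s: float) -> Optional[float]:
    """Evaluate (SA + STO) * SAVG / TIME, or return None if TIME is not positive."""
    if time_s <= 0:
        # No usable collection window: the formula cannot be evaluated.
        return None
    return (sa + sto) * savg / time_s


print(safe_estimate(15653, 0, 0, 0))      # None
print(safe_estimate(15653, 0, 0, 6000))   # 0.0
```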

We can abbreviate the terms of the formula as shown below:

(semaphore_acquires + semaphore_time-outs) * average_semaphore_hold_time / time_collection_length <= New_MaxConcurrentApi_setting

(SA + STO) * SAVG / TIME <= New_MaxConcurrentApi_setting

Below, the corresponding values gathered by Operations Manager are marked against each term, along with the significance of a zero value.

[Image: Formula terms marked with the values gathered by Operations Manager]

SA always appears to have a non-zero value.

STO has remained at zero in every example of this alert that I have seen.

SAVG has also remained at zero in every example of this alert that I have seen.

If the Average Semaphore Hold Time value is 0, the formula evaluates to zero. This means the alert should not be raised to recommend changing the MaxConcurrentApi registry value. In other words, the alert is diagnosing a problem that does not appear to exist.
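The reasoning above amounts to a simple check: an alert is only warranted when the formula calls for a higher value than the MaxConcurrentApi setting currently in effect. The Python sketch below expresses that check. It is hypothetical and is not the logic of the shipped monitor. The current_setting values used here are arbitrary examples, not documented defaults.

```python
def should_alert(sa: float, sto: float, savg: float, time_s: float,
                 current_setting: int) -> bool:
    """Hypothetical check: alert only if the KB formula calls for a higher setting."""
    if time_s <= 0:
        return False  # no valid collection window, nothing to conclude
    required = (sa + sto) * savg / time_s
    return required > current_setting


# Values measured on the monitored server: SAVG = 0, so the formula yields 0.
print(should_alert(204, 0, 0, 6000, current_setting=2))       # False
# KB sample values: about 50.94 is needed, more than an example setting of 10.
print(should_alert(8286, 883, 0.5, 90, current_setting=10))   # True
```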

From a Windows Server 2008 R2 system in error we see that there is what appears to be invalid information in the “Average Semaphore Hold Time” value below.

[Image: “Average Semaphore Hold Time” value on a Windows Server 2008 R2 system]

On a Windows Server 2012 system, we see a zero value here, which makes the result of the calculation zero.

[Image: “Average Semaphore Hold Time” value on a Windows Server 2012 system]

Additional discussion on this topic is available at:

http://social.technet.microsoft.com/Forums/en-US/operationsmanagergeneral/thread/cb0c57d1-cdfd-445d-a96e-67683e6abdb3

http://support.microsoft.com/kb/2688798

http://social.technet.microsoft.com/wiki/contents/articles/9759.configuring-maxconcurrentapi-for-ntlm-pass-through-authentication.aspx

Resolution: Created overrides to disable both of these monitors, as they do not appear to work (per Alexey Zhuravlev’s recommendation, as also discussed in the social.technet.microsoft.com thread linked above). Hopefully these monitors will be fixed in a version of the management packs later than 6.0.7026.0.

UPDATE: Alexey Zhuravlev has spent significant time on this alert and has provided me with three important updates on this topic. I’m adding his comments directly to this KB article because they were extremely detailed and insightful.

Update #1:

I read your KB article (http://www.systemcentercentral.com/research-this-kb-max-concurrent-api-reached-alert/) and want to add my two cents.

First of all, those monitors have a confusing knowledge article. It refers to http://support.microsoft.com/kb/2688798, but the formulas used in those monitors are different from the formula in the Microsoft KB article.

This monitor uses the simple formula:

    ((($SemWaiters -gt 0) -and (-not($SemWaiters -gt 4GB))) -or
     (($SemHolders -gt 0) -and (-not($SemHolders -gt 4GB))) -or
     (($SemTimeouts -gt 0) -and (-not($SemTimeouts -gt 4GB))))

The 4GB check is a filter for the buffer overrun documented here: http://support.microsoft.com/kb/2685888/en-us
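To make the condition easier to follow, here is an illustrative Python translation. It is not code from the management pack or the hotfix MP. The parameter names mirror $SemWaiters, $SemHolders and $SemTimeouts, and 4GB is PowerShell's numeric suffix notation for 4 * 1024^3.

```python
FOUR_GB = 4 * 1024 ** 3  # PowerShell's 4GB literal: 4,294,967,296


def in_valid_range(value: int) -> bool:
    """True if the counter is positive and not above the 4GB overrun filter."""
    return value > 0 and not value > FOUR_GB


def problem_detected(sem_waiters: int, sem_holders: int, sem_timeouts: int) -> bool:
    """Translation of the monitor's condition: any in-range non-zero value trips it."""
    return (in_valid_range(sem_waiters)
            or in_valid_range(sem_holders)
            or in_valid_range(sem_timeouts))


print(problem_detected(0, 0, 0))             # False
print(problem_detected(0, 1, 0))             # True
print(problem_detected(0, 0, FOUR_GB + 1))   # False (filtered out by the 4GB check)
```

Unlike the KB formula, this condition does not use Average Semaphore Hold Time at all.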

Second, this monitor will not work, simply because its developer passes ‘true’ or ‘false’ as the performance counter instance name.

Also from your KB:

“From a Windows Server 2008 R2 system in error we see that there is what appears to be invalid information in the “Average Semaphore Hold Time” value below.”

On the screenshot I see the ‘Category does not exist’ error. This could mean one of two things:

  • We’re running the monitor on a workgroup machine. There are no Netlogon counters on workgroup machines, so the error is expected.
  • We have a domain-joined machine with performance counter problems. This has nothing to do with the monitors, and the counters should be fixed.

FYI:

I’ve made a hotfix MP for those monitors. I fixed the script so that it now passes the instance name to the function instead of a Boolean, and it always returns the Healthy state on a workgroup computer.

http://gallery.technet.microsoft.com/Hotfix-Management-Pack-278d25ef

P.S. “In this scenario, the formula would be as follows: (15653 + 0) * 0 / 0 =< 0”

Division by zero? I think that’s the reason the monitor uses the simplified formula. I hope this information helps.

Update #2:

“From a Windows Server 2008 R2 system in error we see that there is what appears to be invalid information in the “Average Semaphore Hold Time” value below.”

On the screenshot I see the ‘Category does not exist’ error. This could mean one of two things:

  • We’re running the monitor on a workgroup machine. There are no Netlogon counters on workgroup machines, so the error is expected.
  • We have a domain-joined machine with performance counter problems. This has nothing to do with the monitors, and the counters should be fixed.”

That explanation applies only to the fixed monitor. For the original monitor, the error comes from a bug. To illustrate why the error happens:

The monitor in question contains the following logic:

    if ($DiagnosticMode -eq 0)
    {
        $SCNames = $SCNames -match '_Total'
    }

Diagnostic mode for a monitor always defaults to 0. This $SCNames value is then passed to the counter probe block:

    $InformationCollected = MCAProblemCheckLite ($SCNames[$i])

    function MCAProblemCheckLite
    {
        param($InstanceName)

        #Returns PS Object with performance data related to the Netlogon being used to determine if Max Concurrent API Scenario is present. (Used by CSS)
        $DetectionTime = Get-Date
        $ProblemDetected = $False
        $SA = New-Object System.Diagnostics.PerformanceCounter("Netlogon", "Semaphore Acquires", $InstanceName)
        # (rest of the function not shown in this excerpt)
The issue is that when $SCNames holds a single value rather than an array, the -match operator returns a Boolean value (True/False) instead of filtering the array. That is why System.Diagnostics.PerformanceCounter returns ‘Category does not exist’ (which here simply means ‘cannot find the instance “True” in your counters’).
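The difference between scalar and collection input is the crux of this bug. PowerShell documents that comparison operators such as -match return a Boolean when the left operand is a single value, and return the matching elements when it is a collection. The following Python function is a simplified, hypothetical model of that behavior and not PowerShell itself. It shows how a Boolean ends up as the counter instance name. The instance name "CONTOSO" is also hypothetical.

```python
import re
from typing import List, Union


def ps_match(left: Union[str, List[str]], pattern: str) -> Union[bool, List[str]]:
    """Simplified model of PowerShell's -match operator (case-insensitive regex)."""
    if isinstance(left, list):
        # Collection input: return the elements that match (a filter).
        return [item for item in left if re.search(pattern, item, re.IGNORECASE)]
    # Scalar input: return a Boolean.
    return re.search(pattern, left, re.IGNORECASE) is not None


# Collection input filters, as the script author presumably expected.
print(ps_match(["_Total", "CONTOSO"], "_Total"))   # ['_Total']

# Scalar input yields a Boolean; converted to a string it becomes the instance name "True".
sc_names = ps_match("_Total", "_Total")
instance_name = str(sc_names)
print(sc_names, repr(instance_name))               # True 'True'
```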

If you get the same error (Category does not exist) when using my monitor, it points to a real problem that you should investigate. My code block is:

    # Check whether this computer is joined to an Active Directory domain. Workgroup members don't have
    # Netlogon counters, so this script has nothing to do with those machines.
    $DomainJoined = (gwmi win32_computersystem).partofdomain
    If ($DomainJoined -eq $True) {
        if ($DiagnosticMode -eq 0)
        {
            $SCNames = @('_Total')
        }
        else
        {
            $SCNames = GetSecureChannelNames
        }
        # (rest of the block not shown in this excerpt)
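The instance selection in the fixed script can be paraphrased as follows. This Python sketch is illustrative and not code from the hotfix MP. It assumes domain membership and the secure channel lookup are passed in as parameters. The function fake_channels is a hypothetical stand-in for GetSecureChannelNames. A return value of None stands for "skip the check and report Healthy" on workgroup machines, as described above.

```python
from typing import Callable, List, Optional


def select_instances(domain_joined: bool, diagnostic_mode: int,
                     get_secure_channel_names: Callable[[], List[str]]) -> Optional[List[str]]:
    """Return counter instance names to query, or None to skip (workgroup machine)."""
    if not domain_joined:
        # Workgroup members have no Netlogon counters; the hotfix reports Healthy.
        return None
    if diagnostic_mode == 0:
        # Always a list, so indexing later yields a real instance name, not a Boolean.
        return ["_Total"]
    return list(get_secure_channel_names())


def fake_channels() -> List[str]:
    # Hypothetical stand-in for GetSecureChannelNames.
    return ["CONTOSO", "FABRIKAM"]


print(select_instances(False, 0, fake_channels))  # None
print(select_instances(True, 0, fake_channels))   # ['_Total']
print(select_instances(True, 1, fake_channels))   # ['CONTOSO', 'FABRIKAM']
```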

Update #3:

It is worth mentioning this post in your article as well. It looks like the MP developers took the script from this guy and cut out the code blocks they considered unnecessary:

http://blogs.technet.com/b/ad/archive/2013/04/02/easy-checking-for-maxconcurrentapi-problems.aspx
