Digging deeper into the Exchange 2007 MP: Tuning the Failure DSNs total alert

When working through some Exchange 2007 alerts, I ran across the “Failure DSNs total” alert. I’ve seen this in previous versions of the management pack, and the indications were that, because this is a continually increasing value, it did not provide a relevant result for a monitor. To verify that this performance counter continues to increase, I first checked the weekly performance data. (Note in the screenshot below that the performance view can show maintenance mode windows, so in this case it’s easy to see when the last reboot occurred.)

image

And then the performance over history with a report:

image

These confirmed that this value continues to increase until the service is restarted (or the machine is rebooted). Historically, this would mean it would be logical either to disable this monitor until the performance counter is corrected, or to lower its priority to a warning until the performance counter can be changed to not continually increase. However, digging online I found that this monitor actually works off a delta in values, not the actual value! (http://msmvps.com/blogs/ehlo/archive/2008/06/23/1636921.aspx). This makes the monitor usable.

However, since we were receiving a large number of these alerts and they were not causing issues in the environment, the Exchange engineer and I decided that the next step would be to determine effective thresholds for our environment. With any non-delta counter we can normally right-click on the alert, open the performance view, and see what the range of values is for the condition. In the case of a monitor based upon a delta, however, we needed to use the state change events for the monitor within Health Explorer. This allowed us to identify what the delta values are so that we could establish a baseline for this environment (in the screenshot below the delta value shows as 44).

image

For our environment, we decided upon a first threshold (warning level) of 50 and a second threshold (critical level) of 75. This has worked well to reduce the number of alerts generated by this monitor while still maintaining its functionality.

image 
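To make the delta idea concrete, here is a minimal Python sketch of how a two-threshold check on a continually increasing counter could behave. It is not the management pack's actual implementation. The function names, the sample values, the reset handling, and the use of `>=` for the comparison are all assumptions made for illustration. Only the thresholds (50 and 75) and the delta of 44 come from this post.

```python
from typing import List

WARNING_THRESHOLD = 50   # first threshold chosen in this post
CRITICAL_THRESHOLD = 75  # second threshold chosen in this post


def deltas(samples: List[int]) -> List[int]:
    """Return the change between consecutive samples of a cumulative counter.

    Assumption: a drop in the raw value means the counter was reset
    (service restart or reboot), so the new raw value is used as the delta.
    """
    result = []
    for previous, current in zip(samples, samples[1:]):
        result.append(current - previous if current >= previous else current)
    return result


def classify(delta: int) -> str:
    """Map a delta to a health state.

    Assumption: thresholds are inclusive (>=); the real monitor may differ.
    """
    if delta >= CRITICAL_THRESHOLD:
        return "critical"
    if delta >= WARNING_THRESHOLD:
        return "warning"
    return "healthy"


# Hypothetical raw counter samples; the final drop simulates a restart.
samples = [100, 144, 200, 290, 12]
states = [(d, classify(d)) for d in deltas(samples)]

for delta, state in states:
    print(f"delta={delta:>3} -> {state}")
```

With these made-up samples, the raw value only ever climbs until the restart, but the deltas (44, 56, 90, 12) vary from one interval to the next, which is why the thresholds have to be set against the delta rather than the raw counter.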

Summary: If you are receiving large numbers of “Failure DSNs total” alerts, use the State Change Events for the monitor to determine which threshold levels are optimal for your Exchange environment.
