Written By: Pete Zerger [MVP] and John Joyner [MVP] ; edited by Kerrie Meyler [MVP]
The Microsoft Exchange 2010 Management Pack includes a change in direction in terms of management pack design that bears some explanation before we begin discussion of management pack installation, configuration and tuning. Whether or not you agree with the change in strategy, one has to admire the progressive thinking behind what will be perceived by some as a radical shift in management pack design.
Significant Changes in Management Pack Design
The Exchange Server 2010 management pack includes a new Windows service component called the correlation engine. This service determines the best alert to raise by examining the Exchange 2010 health model through the Operations Manager SDK service. In short, this service maintains the Exchange health model in memory, reviews the faults for instances within the health model over the last 90 seconds, and raises an alert for the fault lowest in the health model. The end result is fewer false alerts, as the correlation engine attempts to raise an alert for the root cause of the service impact or outage.
Figure 1 – The Exchange 2010 MP Correlation Engine
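To make the idea concrete, here is a minimal Python sketch of the selection logic described above. It is not Microsoft's implementation: the `Fault` type, the `parent_of` map, the entity names and the `pick_root_causes` function are all hypothetical, and the only details taken from the description are the 90-second window and the preference for the fault lowest in the health model.

```python
from dataclasses import dataclass

WINDOW_SECONDS = 90  # look-back window described in the text


@dataclass(frozen=True)
class Fault:
    entity: str       # health-model instance that reported the fault (hypothetical names)
    timestamp: float  # seconds since an arbitrary epoch


def ancestors(entity, parent_of):
    """Yield every entity above `entity` in the illustrative health model."""
    current = parent_of.get(entity)
    while current is not None:
        yield current
        current = parent_of.get(current)


def pick_root_causes(faults, parent_of, now):
    """Return recently faulted entities that have no faulted entity beneath them."""
    recent = {f.entity for f in faults if 0 <= now - f.timestamp <= WINDOW_SECONDS}
    # An entity sitting above another faulted entity is treated as a symptom.
    symptoms = {a for e in recent for a in ancestors(e, parent_of)}
    return sorted(recent - symptoms)


# Hypothetical health model, expressed as child -> parent.
parent_of = {
    "Mailbox Database 01": "MAPI Connectivity",
    "MAPI Connectivity": "Mailbox Service",
    "Mailbox Service": "Exchange Organization",
}

faults = [
    Fault("Mailbox Database 01", 100.0),
    Fault("MAPI Connectivity", 130.0),
    Fault("Mailbox Service", 140.0),
    Fault("Client Access Service", 5.0),  # older than 90 seconds, ignored
]

print(pick_root_causes(faults, parent_of, now=150.0))
```

In this toy model, three faults fall inside the window, but only the database sits below the others, so it is the single entity reported.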
Things to Keep in Mind when working with the Correlation Engine
Disabling monitors of any kind changes how the health model behaves and interferes with the correlation engine's attempt to identify the root cause. For this reason, Microsoft recommends against disabling monitors in the Exchange 2010 Management Pack.
Since the correlation engine communicates with the SDK, Microsoft recommends installing the correlation engine on the RMS. Because the correlation engine only examines the last 90 seconds of health state related activity, this should hopefully limit resource utilization on the RMS. However, we have no empirical data (hard numbers) from the field regarding scalability to share at this time.
NOTE: If your root management server (RMS) is clustered, the correlation engine can be clustered just like any other Windows service. While this is not mentioned in the Exchange 2010 Management Pack Guide, we understand Microsoft intends to add guidance for this scenario in the next version of the guide.
Why so many classes?
The service model (classes and relationships) of the Exchange 2010 management pack is much more detailed, consisting of more than 200 Exchange-related classes. MP authoring expert Brian Wren advises that classes are useful only when they have a unique purpose in monitoring. In this case, that unique purpose is granular targeting and root cause identification of Exchange failures with a higher degree of precision. This precision is facilitated by the very granular service model. In our opinion, the added complexity in this scenario is good for root cause analysis and mean time to recovery (MTTR) for service availability in general.
For example, in lab testing, when we dismounted the Exchange database the correlation engine generated only one alert: database down. Many faults occur as a result of dismounting this database, including but not limited to failures in synthetic transaction monitoring. The fact that the correlation engine correctly identified the root cause is a good sign. Of course, this lab environment is no doubt much simpler in design than your production Exchange organization. Therefore the standard caveat “your mileage may vary” applies here.
Another new concept is that of alert classification. There are three possible classifications:
- Key Health Indicator (KHI) – Represents critical issues; most alerts fall into this category.
- Non Service Impacting (NSI) – These are critical alerts whose impact is generally limited to one object or a group of users or objects. These are issues that affect a subset of users, but not the entire organization. As an example, a duplicate Exchange alias would affect mail delivery for the two users with the same alias, but org-wide delivery would remain unaffected.
- Forensic – Items in this category represent diagnostics that may not represent a specific problem, but may be helpful while troubleshooting an issue.
Let’s get down to the business of management pack installation and configuration.
How to Install the Exchange 2010 MP
Before importing the management pack and beginning configuration, it is important to address several critical prerequisites.
If you are running Operations Manager 2007 SP1, install the update specified in Microsoft Knowledge Base article 971541 (Cumulative Update 1 to OpsMgr 2007). If you are running Operations Manager 2007 R2, install the update specified in Microsoft Knowledge Base article 974144 (Cumulative Update 1 to OpsMgr 2007 R2). These updates resolve several critical issues that are more likely to occur when running the Exchange 2010 management pack, including serious issues affecting state rollup through dependency monitors, and they allow the management pack to accurately monitor whether Exchange databases are mounted. Failing to install the respective update on your RMS and all agent computers will also result in inaccurate availability reporting.
The Exchange Correlation service requires the following:
- It must be installed on a computer running Windows Server 2003, or either the 32-bit or 64-bit version of Windows Server 2008 or Windows Server 2008 R2.
- It must have network connectivity to the Operations Manager RMS.
- The Operations Manager Administration Tools must be installed on the computer running the Exchange Correlation service.
When you extract the Exchange 2010 management pack (available HERE), the extraction wizard expects to do more than simply extract the sealed management pack files and accompanying documentation to disk. The wizard also aims to install the correlation engine on the local machine where the installer was launched, as indicated in Figures 2 and 3 below. With that in mind, launch the MSI package on the system where you intend to install the correlation engine.
Figure 2 – MP Extraction and Correlation Installation Directories
As you see in Figure 3, “extraction” is really MP extraction plus correlation engine installation.
Figure 3 – Correlation Engine Service Installation
And the resulting service appears in the Windows Service Manager.
Figure 4 – MS Exchange Monitoring Correlation Engine Windows Service
Once the correlation engine is installed and the management pack files extracted, you can import the Exchange 2010 management pack just as you would any other.
You should create an unsealed management pack dedicated to the storage of overrides and other customizations for the Exchange 2010 management pack, just as you would for any other management pack.
NOTE: When you attempt to import the MP, you will notice each has the warning symbol with a lock over it. As the warning message explains, this MP contains rules which have write actions. You can safely ignore this, as it is understood this is the case.
How to Configure the Exchange 2010 MP
There is surprisingly little to configure in the Exchange 2010 management pack; however, several areas deserve additional description. These are discussed next.
High Availability Exchange Architectures
It is worth mentioning here how the Exchange 2010 management pack discovers and monitors high-availability Exchange architectures. Most significantly, the Windows cluster management pack is not required. In fact, Microsoft recommends that discovery of cluster components in your Exchange infrastructure be disabled, as this information is not utilized for discovery or monitoring of high-availability Exchange configurations. This is largely because the designers of the Exchange 2010 management pack leverage built-in functionality of OpsMgr and Exchange to accurately identify high-availability configurations, without depending on other management packs.
Note: this does not mean you should uninstall the Windows cluster management pack. It simply means you should disable the root discoveries of the cluster management pack for all Exchange servers.
Configuring Synthetic Transaction Monitoring
The Exchange 2010 management pack includes protocol level synthetic transaction monitoring that requires little configuration. In fact, the management pack guide in this release did not mention any required configuration for successful synthetic transaction monitoring. It was only after an alert was raised that we realized some configuration was required. (We also understand this oversight will be addressed in the next revision of the guide accompanying this management pack.)
Configuring synthetic transaction monitoring requires running the New-TestCasConnectivityUser.ps1 PowerShell script from the Exchange 2010 Command Shell. You will be prompted for a one-time password. From that point on, the Exchange 2010 MP will handle the process for you.
Remember that the user running the New-TestCasConnectivityUser.ps1 script needs to be a member of the Organization Management security group in Active Directory, which is found in the Microsoft Exchange Security Groups OU.
NOTE: Synthetic transaction monitoring requires the agent to run as Local System. Any other account will cause synthetic transaction monitoring to fail.
The next sections discuss some optional configurations.
Enabling Event Collection for Synthetic Transaction Rules
The Exchange 2010 management pack uses synthetic transactions, such as the running of the Test-MapiConnectivity, Test-OwaConnectivity, and other commands, to scan your Exchange organization for basic connection responses and to test simple operations such as logging in to a mailbox. Whether these tests succeed or fail, their output is useful for investigating the state of the Exchange environment. However, since there is a significant amount of output for each task, the event output is not stored by default.
NOTE: Event collection increases database utilization. Do not enable unless you want a closer look at what’s happening behind the scenes, or you have some troubleshooting to do. Make use of the Top Generators reports in the OpsMgr 2007 R2 Core Monitoring MP. Read more on these reports HERE.
If you need to enable the event collection rules for synthetic transaction output, perform the following steps:
- In the Operations console, click Authoring.
- In the Authoring pane, expand Management Pack Objects, and then click Rules.
- In the Rules pane, click Change Scope.
- In the Scope Management Pack Target(s) by object dialog box, in the Look for box, type “Exchange Server 2010.”
- Click View all targets.
- Click Select All if it’s not disabled (it is only disabled when all rows are already selected).
- Click OK to close the dialog box.
- After the rules have loaded, type “Script event collection” in the Look for box near the top of the console.
- For each test task you would like to enable, perform the following steps:
  - Right-click on the rule and select Overrides > Override the Rule > For all objects of class: [class name].
  - Click the Override checkbox.
  - Set the override value to True.
  - Click OK.
NOTE: It will take some time for the overrides to be picked up by the agents and for events to appear in the built-in views.
Tuning / Alerts to look for in the Exchange 2010 MP
In this section, we need to address not only alert tuning, but alert tuning in the context of the correlation engine. There are a few things to be aware of before you start creating overrides. While the correlation engine evaluates faults in the context of the health model, it is not entirely a black box. You can gain some insight into which faults the engine evaluated in order to determine which alert to raise in the Operations console.
The following are some tuning guidelines to consider with regard to the management pack.
Disabling Monitoring - As mentioned in the first section of this discussion, wholesale disabling of monitors is discouraged by Microsoft. If you disable a portion of the state monitoring within the health model, you are eliminating a portion of the input the engine is expecting to reach an appropriate determination of root cause. Particularly, disabling monitors in the lower levels of the health model for services and components in use in your environment (database, services, etc.) could potentially disrupt the process.
For example, if you disable the monitor that watches to ensure the Exchange databases are mounted, root cause analysis would point to an incorrect root cause when a database is offline. However, if there are some alerts raised for components not actively utilized in your organization, you may well want to disable these monitors to eliminate alerts that are not actionable. For example, if you have databases that have been created but are not yet in use and mounted, you would likely want to disable monitoring for those database instances.
Additionally, because the Exchange 2010 management pack is designed with Exchange 2010 best practices in mind, there may be special cases where Exchange has been deployed for a special purpose or in some unusual configuration, and some tuning and testing of non-standard scenarios will be required to ensure an accurate result.
Threshold Overrides - On the other hand, tuning of monitoring thresholds is a normal part of the tuning process. Upon closer examination, you may find that thresholds of unit, aggregate rollup and dependency rollup monitors need to be adjusted for your organization's specific configuration. However, the thresholds are designed to suit the needs of most organizations out-of-the-box.
In particular, it seems probable that some organizations may want to adjust the rollup percentages on the dependency rollup monitors used to gauge the health of distributed components and high-availability configurations. No doubt hardware sizing and load distribution introduce some environmentally specific variables that may require fine-tuning in some environments. With far fewer alerts raised as a result of the correlation engine, there is now a much smaller number of false alerts that require tuning. There are a small number of alerts of which you should be aware, described here:
Alert: The test mailbox was not initialized. Run new-TestCasConnectivityUser.ps1 to ensure that the test mailbox is created.
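Issue: The guide is currently missing documentation on configuring synthetic transaction monitoring, so this alert is raised until the test mailbox has been created.
Resolution: Run the script as described in Configuring Synthetic Transaction Monitoring above.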
Alert: The required SCOM hotfixes for Exchange MP are not installed.
Issue: While it is good that an alert is raised for Exchange servers without the requisite hotfixes, the problem lies in the targeting. This monitor targets the Health Service class, which means it can potentially raise alerts for EVERY agent in your environment, regardless of what server applications they are running.
Resolution: Create a group containing the health service instances of Exchange 2010 servers. Disable the monitor for the Health Service class. Re-enable for the group of Exchange 2010 Health Service instances.
Alert: IMAP4 [and] POP3 connectivity transaction failure.
Issue: If you have not deployed those protocols in your organization, there is unnecessary alerting.
Resolution: Override the IMAP4 and POP3 connectivity transaction failure alerts on CAS servers where these services are not running.
Discovery Overrides – In large environments where OpsMgr administrators are seeking to minimize the number of discoveries across all management packs, we noticed two Exchange 2010 discoveries whose frequency you might consider reducing. These are the Microsoft.Exchange.2010.Mailbox.MdbOwningServerLocalEntityDiscoveryRule and the Microsoft.Exchange.2010.Mailbox.MdbOwningServerRemoteEntityDiscoveryRule. The default discovery interval of 14,400 seconds (every 4 hours) might be extended by override to 86,400 seconds (once per day).
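As a quick sanity check on those intervals, the short Python sketch below converts a discovery interval in seconds into runs per day. The `runs_per_day` helper is purely illustrative and not part of OpsMgr; only the two interval values come from the text above.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def runs_per_day(interval_seconds):
    """How many times a discovery with this interval fires in 24 hours."""
    return SECONDS_PER_DAY / interval_seconds


default_interval = 14_400   # default from the management pack, per the text
proposed_interval = 86_400  # suggested override value

print(runs_per_day(default_interval))   # 6.0
print(runs_per_day(proposed_interval))  # 1.0
```

In other words, the override takes each of the two discoveries from six runs per agent per day down to one.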
Insight into the Correlation Engine
Because the correlation engine is itself a black box of sorts, a natural human response is to ask what exactly that engine is up to and how it works under the hood. Questions we heard quite a lot at MMS 2010 and in the days since include “Why should I trust the correlation engine?” and “What faults did the correlation engine evaluate in raising this alert?”
These are both fair questions, so let’s consider them individually.
Why should I trust the correlation engine?
Microsoft might say “because we’ve been using this management pack to monitor the largest Exchange environments in the world for the last two years.” While that does provide some comfort, it does not alleviate the perception of the correlation engine as a “hands off” process. However, you do have some ability to review a fault interpreted by the correlation engine in raising an alert for the root cause of a given incident.
What faults did the correlation engine evaluate in raising this alert?
You can gain some insight into the correlation engine from within the Operations console. In the properties of an alert, you’ll find an AlertContext tab. On this tab, you find some XML that reveals which fault the correlation engine has determined to be the root cause and which it considers related issues. The fault within the <RootCause></RootCause> tags is the one determined by the correlation engine to be the root cause of the incident, and the faults within the <RelatedIssues></RelatedIssues> tags are those determined by the correlation engine to have been raised as a result of the root cause. While this information is only available after the fact, it still provides some visibility into the data taken into consideration by the correlation engine before raising the alert for the cause of a given incident.
Figure 5 – AlertContext Tab of Alert Raised by Correlation Engine
Figure 6 – Root Cause (within AlertContext tab)
Figure 7 – Related Issues (within AlertContext tab)
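If you copy the XML from the AlertContext tab into a file or string, a few lines of Python can separate the root cause from the related issues. The sketch below is only an illustration: the `<AlertContext>` wrapper element and the text inside the tags are made-up placeholders, and the real payload is likely to be more deeply nested. Only the `<RootCause>` and `<RelatedIssues>` tag names come from the management pack output described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: only the RootCause and RelatedIssues tag names are
# taken from the article; the wrapper element and inner text are placeholders.
SAMPLE = """
<AlertContext>
  <RootCause>Database dismounted</RootCause>
  <RelatedIssues>MAPI connectivity transaction failure</RelatedIssues>
  <RelatedIssues>OWA connectivity transaction failure</RelatedIssues>
</AlertContext>
"""


def summarize(xml_text):
    """Collect the text found under RootCause and RelatedIssues elements."""
    root = ET.fromstring(xml_text.strip())

    def text_of(tag):
        # iter() walks the whole tree, so the nesting depth does not matter;
        # itertext() also gathers text from any child elements.
        return [" ".join("".join(el.itertext()).split()) for el in root.iter(tag)]

    return {
        "root_cause": text_of("RootCause"),
        "related_issues": text_of("RelatedIssues"),
    }


summary = summarize(SAMPLE)
print("Root cause:", summary["root_cause"])
print("Related issues:", summary["related_issues"])
```

Because the function searches by tag name rather than by a fixed path, it should tolerate differences between this placeholder layout and the actual XML, although the extracted text will need whatever further parsing the real fault entries call for.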
Certainly the discussion doesn’t end here. The introduction of the correlation engine raises questions as to what this portends for future releases of other management packs for large distributed applications, such as Active Directory and SharePoint. While the correlation engine may be a positive development in the quest for speedy root cause analysis, it raises concerns about the rise of multiple Windows-based engines that perform similar functions for other management packs. At what point does this correlation move from stand-alone services into the core product itself? Perhaps some light will be shed on this as Microsoft’s plans for Operations Manager vNext are revealed.
The Microsoft Exchange 2010 Management Pack brings some new thinking to the table. What do you think of these changes? What has your experience been with the Exchange 2010 management pack? We are eager to hear your thoughts, which you may leave as a comment on this post. Questions, comments and any feedback on Exchange alerts or tuning guidance that you found helpful in your environment are welcome and appreciated.
Special thanks to Alexandre Verkinderen [MVP] for reviewing this article.