Part 1: Inside Quest Management Extensions for VMware for OpsMgr 2007

Since Quest announced they would provide Quest Management Extensions (QMX) for VMware monitoring for FREE (my very favorite price!) for Operations Manager 2007, I’ve been implementing the solution for a couple of our customers in recent days. Since any third-party solution for OpsMgr 2007 is a learning experience your first time out, I wanted to share some of what I’ve learned. In this post, I’ll talk about the areas where I had the most questions initially, and in future posts we’ll review the solution architecture in greater depth.

  • Data Sources
  • Licensing
  • Setup Instructions
  • Minimizing Alert Noise
  • How to Get Support
  • Scaling Out (for enterprise)
  • Service and Health Model (Classes, Relationships and Health Rollup)
  • Under the Hood (how Quest works beneath the surface)
  • Hardware Monitoring for VMware

NOTE:

  • Look for the Our Experience: labels throughout this article where I briefly mention our experience in each area of the installation and configuration process.
  • In the coming days, I’ll also create a proper article for our “ByExample” series, but wanted to get this information to you more quickly.

How QMX for VMware collects data

This QMX extension monitors VMware Virtual Center servers using three different methods. You can mix and match these methods based on your VMware architecture and your monitoring needs. The three collection methods are:

  • VMware Infrastructure SDK (VI API) for hosts and guests (see tab Perf_VISDKAPI and job Perf_VISDKAPI in tab __VirtualAgentJobs in the QMX console)
  • SNMP traps (see job SNMPTRAP in tab __VirtualAgentJobs in the QMX console)
  • SNMP get, used to check availability (see job SNMPWATCHDOG in tab __VirtualAgentJobs in the QMX console)

NOTE: To disable a method, clear the “Run” checkbox for its job in tab __VirtualAgentJobs, click the Save button, and then stop and restart the Virtual Agent.

figure 1 – QMX for VMware solution architecture

Licensing

Since this Quest offering is now free, one of my first questions was “how do I get the free license for my customer?” The answer is a simple e-mail request. The details are available on the Quest management-extensions.org website at http://management-extensions.org/entry.jspa?externalID=3794&categoryID=385. Note that only VMware monitoring via vCenter is free. Quest also has an offering that probes the hosts directly, which has a cost. However, if you have vCenter, there is little reason to go with the direct option.

Our Experience: We sent our first request on a Sunday and had the license Monday morning.

Setup Instructions

Most of the basic setup instructions are contained within the QMX Admin Console. Just highlight the VMware node in the QMX Console to see the Table of Contents.

figure 2 – Instructions TOC within the QMX Console

Our Experience: We missed this the first time out and incorrectly set up some of the direct monitoring. RTFM (Read the Friendly Manual) is always important... so RTFM!

Minimizing Alert Noise (Tuning)

VMware ESX / vSphere and vCenter are enterprise technologies, so tuning is a big part of the implementation process no matter which solution you go with. Here are some important tips for minimizing non-actionable alerts from QMX for VMware. In the QMX Console, you will find three tabs that contain the key settings for adjusting QMX monitoring to your specifications (shown below).

_Performance_Rollups(VISDK) – This tab includes settings for performance rollups for enterprise components, including vCenter, DataCenter, Cluster and Resource Pool.

_Performance_ESXServer(VISDK) – This tab includes settings for host performance metrics, including ESX/vSphere Host CPU, Memory, Disk and Network, Host Resource Group CPU, and Host Management Agent Memory.

_Performance_VM(VISDK) – This tab includes settings for guest performance metrics, including Guest CPU, Guest Memory, Guest Disk, Guest Network, Guest Resource Group CPU and Uptime, and Guest Management Agent Memory. In plain English, this data describes guest performance from a host perspective.

IMPORTANT (pay special attention here)

Most administrators in mid-size businesses and small enterprises will want to adjust some of the defaults to minimize alert noise. On each of these tabs, you will notice that:

  • “Auto Resolve Alert” is NOT SELECTED by default – This means you will have to close the alerts yourself unless you select this checkbox.
  • Aggregate is set to 1 by default – This means an alert will be raised after a single sample breaches the threshold. A number greater than one means that many consecutive threshold breaches are required to raise the alert, which effectively gives you a recurring threshold monitor.
  • Graph is selected by default – This means performance data will be collected and sent to the OperationsManager and Data Warehouse databases.
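
To make the Aggregate setting concrete, here is a minimal Python sketch of the consecutive-breach behavior described above. It is not QMX code. The function name alert_sample_indexes, the sample values and the threshold of 90 are all hypothetical, and the sketch assumes that a breach means a sample strictly above the threshold and that any non-breaching sample resets the count.

```python
from typing import List, Sequence


def alert_sample_indexes(
    samples: Sequence[float], threshold: float, aggregate: int = 1
) -> List[int]:
    """Return the indexes of the samples at which an alert would be raised.

    Assumption (not taken from QMX documentation): a sample breaches when it
    is strictly greater than the threshold, and the alert fires once, when the
    number of consecutive breaches first reaches `aggregate`.
    """
    if aggregate < 1:
        raise ValueError("aggregate must be at least 1")

    raised = []
    consecutive = 0
    for index, value in enumerate(samples):
        if value > threshold:
            consecutive += 1
            # Fire only when the streak first reaches the required length.
            if consecutive == aggregate:
                raised.append(index)
        else:
            consecutive = 0  # any healthy sample resets the streak
    return raised


# Hypothetical host CPU samples (percent): one brief spike, then a sustained one.
cpu = [40, 95, 42, 41, 96, 97, 98, 45]

print(alert_sample_indexes(cpu, threshold=90, aggregate=1))
print(alert_sample_indexes(cpu, threshold=90, aggregate=3))
```

In this made-up series, an Aggregate of 1 raises an alert for both the one-sample spike and the sustained load, while an Aggregate of 3 raises an alert only for the sustained load.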

Before changing these defaults, we asked support whether any harm could come from adjusting them, and the answer was “no”. When we asked why the defaults are set as shown above, the answer was this:

auto-resolve is set to false by default because many NOCs have internal processes that require/mandate human intervention on ALL alerts and do not allow auto resolve.

Fair enough. So, as with every management pack, at the very least watch the alerts and do some tuning to eliminate noise alerts.

Our Experience: If you are not a NOC that mandates human intervention, you will definitely need to do some tuning. We chose to enable “Auto Resolve Alert” in most areas. We also bumped the Aggregate setting to two or three on most monitors to reduce alerts for transient (brief) spikes in performance. For our customer that doesn’t use reporting or performance graphs a great deal, we also disabled a lot of the performance collection (performance data is almost always your biggest contributor to database growth/bloat, so disable what you won’t use). Since it’s all in these three tabs, it was actually much faster than if we had been required to go set a bunch of overrides.
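
For a rough feel of why trimming performance collection matters, here is a back-of-the-envelope Python sketch. It is only arithmetic: the helper name samples_per_day, the guest count, the counter counts and the 300-second interval are hypothetical, and it assumes every counter on every object is sampled once per interval. Actual database growth also depends on how OpsMgr stores and aggregates the data.

```python
def samples_per_day(
    monitored_objects: int, counters_per_object: int, interval_seconds: int
) -> int:
    """Rough count of performance samples collected per day.

    Assumes every counter on every object is sampled once per interval.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    seconds_per_day = 24 * 60 * 60
    intervals_per_day = seconds_per_day // interval_seconds
    return monitored_objects * counters_per_object * intervals_per_day


# Hypothetical environment: 200 guests sampled every 300 seconds.
all_counters = samples_per_day(200, counters_per_object=10, interval_seconds=300)
trimmed = samples_per_day(200, counters_per_object=3, interval_seconds=300)

print(all_counters, trimmed)
```

With these invented numbers, clearing Graph on seven of ten counters per guest drops the daily sample count from 576,000 to 172,800.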

figure 3 – Configuration tabs within the QMX Console

How to Get Support

You can register for free on the Quest Management Extensions website (http://management-extensions.org/forum.jspa?forumID=723&start=0) and ask questions in the support forums.

Our Experience: We posted questions and received answers the same day. Just look for questions from pzerger in the support forums at http://management-extensions.org/forum.jspa?forumID=723&start=0. I notice that the guys doing most of the answering are solution architects Gary Broadwater and Tony LaMark.

Conclusion and Up Next

I hope you find a few nuggets here that make your experience a good one. In the next post (or posts) we’ll talk about

  • Scaling Out (for enterprise)
  • Service and Health Model (Classes, Relationships and Health Rollup)
  • Under the Hood (how Quest works beneath the surface)

I’d be interested to hear your experiences with QMX for VMware if you have actually implemented it. Please share your experiences in comments on this post. I’ll try to supplement this info with more tuning experience as time passes in the article I’m working on for the “ByExample” series.

You can read about all the available options for VMware monitoring with Operations Manager 2007 in my previous blog post VMware VI3 Monitoring Options for Operations Manager 2007.

 
