I’ve been working on a monitor that treats a group of services as a single object, instead of using the service monitor wizard (which creates a lot more in the background and in the database than I need). As I came up with features, I thought it would be great to hear ideas from the general SCOM community, who have “heard it all”.
The short description: it monitors groups of services (NT services or Cluster Resource Services). You can mix clustered services with standard NT services, because the monitor knows the difference (and which node owns a clustered resource) and handles each appropriately.
The group of services and its parent are both object instances, so they can be used in views and included in distributed applications.
A few scenarios (there are overrides for the features):
#1 – NT Services Only – Restart Down Services
If one service in the group is down, restart it (or just display a CRITICAL state if the no-restart override is set). With an override, it can be configured to monitor only services whose startup type is Automatic.
If the override is enabled, it can change the state to WARNING when all services are restarted so you can notify that services were restarted successfully, and then it returns to HEALTHY on the next pass. If it could not restart the services, it sets the state to CRITICAL.
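The state logic for scenario #1 can be sketched roughly as follows. This is only an illustration in Python, not the monitor's actual script; the names `Service`, `evaluate_group`, `automatic_only`, `no_restart`, and `warn_on_restart` are hypothetical stand-ins for the overrides described above, and `restart` is a caller-supplied function assumed to return True on success.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class Health(Enum):
    HEALTHY = "Healthy"
    WARNING = "Warning"
    CRITICAL = "Critical"


@dataclass
class Service:
    name: str
    running: bool
    start_mode: str = "Automatic"  # "Automatic", "Manual", or "Disabled"


def evaluate_group(
    services: List[Service],
    restart: Callable[[Service], bool],  # assumed: returns True if the restart worked
    automatic_only: bool = False,        # hypothetical override name
    no_restart: bool = False,            # hypothetical override name
    warn_on_restart: bool = False,       # hypothetical override name
) -> Health:
    """Return the group state for one polling pass."""
    # Optionally ignore services whose startup type is not Automatic.
    monitored = [
        s for s in services
        if not automatic_only or s.start_mode == "Automatic"
    ]
    down = [s for s in monitored if not s.running]
    if not down:
        return Health.HEALTHY
    if no_restart:
        return Health.CRITICAL
    # Attempt every restart before deciding, so one failure does not skip the rest.
    results = [restart(s) for s in down]
    if not all(results):
        return Health.CRITICAL
    return Health.WARNING if warn_on_restart else Health.HEALTHY
```

With `warn_on_restart` set, a pass that restarts everything reports WARNING, and the following pass finds nothing down and reports HEALTHY again.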
#2 – Clustered Resource Services – Restart Offline Resource Service
The monitor checks to see if the resource service in question is hosted on the node where the monitor is running. If it is and one clustered resource service is found offline in the group, bring the resource back online (or just display a CRITICAL state with a no-restart override).
If the override is enabled, it can change the state to WARNING when all cluster resource services are restarted so you can notify that resource services were restarted successfully, and then it returns to HEALTHY on the next pass. If it could not bring the resources online, it sets the state to CRITICAL.
The health state will only be reflected on the node hosting the resource. If it is on another node, the clustered resource service is ignored.
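The ownership check can be pictured as a filter applied before any state is evaluated. The sketch below is an assumption about how that could look, not the monitor's real code; `GroupMember`, `owner_node`, and `members_for_node` are hypothetical names, and an `owner_node` of None is used here to mean a plain NT service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GroupMember:
    name: str
    online: bool
    owner_node: Optional[str] = None  # None: plain NT service, not clustered


def members_for_node(members: List[GroupMember], local_node: str) -> List[GroupMember]:
    """Keep NT services plus the cluster resources owned by this node."""
    local = local_node.lower()  # Windows computer names are case-insensitive
    return [
        m for m in members
        if m.owner_node is None or m.owner_node.lower() == local
    ]
```

A resource owned by another node simply drops out of the list, so it cannot affect the health state on this node.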
#3 – Mixed NT Service and Cluster Resource Service
This functions identically to #1 and #2, except that the group contains a mixture of NT Services and Resource Services.
#4 – NT Service and Cluster Resource Service with Restart Order
With the “StopServices” override set to true, when one service is found down, the monitor first stops all of the services in the group. Next it starts the services (or brings the cluster resource services online) in the order in which they are listed.
Of course, this option can be used with #1, #2, and #3 as well.
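As a rough sketch of the stop-then-start sequence: the function below takes the listed names plus three caller-supplied callbacks (`is_running`, `stop`, `start`), all of which are hypothetical and stand in for whatever the monitor really calls. The stop order is not stated above, so stopping in listed order is an assumption here.

```python
from typing import Callable, List


def restart_in_order(
    names: List[str],
    is_running: Callable[[str], bool],
    stop: Callable[[str], None],
    start: Callable[[str], bool],  # assumed: returns True if the start worked
) -> bool:
    """Stop the whole group if any member is down, then start in listed order."""
    if all(is_running(n) for n in names):
        return True  # nothing is down, leave the group alone
    # Stop everything that is still running (listed order is an assumption).
    for n in names:
        if is_running(n):
            stop(n)
    # Start (or bring online) each member in the order it is listed.
    for n in names:
        if not start(n):
            return False  # give up at the first failure
    return True
```

Stopping at the first failed start keeps later services from coming up without something they may depend on; whether the real monitor does this is not stated above.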
My interest is in hearing ideas that I might not have thought of or real world monitoring scenarios for monitoring groups of services that have unique requirements.
Multiple Service groups can be monitored on a single server this way.
The overrides are as follows:
True=only monitor if set to auto
True=stop all services and restart in order
True=Change to WARNING state if services were restarted. Return to HEALTHY the next pass.
True=Set state to WARNING when 1 or more services listed does not exist.
The standard polling interval.
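To make the list concrete, the overrides could be modeled as a small settings object, with the missing-service check as a helper. Everything here is a sketch under assumptions: apart from “StopServices”, the field names are invented for illustration, the default interval is made up, and `missing_services` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Overrides:
    automatic_only: bool = False   # hypothetical name: only monitor if set to auto
    stop_services: bool = False    # "StopServices" in the post
    warn_on_restart: bool = False  # hypothetical name: WARNING after a restart
    warn_on_missing: bool = False  # hypothetical name: WARNING if a service is missing
    interval_seconds: int = 300    # assumed default, not taken from the post


def missing_services(listed: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Return the listed service names that are not installed on this server."""
    present = {name.lower() for name in installed}  # service names are case-insensitive
    return [name for name in listed if name.lower() not in present]
```

With `warn_on_missing` set, a non-empty result from `missing_services` would be the trigger for the WARNING state.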
My reason for coming up with this is a unique situation where I have 3 servers that each have the same 4 services installed. On 1 server, all 4 are running, and on the other two, only two of them are running. This can be changed manually at any time. The two that are not running are set to Manual, which removes them from monitoring. When they are set back to Automatic, they are monitored again.
An unrelated situation (but similar) is where a service “could” be a standard NT Service or “could” be a clustered resource service. This type of monitor covers me either way.
So I’d love to hear thoughts and comments. My first-phase test of this monitor was successful and I have been quite happy with it. I have only tested with Server 2008 and 2012. The monitor MP is not specific to a SCOM version.