Monitor Me! PLEASE!

OK… the title is a little misleading…

I’m primarily writing about how to make a custom application “Monitor Ready.” A lot of discussion is spent on how to monitor this or that. It’s probably a good time to share with developers some simple ways to make a custom application ready to be monitored. If you are not a developer but a SCOM admin (most likely), forward this link to your developers.

SCOM monitoring is only (as SCOM admins readily know) as good as the management pack that is created to monitor the application. The management pack is only as good as the application allows it to be.
This blog is an overview of how to write a “Monitor Ready” application. I’m sure true day-to-day programmers know the best way to implement my recommendations; I’m just covering the basics.

Management Packs, Monitors, and Rules

A management pack is a collection of Rules and Monitors. There are other components in management packs but these are the two key objects of interest.

A RULE is a single instruction on how to identify a certain condition. This condition can be:

1) An event showing up in an event log
2) A performance counter which exceeds a threshold
3) A script which tests for a condition
4) A WMI query result

A MONITOR is a “state-based” set of instructions to determine the health of a single condition.

There are two-state (Healthy, Critical) and three-state (Healthy, Warning, Critical) monitors, and the conditions for each state are defined in the monitor. For example, a logical disk has a Healthy state (plenty of free space), a Warning state (space is getting low but there is no imminent danger) and a Critical state (we are either in a down situation or close to one). The conditions for monitors can be:

1) An event showing up in an event log
2) A performance counter which exceeds a threshold
3) A script which tests for a condition
4) A WMI query result
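To make the three-state idea concrete, here is a minimal sketch of how a monitor might map a measured value to a health state. It is written in Python purely for illustration and is not how SCOM implements monitors; the function name `disk_health` and the 20% and 10% thresholds are assumptions chosen for the example.

```python
from enum import Enum


class HealthState(Enum):
    HEALTHY = "Healthy"
    WARNING = "Warning"
    CRITICAL = "Critical"


def disk_health(percent_free, warning_below=20.0, critical_below=10.0):
    """Map a free-space percentage to a three-state health value.

    The default thresholds are illustrative assumptions, not SCOM defaults.
    """
    if percent_free < critical_below:
        return HealthState.CRITICAL
    if percent_free < warning_below:
        return HealthState.WARNING
    return HealthState.HEALTHY


if __name__ == "__main__":
    for sample in (55.0, 15.0, 4.0):
        print(sample, disk_health(sample).value)
```

A two-state monitor is the same idea with the Warning branch removed.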

When writing code, here are some common lines of code for error checking:

Try
    ' Do something here
Catch ex As Exception
    ' Uh oh… it didn't work
End Try

The CATCH portion of the Try is the perfect place to write to the event log. Let’s say your Try connects to a SQL database and the connection fails. Write an event to the event log with details in the description as to what went wrong and what the potential corrective action is. These details will show up in the alert.

The following is an example of writing to the event log:

Imports System
Imports System.Diagnostics

Module Module1

    Sub Main()

        Dim sSource As String
        Dim sLog As String
        Dim sEvent As String
        Dim sMachine As String

        sSource = "dotNET Sample App"
        sLog = "Application"
        sEvent = "Sample Event"
        sMachine = "."   ' "." means the local computer

        ' Register the event source once (requires administrative rights)
        If Not EventLog.SourceExists(sSource, sMachine) Then
            EventLog.CreateEventSource(sSource, sLog, sMachine)
        End If

        Dim ELog As New EventLog(sLog, sMachine, sSource)
        ' Information entry with the description only
        ELog.WriteEntry(sEvent)
        ' Warning entry with event ID 234 and category 3
        ELog.WriteEntry(sEvent, EventLogEntryType.Warning, 234, CType(3, Short))

    End Sub

End Module

http://support.microsoft.com/kb/301279

 

Once an event is written to the event log, a RULE can be written to watch for that event with [EventID=####, Source=MySource, type= (Error|Warning|Information)]. If this event is found, raise an alert and include the event description in the alert details. If integrated with SCSM, a ticket can be raised with the alert details in the ticket details.
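The matching logic of such a rule can be sketched in a few lines. This is a Python illustration only, not SCOM's actual engine; the `Event` and `EventRule` types and the shape of the alert are hypothetical, while the event ID, source and type are the values written by the sample application above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: int
    source: str
    entry_type: str      # "Error", "Warning" or "Information"
    description: str


@dataclass(frozen=True)
class EventRule:
    event_id: int
    source: str
    entry_type: str

    def matches(self, event):
        """True when the event satisfies all three criteria of the rule."""
        return (event.event_id == self.event_id
                and event.source == self.source
                and event.entry_type == self.entry_type)

    def evaluate(self, events):
        """Return one alert per matching event, carrying the event description."""
        return [{"source": e.source, "details": e.description}
                for e in events if self.matches(e)]


if __name__ == "__main__":
    rule = EventRule(234, "dotNET Sample App", "Warning")
    log = [
        Event(234, "dotNET Sample App", "Warning", "Sample Event"),
        Event(234, "SomeOtherApp", "Warning", "Unrelated"),
    ]
    print(rule.evaluate(log))
```

Because the description travels with the alert, the more useful the text your application writes, the more useful the alert (and any ticket raised from it).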

** If you write a combination of events in your Try/Catch, you could use a monitor instead.

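Try

    ' Daily processing of data goes here

    ' Success: EventID 1100, EventSource "DailyProcessing", EventType Information
    EventLog.WriteEntry("DailyProcessing", "Daily processing succeeded.", _
        EventLogEntryType.Information, 1100)

Catch ex As Exception

    ' Failure: EventID 1101, EventSource "DailyProcessing", EventType Error
    ' (EventLogEntryType has no Critical value; Error is the closest match)
    EventLog.WriteEntry("DailyProcessing", "Daily processing failed: " & ex.Message, _
        EventLogEntryType.Error, 1101)

End Try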

A monitor can be defined to change to a healthy state for event 1100 and an unhealthy state for event 1101. This is significant because the monitor will close the alert it raised once it returns to a healthy state. The second benefit is that SCOM can report on the percent UP time (healthy) and percent DOWN time (unhealthy) with out-of-the-box reports.
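The behaviour described above can be simulated in a few lines. The following Python sketch is an illustration under stated assumptions, not SCOM's implementation: timestamps are plain minutes, the monitor is assumed to start healthy, and the `replay` function is hypothetical. Only the event IDs 1100 and 1101 come from the example.

```python
HEALTHY_EVENT_ID = 1100     # success event from the example
UNHEALTHY_EVENT_ID = 1101   # failure event from the example


def replay(events, start, end):
    """Replay (minute, event_id) pairs, sorted by minute, within [start, end].

    Returns (uptime_percent, actions), where actions lists when an alert
    would be opened and when the monitor would close it again.
    Assumes the monitor is healthy at `start`.
    """
    healthy = True
    healthy_minutes = 0
    last_healthy_since = start
    actions = []
    for minute, event_id in events:
        if event_id == UNHEALTHY_EVENT_ID and healthy:
            healthy_minutes += minute - last_healthy_since
            healthy = False
            actions.append((minute, "open alert"))
        elif event_id == HEALTHY_EVENT_ID and not healthy:
            last_healthy_since = minute
            healthy = True
            actions.append((minute, "close alert"))
    if healthy:
        healthy_minutes += end - last_healthy_since
    return 100.0 * healthy_minutes / (end - start), actions


if __name__ == "__main__":
    uptime, actions = replay([(60, 1101), (90, 1100)], start=0, end=240)
    print(uptime, actions)
```

In this sample run a failure at minute 60 followed by a success at minute 90 yields 30 unhealthy minutes out of 240, and the alert is closed without anyone touching it.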

Scripting Monitors and synthetic transactions

If there is a “scriptable” way to test the UP/DOWN state of your application (using an API via VBScript or PowerShell), this too can be leveraged for Up/Down state. If you create a web page that, when called, returns text indicating the current health of the application, a Web URL monitor (synthetic transaction) can check for KEY WORDS in the returned HTML to determine the health state (even a web service can be tested in this manner for UP/DOWN state). Synthetic transactions ultimately test a condition as an end user (or consuming service) would.
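As a rough illustration of the keyword check, here is a Python sketch of what such a synthetic transaction does. It is not the SCOM Web URL monitor itself; the keyword "STATUS: OK" and the function names are assumptions, and your health page can return whatever text you agree on with your SCOM admin.

```python
import urllib.request


def health_from_body(body, healthy_keyword="STATUS: OK"):
    """Return "Healthy" if the agreed keyword appears in the page, else "Critical"."""
    return "Healthy" if healthy_keyword in body else "Critical"


def check_url(url, timeout=10):
    """Fetch the health page as a client would and classify the response.

    Network and HTTP errors (unreachable host, timeout, 4xx/5xx status)
    are treated as Critical, since an end user would see a failure too.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        return "Critical"
    return health_from_body(body)


if __name__ == "__main__":
    print(health_from_body("<html><body>STATUS: OK</body></html>"))
    print(health_from_body("<html><body>STATUS: DATABASE DOWN</body></html>"))
```

The design point for developers is that the page should actually exercise the application (for example, touch the database) before printing the keyword, so that the text reflects real health rather than just a running web server.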

Performance Counters

Another way to make a “Monitor Ready” application is to create and maintain performance counters.

An example of including performance counters in an application is below. The article linked after the code explains the form and why each piece of the code is used.

' Declare constants because we want these
' same values throughout the application
Private Const kCategoryName As String = "EMoreau-DemoApp"
Private Const kCounter1Name As String = "Counter1"
Private Const kCounter2Name As String = "Counter2"

' Create the category and its counters if they do not exist yet
If Not PerformanceCounterCategory.Exists(kCategoryName) Then
    CreateCounters()
End If

Sub CreateCounters()
    Dim objCounters As New CounterCreationDataCollection
    Dim objCounterData As CounterCreationData

    ' Create a first counter
    objCounterData = New CounterCreationData(kCounter1Name, _
        "The first counter we have created.", _
        PerformanceCounterType.NumberOfItems32)
    objCounters.Add(objCounterData)

    ' Create a second counter
    objCounterData = New CounterCreationData(kCounter2Name, _
        "The second counter we have created.", _
        PerformanceCounterType.NumberOfItems32)
    objCounters.Add(objCounterData)

    ' Create a new category to hold our counters
    PerformanceCounterCategory.Create(kCategoryName, _
        "Demo for Level Extreme .Net magazine", _
        objCounters)

End Sub

' Increments the counter named "Counter1"
Dim objCounter As New PerformanceCounter(kCategoryName, kCounter1Name, False)
objCounter.Increment()

' In a separate button handler: pick the counter that matches the button clicked
Dim strCounterName As String
If sender Is btnDecrease1 Then
    strCounterName = kCounter1Name
Else
    strCounterName = kCounter2Name
End If

Try

    ' Decrement the chosen counter, but never below zero
    Dim objCounter As New PerformanceCounter(kCategoryName, strCounterName, False)
    If objCounter.RawValue > 0 Then
        objCounter.Decrement()
    End If

Catch ex As System.InvalidOperationException
    ' write event log here
Catch ex As Exception
    ' write event log here
End Try

http://www.universalthread.com/ViewPageArticle.aspx?ID=29

An example of why to use custom performance counters is an application with a work queue that only the application itself can report on. When the queue exceeds a certain length, you might want an alert to kick off so someone will take a look and see why the backup is happening. Once this information is collected, you can graph the data in SCOM and see the trend. It is a fantastic way to see your application’s performance and even justify more memory, more servers, or code improvements such as multi-threading instead of a single thread.
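A threshold check on a queue-length counter might look like the Python sketch below. The threshold of 100 items and the requirement of three consecutive samples over the threshold are assumptions added for the example (requiring consecutive samples is a common way to avoid alerting on a brief spike); `queue_alert` is a hypothetical name.

```python
def queue_alert(samples, threshold=100, consecutive=3):
    """Return the index of the sample at which an alert would fire, or None.

    An alert fires once `consecutive` samples in a row exceed `threshold`.
    Both defaults are illustrative assumptions.
    """
    run = 0
    for index, length in enumerate(samples):
        run = run + 1 if length > threshold else 0
        if run >= consecutive:
            return index
    return None


if __name__ == "__main__":
    # Queue lengths collected at regular intervals from the custom counter
    print(queue_alert([10, 120, 130, 90, 110, 150, 200]))
    print(queue_alert([10, 20, 30]))
```

The application's only job is to keep the counter accurate; the threshold and sampling interval live in the management pack and can be tuned without a code change.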

An important note on monitors:

If you want to leverage a monitor, make CERTAIN there is a means of consistently checking the health state. If you count on events in the event log (which is not considered the best method but many management packs do use it), you must make sure you can set a healthy state for unexpected conditions.

For example, an application logs an unhealthy event in one procedure that runs once per day. The monitor is now in an unhealthy state and someone reboots the computer. After the reboot, your application does not process again for 20 more hours. The monitor will not know things are healthy until the application runs again. In this case, it may be advantageous to have the application log a healthy event on startup and then resume an unhealthy state (if it still exists) when the procedure runs again.

Always remember, monitors are state-based. The state ONLY changes when the health state is CHECKED. If the monitor checks the health on an infrequent schedule, it remains in the last known state between checks. If the monitor only checks once per hour and the condition is determined to be unhealthy, it will be reported as unhealthy for a FULL hour (even if it was only down for 1 minute). Five-minute intervals are fairly common.
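The effect of the check interval on reported downtime is simple arithmetic, sketched below in Python. This is a simplified model, not SCOM's scheduler: checks are assumed to run at minute 0 and then every `interval` minutes, and the function name is hypothetical.

```python
def reported_downtime(outage_start, outage_end, interval, horizon):
    """Minutes reported as unhealthy when health is checked every `interval` minutes.

    The real outage covers [outage_start, outage_end). Checks run at
    minute 0, interval, 2 * interval, ... below `horizon`, and the monitor
    keeps its last known state until the next check.
    """
    reported = 0
    for check_time in range(0, horizon, interval):
        if outage_start <= check_time < outage_end:
            # Unhealthy at this check: state is held until the next check
            reported += min(interval, horizon - check_time)
    return reported


if __name__ == "__main__":
    # A 1-minute outage that happens to coincide with a check
    print(reported_downtime(60, 61, interval=60, horizon=240))
    print(reported_downtime(60, 61, interval=5, horizon=240))
    # A 1-minute outage that falls between hourly checks
    print(reported_downtime(61, 62, interval=60, horizon=240))
```

The third call shows the other side of a long interval: a short outage that falls between checks is never seen at all.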

A good rule of thumb to remember… a RULE checks for a single condition and alerts when it is found, and you must manually close any alerts it generates. A MONITOR continually checks the health state of an object (e.g. a logical disk) and closes any alert it originally opened when the object is healthy again.

Monitors are great for Up/Down reporting.

If an application is not made “Monitor Ready”, SCOM cannot monitor it efficiently, which is ultimately bad for the end user.
