OpsMgr R2 by Example: The Operations Manager Management Pack

Written By: Cameron Fuller [MVP]

The Operations Manager R2 management pack is automatically installed when you install System Center Operations Manager.

How to Install the Operations Manager MP

  1. Download the Operations Manager management pack guide from the Management Pack Catalog (http://technet.microsoft.com/en-us/opsmgr/cc539535.aspx); the file is “Operations Manager 2007 R2 Management Pack Guide.doc”.
  2. Read the management pack guide, which covers items such as how to enable recovery for Health Service heartbeats and how to create Run As accounts.
  3. Create an OperationsManager_Overrides management pack to contain any overrides required for the MP.

Operations Manager MP Tuning / Alerts to look for

The following alerts were encountered and resolved while tuning the Operations Manager management pack (these are listed in alphabetical order by Alert name):

Alert: AD Agent Assignment: Admins User Role needs at least one domain account

Issue: AD integrated agent deployment was not functional. The OpsMgr service was installed and running on the agent but would not show up in the pending management folder.

Resolution: Per product knowledge, “Add the security group, which was provided as parameter to MOMADAdmin.exe to the Operations Manager Administrators User Role.” In the OpsMgr console -> Administration -> Security -> User Roles -> Operations Manager Administrators, added the group specified when using the MomADAdmin.exe program to configure AD Integration. Closed the alert and restarted services on the server where AD agent assignment was failing (no change). Then restarted the three services (in SP1 these were OpsMgr Config Service, OpsMgr Health Service, and OpsMgr SDK Service; in R2 they are System Center Data Access, System Center Management, and System Center Management Configuration), and the agent appeared in the OpsMgr console -> Administration -> Device Management -> Pending Management as expected.

Alert: Agent proxying needs to be enabled for a health service to submit discovery data about other computers.

Issue: The agent specified in the alert description does not have agent proxy enabled.

Resolution: Found the name of the system in the alert description field (dc.abcco.com), copied the server name, opened the Administration node -> Device Management -> Agent Managed, and filtered on the name of the server (pasted in). Right-click the server, choose Properties, and open the Security tab. Check the “Allow this agent to act as a proxy and discover managed objects on other computers” checkbox. This is an alert rule, so it will not auto-close; manually closed the alert in the Monitoring section of the OpsMgr console.

Alert: Backward Compatibility Script Error

Issue: Error in the MOM Backward Compatibility Service State Monitoring Script on line 71.

Resolution: This is a bug in WMI. The BlackBerry MDS Connection Service has a very long ImagePath registry entry. When the health service script runs Select DisplayName, State, Name, StartMode, StartName FROM Win32_Service, a null is returned for StartName because the buffer allocated for the results is too small and the call fails. This can be verified using wbemtest: connect to the root\cimv2 namespace and run the following query:

Select DisplayName, State, Name, StartMode, StartName FROM Win32_Service

In the results, scroll down to BlackBerry MDS Connection Service and double-click the row to view the details. In Properties, the StartName is null.
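When the query returns many services, the affected rows can be picked out programmatically instead of by scrolling. The following is only a minimal sketch: it assumes the query results have already been exported into a list of dictionaries, and the helper name and the sample rows (apart from the BlackBerry display name) are made up for illustration.

```python
def services_missing_start_name(rows):
    """Return the DisplayName of each service row whose StartName is null or empty."""
    return [row["DisplayName"] for row in rows if not row.get("StartName")]


# Hypothetical sample rows shaped like a subset of the Win32_Service query result.
sample_rows = [
    {"DisplayName": "BlackBerry MDS Connection Service", "State": "Running",
     "StartMode": "Auto", "StartName": None},
    {"DisplayName": "Example Service", "State": "Running",
     "StartMode": "Auto", "StartName": "LocalSystem"},
]

print(services_missing_start_name(sample_rows))
```

Any service listed by this helper is a candidate for the same ImagePath length problem.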

The problem is described at http://groups.google.co.uk/group/microsoft.public.win32.programmer.wmi/browse_thread/thread/4cef045b79c1b5cb/1ee2b09a1fa130ab?lnk=st&q=win32_service.startname+is+null&rnum=1&hl=en#1ee2b09a1fa130ab. Tried to obtain the fix mentioned but MS support said that the Bug ID did not exist.

The workaround is to change the path so that it uses the short (8.3) folder names, e.g.

Original Key:

"C:\Program Files\Research In Motion\BlackBerry Enterprise Server\MDS\bin\bmds.exe" -s jvmpath="C:\Program Files\Java\jre1.5.0_11\bin\client\jvm.dll" -XX:+DisableExplicitGC -Xss64K -Xmx768M -Xms128M classpathdir="C:\Program Files\Research In Motion\BlackBerry Enterprise Server\MDS\classpath" wrkdir="C:\Program Files\Research In Motion\BlackBerry Enterprise Server\MDS\Servers\BES1" webserverdir="C:\Program Files\Research In Motion\BlackBerry Enterprise Server\MDS\webserver" -rbes "BES1_MDS-CS_1"

New Key:

"C:\PROGRA~1\RESEAR~1\BLACKB~1\MDS\bin\bmds.exe" -s jvmpath="C:\Program Files\Java\jre1.5.0_11\bin\client\jvm.dll" -XX:+DisableExplicitGC -Xss64K -Xmx768M -Xms128M classpathdir="C:\PROGRA~1\RESEAR~1\BLACKB~1\MDS\CLASSP~1" wrkdir="C:\PROGRA~1\RESEAR~1\BLACKB~1\MDS\Servers\BES1" webserverdir="C:\PROGRA~1\RESEAR~1\BLACKB~1\MDS\WEBSER~1" -rbes "BES1_MDS-CS_1"

Restart the service and rerun the query in wbemtest. With the shorter path, the server now returns the correct username.

It would be preferable if the problem was fixed properly, but the workaround does not seem to cause any adverse effects.
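The path shortening in the workaround above amounts to a prefix substitution, which can be sketched as follows. This is an illustration only: the function name and the mapping are hypothetical, and the real 8.3 names must be looked up on the affected system (for example with dir /x), because they are not guaranteed to match the ones shown here.

```python
# Hypothetical mapping from long folder prefixes to their 8.3 equivalents.
# The short names below are copied from the example in the text; verify them locally.
SHORT_NAMES = {
    r"C:\Program Files\Research In Motion\BlackBerry Enterprise Server":
        r"C:\PROGRA~1\RESEAR~1\BLACKB~1",
}


def shorten_image_path(image_path, short_names):
    """Replace each known long folder prefix in image_path with its short form."""
    for long_prefix, short_prefix in short_names.items():
        image_path = image_path.replace(long_prefix, short_prefix)
    return image_path


original = r'"C:\Program Files\Research In Motion\BlackBerry Enterprise Server\MDS\bin\bmds.exe" -s'
shortened = shorten_image_path(original, SHORT_NAMES)
print(shortened)
print(len(original), "->", len(shortened), "characters")
```

The sketch only produces the new string; writing it back to the service's ImagePath value is still a manual step.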

UPDATE: Found this on another system with a different type of service. The start name was null, and the service would not start when a start was attempted. Used sc delete to remove the service, rebooted the system, and it worked like a champ.

Alert: Check the application’s security policy

Issue: Two management servers were added into an environment where AD Integration was configured. This alert occurred on both systems when the RMS’s OpsMgr Health Service was restarted.

Resolution: Gave the same access rights to the new management servers as had been given to the RMS by adding their computer accounts to the MOMADSecurityGroup created as part of the process to configure AD Integration in OpsMgr. Verified the change in Active Directory Users and Computers (View, Advanced Features) by confirming that the additional management servers had records defined for them in the OperationsManager container under the name of the management group.

Alert: Connection Timeout

Issue: On a TCP Port monitor, two alerts are generated when the system cannot be reached. The first is a Connection Timeout, and the second is a <Servername> Group Roll-Up Monitor. The server in question was being monitored via a TCP Port monitor to provide rudimentary monitoring by checking the RDP port (3389).

Resolution: The system in question was offline and needed to be brought back online, so the monitor functioned as expected.
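The kind of check a TCP Port monitor performs can be reproduced by hand when confirming that a system really is offline. The sketch below is not how OpsMgr implements the monitor; it is a plain socket connection attempt, and the function name and timeout default are assumptions made for illustration.

```python
import socket


def tcp_port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For the server described above, the call would be tcp_port_open("<Servername>", 3389), with the real server name substituted.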

Alert: Failed Agent Push/Repair – Remote Agent Management operation failed

Issue: Failed attempting to push the agent to the system.

Resolution: Logged into the system and manually installed the agent.

Alert: Data Warehouse configuration synchronization process failed to write data

Issue: After importing a large number of management pack files, the data warehouse started reporting issues. The Health Explorer listed event number 31552, stating that data failed to be stored in the data warehouse because of a SQLException: Timeout expired.

Resolution: On the data warehouse server, used sp_updatestats to update the OperationsManagerDW database per notes in the newsgroups from Vitaly. The alerts were automatically closed after this action was performed.
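One way to run the statistics update without opening a query window is the sqlcmd utility. The sketch below only builds the command line; it assumes Windows authentication (-E) and the default OperationsManagerDW database name, and the server name in the example is a placeholder.

```python
def build_update_stats_command(server, database="OperationsManagerDW"):
    """Build a sqlcmd argument list that runs sp_updatestats against one database."""
    return [
        "sqlcmd",
        "-S", server,      # SQL Server instance hosting the data warehouse
        "-d", database,    # database to update
        "-E",              # trusted connection (Windows authentication)
        "-Q", "EXEC sp_updatestats",  # run the query, then exit
    ]


# "dwserver01" is a hypothetical server name.
command = build_update_stats_command("dwserver01")
print(command)
```

The resulting list can be passed to subprocess.run on a machine where sqlcmd is installed.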

Alert: Data Warehouse failed to deploy reports for a management pack to SQL Reporting Services Server

Issue: The DNS management pack can cause issues in the environment resulting in event ID 26319 from the OpsMgr SDK Service (System Center Data Access is the new service name in R2).

Resolution: Add the account designated as the Data Reader account to the group designated as Operations Manager Administrators during setup (this group is added to the Operations Manager Administrators role). This issue only exists with the DNS Management Pack (version 6.0.5000.0) and no other management packs.

Alert: Data Warehouse failed to request a list of management packs from SQL RS server

Issue: The data warehouse reporting server was being rebooted.

Resolution: Once the reporting server was back online, this alert auto-resolved itself.

Alert: Data Warehouse managed object type synchronization process failed to write data

Issue: After importing a large number of management pack files, the data warehouse started reporting issues. The health explorer listed an event number 31554 on the workflow Microsoft.SystemCenter.DataWarehouse.Synchronization.TypedManagedEntity.

Resolution: On the data warehouse server, used sp_updatestats to update the OperationsManagerDW database per notes in the newsgroups from Vitaly. The alerts were automatically closed after this action was performed.

Alert: Failed to Check for Password Expiration on RunAs Account

Issue: Operations Manager is unable to monitor Run As accounts for account and password expiration for the server specified.

Resolution: There was an error on the account (Administration -> Security -> Run As Profiles). In this case, the domain name had a typo in it.

Alert: Failed to send notification using server/device

Issue: Issues providing notification via Instant Messaging.

Resolution: The Instant Messaging configuration defaulted to port 5060, but the IM server itself was configured to use port 5061. Tested connectivity from the OpsMgr server to the LCS server with telnet <ServerName>, and the server answered. Configured a Run As Account for the Notification Account on the OpsMgr server, using the same account specified in the Notification settings. Logged in to LCS with the account configured for Instant Messaging and sent a test IM message. Open question: does Communicator need to be installed on the OpsMgr box? (Installed it to test, and logged in with the account that SIP was going to use.)

Alert: Failed to send notification

Issue: Notification in OpsMgr was configured for a single SMTP server. When this server was offline, these alerts occurred (logically).

Resolution: Defined additional SMTP servers to provide failover in case of loss of the primary SMTP server. Used the Alert Forwarding MP to validate connectivity to each SMTP server (discussed at http://cameronfuller.spaces.live.com/blog/cns!A231E4EB0417CB76!1737.entry).
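The failover idea, trying each SMTP server in order until one answers, can be illustrated outside OpsMgr. This is a sketch under assumptions, not the notification engine's actual logic: the function names are invented, port 25 and the timeout are defaults chosen for the example, and a server counts as available when it accepts a connection and replies 250 to NOOP.

```python
import smtplib


def smtp_answers(host, port=25, timeout=10.0):
    """Return True if an SMTP server at host:port connects and replies 250 to NOOP."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as client:
            code, _message = client.noop()
            return code == 250
    except (OSError, smtplib.SMTPException):
        return False


def first_available(servers, probe=smtp_answers):
    """Return the first server in priority order that the probe accepts, else None."""
    for host in servers:
        if probe(host):
            return host
    return None
```

Calling first_available with the list of configured SMTP servers, primary first, shows which one a sender would fall back to at that moment.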

Alert: Failed to send notification using server/device

Issue: Email was being sent to a remote email environment and communication was lost between the environments.

Resolution: When communication between the environments was restored, notification began to function again. Closed the alert, as it did not recur after communication was re-established.

Alert: Failed to send notification using server/device

Issue: Notification in OpsMgr was configured for a single SMTP server. When this server was offline, these alerts occurred (logically).

Resolution: Defined additional SMTP servers to provide failover in case of loss of the primary SMTP server. Used the Alert Forwarding MP to validate connectivity to each SMTP server (discussed at http://cameronfuller.spaces.live.com/blog/cns!A231E4EB0417CB76!1737.entry).

Alert: Failed to send notification using server/device.

Issue: Blocked on Exchange 2007 (see http://msexchangeteam.com/archive/2006/12/28/432013.aspx). The server being pointed to did not respond on port 25, as it was a mailbox server, not a client access server. Notification later failed because of security issues with an anonymous connection (the default configuration).

Resolution: Re-configured OpsMgr to use the client access server that did respond on port 25. Configured the notification to use Windows Integrated authentication. Configured a Run As Account and configured the Run As Profile for the Notification Account for the management server to use the account which was created.

Alert: Failed to send notification using server/device

Issue: The RMS lost communication with the various SMTP servers that were defined. Once the network communication was back online, notifications were able to be sent.

Resolution: Lowered the severity of this alert to warning, because the separate “Failed to send notification” alert is already critical and appears to occur when not all SMTP servers can be reached.

Alert: Health service heartbeat failure

Issue: The OpsMgr health service on the agent was stopped. Another potential cause is if the OpsMgr health service on the agent was running but unable to communicate with the OpsMgr management server.

Resolution: Restarted the OpsMgr agent with Computer Management through the Actions pane. In the case where the agent could not communicate, the server was running a security application that restricted network traffic and blocked traffic from the server to the OpsMgr management server on port 5723.

Alert: OleDB: Results Error

Issue: Network communication between the RMS and the Operations Manager database was interrupted. The alert rule that generates this critical alert is “OleDbProbe: Results Error”.

A good discussion on these types of alerts is available at http://blogs.technet.com/jonathanalmquist/archive/2008/07/29/oledb-results-error.aspx.

Resolution: In this case, once network connectivity was re-established between the RMS and the OpsMgr database, the alert was no longer relevant and was manually closed. Created an override to disable this alert for the RMS that was occasionally reporting it, per the link listed in the issue section of this alert.

Alert: Ops DB Free Space Low

Issue: The Operations Database for OpsMgr 2007 has less than 40% free space available.

Resolution: The OperationsManager database was not large enough to provide seven-day (default) retention for the number of agents being monitored. Increased the size of the database using SQL Server Management Studio (install it on the system running the OpsMgr console for ease of use). Connect to the server running the OpsMgr Operational database (shown in the alert), open the server/databases, right-click the OperationsManager database (default name), and click Properties. Click the Files tab, change the MOM_DATA size to the new size, and click OK. You can validate that the change in size occurred by going back to the properties of the database. The alert will resolve itself in Operations Manager in approximately 15 minutes if enough free space is available, as this monitor runs on a 900-second interval.
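Picking the new MOM_DATA size is simple arithmetic once the amount of used space is known. The sketch below uses the 40% free-space threshold quoted in the alert; the function names and the sample figures are invented for illustration and are not taken from any real database.

```python
def free_space_percent(size_mb, used_mb):
    """Percentage of the data file that is free."""
    return 100.0 * (size_mb - used_mb) / size_mb


def size_for_free_percent(used_mb, target_free_percent):
    """Smallest file size (MB) that leaves target_free_percent of the file free.

    Solves (size - used) / size = target / 100 for size.
    """
    return used_mb / (1.0 - target_free_percent / 100.0)


# Hypothetical figures: a 10,000 MB data file with 7,000 MB in use.
current_free = free_space_percent(10000, 7000)   # 30.0, below the 40% threshold
needed_size = size_for_free_percent(7000, 40)    # roughly 11,667 MB
print(current_free, round(needed_size))
```

In practice it makes sense to round the result up generously so the alert does not return as soon as the data grows.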

Alert: Recipient address is not valid.

Issue: Recipient address is not valid for notification. Email was sent to remote email environment and communication was lost between the environments.

Resolution: When communication between the environments was restored, notification began to function again. Closed the alert as it did not recur once communication was re-established.

Alert: Root Management Server Unavailable.

Issue: The alert was occurring, but the OpsMgr Health Service was running on the RMS. The alert description said, “The root management server (HealthService) is running but has reported limited functionality soon after (date/time). The specific reason code is 49 and description is ‘The health manager has detected that entity state collection has stalled.’” This happened immediately after installing the reporting server into the OpsMgr environment.

Resolution: Restarted the OpsMgr Health Service on the RMS and the alert closed.

Alert: Root Management Server Unavailable

Issue: The following alert randomly recurs on an RMS with no related alerts and with no apparent cause:

The root management server (Healthservice) has stopped heartbeating soon after (date and time). This adversely affects all availability calculation for the entire management group.

Resolution: If the alert truly has no discernible root cause, the Root Health Service Watcher can be tuned to allow greater variance in the heartbeat interval by adding a DWORD value named MinutesToWaitBeforeAlerting to the following registry key and setting it to 5:

HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft Operations Manager\3.0\SDK Service\RHS Watcher

Restart the Health, SDK, and Config services on the RMS after this change.

Submitted By: Jason Sandys
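If the registry change above has to be applied to more than one RMS, it can be captured as a .reg file. The following sketch only generates the file text; the helper name is made up, and the key path, value name, and value of 5 are the ones given in the resolution above.

```python
RHS_WATCHER_KEY = (
    r"HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft Operations Manager"
    r"\3.0\SDK Service\RHS Watcher"
)


def build_reg_file(key_path, value_name, dword_value):
    """Return .reg file text that sets one DWORD value under key_path."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        f'"{value_name}"=dword:{dword_value:08x}',  # DWORDs are written as 8 hex digits
        "",
    ]
    return "\n".join(lines)


reg_text = build_reg_file(RHS_WATCHER_KEY, "MinutesToWaitBeforeAlerting", 5)
print(reg_text)
```

The generated text can be saved with a .reg extension, reviewed, and imported on the RMS before restarting the services.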

Alert: RunAs Logon Type Check Failed

Issue: The RunAs account failed to log on interactively. The RunAs account needs to have the Log on interactively right.

Resolution: Gave the Log on interactively right to the user created for the RunAs account, in this case by granting it administrator access to the system in question.

Alert: RunAs Successful Logon Check Failed

Issue: Domain controllers for the domain where the SQL RS server existed were offline.

Resolution: Brought the domain controllers for the domain back online, and this alert auto-resolved itself.

Alert: RunAs Successful Logon Check Failed

Issue: One or more RunAs accounts failed to log on. The account may be disabled or has an expired password.

Resolution: Gave the Log on interactively right to the user created for the RunAs account, in this case by granting it administrator access to the system in question.

Alert: Script or Executable Failed to run

Issue: Scripts were not running on an agentless managed system that is a NAS, not an actual server. This occurs on both CPU Utilization and Memory Utilization.

Resolution: If the NAS was to remain monitored agentless, the only option was to disable the alert for the RMS.

Alert: Script or Executable Failed to run

Issue: Lots of Script or Executable Failed to run errors on the same system, all failing at the same time (in this case, a half-dozen or more would all fail with a 21402 timeout).

Resolution: WMI was non-functional on the system (stuck at 100% utilization on one processor). Tried to stop the WMI service; when that failed, killed the process and restarted the WMI service.

Alert: Script or Executable Failed to run

Issue: Script failure for Nslookuptest.js. The tests against Microsoft.com, the localhost IP address, and the fully qualified name of the server all failed at the same date and time.

Resolution: Noted the alert and the date/time to see if a root cause could be tracked back. Reviewed the event logs on the system for potential issues; none found. Reviewed the performance counters gathered by OpsMgr, but no bottlenecks were identified during that timeframe. Closed the alerts.

Alert: Script or Executable Failed to run

Issue: The process exited with 0 Command executed: C:\Windows\system32\cscript.exe /nologo IsHostingMSCS.vbs.

Resolution: Occurred immediately after deploying the agent onto a new server. According to a newsgroup post by Rob Kuehfus, this can occur when discovery has not yet finished. Closed the alert to see if it would recur on this system.

Alert: Script or Executable failed to start

Issue: Paging file is too small.

Resolution: Needed to add memory to the system.
