BlastFromThePast: Debugging the Script or Executable Failed to Run Alert

[This blog post was resurrected from cache from my original T2R2 blog (cameronfuller.spaces.live.com), as it still has value but is no longer available on the Internet outside of searching through older cache sites.]

When trying to tune OpsMgr 2007 in several environments, the most common alert I found was “Script or Executable Failed to run”. This alert is warning level and was extremely frequent (in fact, it was number one on the Most Common Alerts report listed under Reporting / Microsoft Generic Report Library / Most Common Alerts). The difficulty with this alert is that it is raised by any script that fails to run anywhere in the OpsMgr environment. The only way to tell which script is failing (or why) is to review each alert and read through the description field. In my environments these failures were occurring in a variety of scripts, including:

DellStorageDiscovery, DellServerBMCDiscovery, DiscoverVirtualServerType, Freespace, DiscoverMicrosoftExchangeServerRole2003, LogicalDiskHealthCheck, Collect_Public_Folder_Statistics, DellServerProcessorUnit, DiscoverWindows2003NetworkAdapters, DiscoverWindows2003MountPoints, Collect_Mailbox_Statistics, IsHostingMCS, Disk_Space_Problem, DellServicesDiscovery, DellServerNumericVoltUnitMonitor, and SMS2003SiteHierarchyDiscovery.

I did not want to disable this alert because it does provide notification of errors which will cause problems within OpsMgr (OpsMgr is expecting the scripts to run correctly). I am still in the process of working out which scripts are failing and why, but I put together a management pack which does the following:

Creates a new version of the rule for the “Script or Executable Failed to Run” alert, but configured to raise an informational alert. The updated version of this rule has two custom parameters: the first is the number of the event, and the second is the alert description. The management pack also contains an alert view and two event views to assist with debugging.

These alerts are presented in a view under Monitoring / Failed Batch Responses – Debugging / Alert on Failed Batch Responses. This view shows the source, time, event number and description for any alerts generated by the new rule during the last hour. There are also two event views: Failed Batch Responses (events 21402, 21404, 21405, 21406, 21407, 21409) and Other Batch Responses (events 21400, 21401, 21403, 21408, 21410, 21411). Both show events of these types that occurred within the last hour.
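As a rough illustration of what the two event views select, here is a minimal Python sketch. It is not part of the management pack. The function name events_for_view, the dictionary layout of an event, and the sample data are all assumptions made up for this example; only the event numbers and the one hour window come from the description above.

```python
from datetime import datetime, timedelta

# Event IDs as listed for the two event views.
FAILED_BATCH_EVENTS = {21402, 21404, 21405, 21406, 21407, 21409}
OTHER_BATCH_EVENTS = {21400, 21401, 21403, 21408, 21410, 21411}


def events_for_view(events, event_ids, now, window=timedelta(hours=1)):
    """Return events whose ID is in event_ids and whose time falls within the window ending at now."""
    cutoff = now - window
    return [e for e in events if e["id"] in event_ids and cutoff <= e["time"] <= now]


# Hypothetical sample events (assumed structure, for illustration only).
now = datetime(2008, 1, 1, 12, 0)
sample_events = [
    {"id": 21402, "time": now - timedelta(minutes=5), "source": "server01"},
    {"id": 21403, "time": now - timedelta(minutes=20), "source": "server02"},
    {"id": 21406, "time": now - timedelta(hours=3), "source": "server03"},
]

failed_view = events_for_view(sample_events, FAILED_BATCH_EVENTS, now)
other_view = events_for_view(sample_events, OTHER_BATCH_EVENTS, now)

print([e["source"] for e in failed_view])
print([e["source"] for e in other_view])
```

In this sample the 21406 event is three hours old, so it falls outside the one hour window and would not appear in either view.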

To use this management pack, import it into OpsMgr. During the debugging I deactivated the original rule that was creating the alerts (“Alert on Failed Batch Responses”). Once the debugging is done and these alerts are only occurring occasionally, the original rule can be reactivated and the debugging management pack can be removed.

For reference, the following is what I have found about what these event numbers mean:
21402 – Script ran longer than the timeout period
21403 – Health Service requested the workflow to stop
21404 – Unknown
21405 – Unknown
21406 – Errors found in Output
21407 – Unknown
21409 – Unknown
21400, 21401, 21403, 21408, 21410, 21411 – None of these events alerts as an error when it occurs.
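
To spot trends in the event numbers, a tally along the following lines can help. This is only a sketch in Python and is not part of the management pack. The summarize function and the list of observed event numbers are hypothetical; the meanings are copied from the list above, with "Unknown" kept where I have not found one.

```python
from collections import Counter

# Meanings taken from the list above; "Unknown" where none has been found.
EVENT_MEANINGS = {
    21402: "Script ran longer than the timeout period",
    21403: "Health Service requested the workflow to stop",
    21404: "Unknown",
    21405: "Unknown",
    21406: "Errors found in Output",
    21407: "Unknown",
    21409: "Unknown",
}


def summarize(event_ids):
    """Count each event number and pair it with its meaning, most frequent first."""
    counts = Counter(event_ids)
    return [
        (event_id, count, EVENT_MEANINGS.get(event_id, "Unknown"))
        for event_id, count in counts.most_common()
    ]


# Hypothetical event numbers, e.g. copied by hand from the alert view.
observed = [21402, 21402, 21406, 21402, 21404]
summary = summarize(observed)

for event_id, count, meaning in summary:
    print(f"{event_id}: {count} ({meaning})")
```

With this made-up input, 21402 comes out on top, which matches the kind of pattern worth chasing first (for example by reviewing script timeouts).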

This management pack does not fix the script errors, but it is designed to help OpsMgr administrators dig into these alerts to identify trends (for example, in my environment most of the errors are 21402s) and track down the issues. The management pack is available for download at: http://www.systemcentercentral.com/PackCatalog/PackCatalogDetails/tabid/145/IndexID/54476/Default.aspx.

IMPORTANT NOTE: This management pack was built in an environment with hotfix 939799, which resolves the issue with agents that are stopped and do not restart correctly. This hotfix is required to import the management pack due to a dependency on Microsoft.SystemCenter.Library version 6.0.5000.20. Hotfixes were available on request from a website that was linked from SystemCenterCentral, but this link appears to have been pulled by Microsoft. If you need this functionality and have environments without the hotfix, please post to the blog to let me know, as it may be viable to build this without the 6.0.5000.20 dependency.

This management pack is designed to assist with debugging this alert but is not supported in any way by either me or SystemCenterCentral.
