Operations Manager (OpsMgr/SCOM) Heartbeat Failures

A question that comes up time and time again is ‘what causes an agent to have heartbeat failures?’

This could be caused by a number of things. Questions to ask yourself in troubleshooting this include:

– Is the communication between the agent (server) and the SCOM management server good? (Is the agent in the same office/network, or at a remote location?)

– Is the agent on a DC? If so, is OOMADs.msi (the Active Directory Management Pack Helper Object) installed?

– What version of SCOM are you running? If it’s not R2, putting a server into Maintenance Mode will not stop the heartbeat failures, because those come from the Health Service Watcher class, which runs on the management server, not on the agent.

– What is the heartbeat interval? 60 seconds is the default. Has that been changed?

– Is it only the one server that gets the heartbeat errors?
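
For the first question above, a quick way to rule basic connectivity in or out is to test whether the agent machine can open a TCP connection to the management server. The following is a minimal Python sketch, not an official tool. It assumes the default agent-to-management-server port of 5723 (adjust it if your environment uses a different one), and the function name can_reach is purely illustrative.

```python
import socket

# Assumption: agents talk to the management server on the default TCP port 5723.
SCOM_AGENT_PORT = 5723


def can_reach(host, port=SCOM_AGENT_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Run from the agent, can_reach("your-management-server") returning False points to a firewall, routing or name resolution problem rather than an agent problem. A True result only proves the port is open; it does not prove that the agent and management server are authenticating successfully.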

Typical causes of heartbeat failures include:

– Bad network connectivity between agent and management server

– Agent issues. Try ‘flushing the agent cache’: stop the HealthService on the agent, delete the Health Service State folder, and restart the HealthService.

– SCOM patch levels might differ between the agent and the management server. Check the Patch List property in the Health Service State view for both the management server and the agent to confirm that they are at the same patch level.

– If the agent is on a DC, confirm that the Active Directory Management Pack Helper Object (OOMADs.msi) has been installed. If not, install it.
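
The ‘flushing the agent cache’ steps above can be sketched as a small Python script, to be run on the agent with administrative rights. This is a rough illustration under a few assumptions: the agent's Windows service is named HealthService, the path to the Health Service State folder is supplied by you (it lives under the agent's installation directory, which varies by version), and the folder is renamed rather than deleted so the old state can still be inspected. The function name flush_agent_cache and the ".old" suffix are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: the agent's Windows service name is HealthService.
SERVICE = "HealthService"


def flush_agent_cache(state_dir, run=subprocess.run):
    """Stop the agent, move its cache folder aside, then start the agent.

    Returns the path of the renamed folder, or None if no folder existed.
    """
    state_dir = Path(state_dir)
    run(["net", "stop", SERVICE], check=True)
    backup = None
    try:
        if state_dir.exists():
            backup = state_dir.with_name(state_dir.name + ".old")
            if backup.exists():
                # Discard a backup left behind by an earlier flush.
                shutil.rmtree(backup)
            state_dir.rename(backup)
    finally:
        # Always try to restart the agent, even if the rename failed.
        run(["net", "start", SERVICE], check=True)
    return backup
```

When the service starts again, the agent rebuilds the Health Service State folder and downloads its configuration from the management server afresh. Once the agent is healthy, the renamed ".old" folder can be deleted.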

Have fun, learn System Center! For great technical training or consulting guidance on System Center technologies, look to www.infrontconsulting.com.

Rory McCaw, OpsMgr MVP, Principal Consultant, Infront Consulting Group
