Pesky UNIX/Linux SCOM Agents (Gray State) – RETURN CODE: 1

This is a post I meant to publish quite some time ago, but forgot. Nevertheless…

If you have administered a SCOM environment with both Wintel and UNIX/Linux machines, I am betting you have run into some gray agents, specifically on the UNIX/Linux machines.

The issue was this: the server was definitely online, but according to SCOM it was offline, or at least in a gray state. Below are the steps I took to resolve the gray agent on this machine, which was running Red Hat (RHEL) 6.x.


Steps to diagnose the issue:

  1. Could I ping the server from any of the SCOM management servers? Yes.
  2. Could I ping the server from its resource pool? Yes.
  3. Was there communication on ports 22 and 1270? Yes.
  4. Was I able to establish a PuTTY session via port 22? Yes.
  5. Was the SCOM process running on the server? Hmm, that’s a funny error…

[Screenshot 1]
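The network checks in steps 3 and 4 can also be scripted from a management server. Here is a minimal sketch in Python, assuming plain TCP reachability is all you want to confirm. The function names `port_is_open` and `check_agent_ports` are made up for illustration and are not part of SCOM.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection completes the TCP handshake, so success means
        # something is listening and nothing in between dropped the traffic.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_agent_ports(host: str, ports=(22, 1270)) -> dict:
    """Check SSH (22) and the agent port (1270) and report each one."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling `check_agent_ports("your-rhel-host")` (a placeholder host name) returns a dictionary keyed by port. If 22 comes back `True` and 1270 comes back `False`, that would point at the agent rather than the network.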


Next are the steps I took to resolve the issue:

  1. Restart the SCOM agent process with “scxadmin” … No luck: “RETURN CODE: 1”.
  2. After some Googling, I found that many members of the community have hit this error too, and have traced it to corrupted CIM.Socket and SCX-CIMD.PID files.
  3. Delete the files:

[Screenshot 2]

4. Let’s restart the SCX Agent…

[Screenshot 3]

5. Well, that did not work either. Check to see if port 1270 is even listening…

[Screenshot 4]

6. Okay, let’s kill all processes associated with the scxadmin service…

[Screenshot 5]

7. Now let’s start the scxadmin process, and check again to see if port 1270 is listening…

[Screenshot 6]

8. Perfect! And what does SCOM say?

[Screenshot 7]
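For reference, the recovery sequence can be sketched as a small script. The exact commands from the screenshots are not reproduced here, so every path and command below is an assumption based on a default SCX agent install (`scxadmin` under `/opt/microsoft/scx/bin/tools`, the stale files under `/var/opt/microsoft/scx/tmp`, and process names beginning with `scxcim`). Verify them on your own server first. The sketch also folds the steps into one pass (stop, kill leftovers, delete files, start, status) rather than replaying the trial and error above, and it only prints the plan unless you pass `dry_run=False`.

```python
import subprocess

# ASSUMPTION: default locations for an SCX agent install; adjust to your system.
SCXADMIN = "/opt/microsoft/scx/bin/tools/scxadmin"
STALE_FILES = [
    "/var/opt/microsoft/scx/tmp/cim.socket",
    "/var/opt/microsoft/scx/tmp/scx-cimd.pid",
]


def build_plan() -> list:
    """Return the recovery steps, in order, as argument lists."""
    return [
        [SCXADMIN, "-stop"],
        # ASSUMPTION: leftover agent processes have names starting with "scxcim".
        ["pkill", "-9", "scxcim"],
        ["rm", "-f", *STALE_FILES],
        [SCXADMIN, "-start"],
        [SCXADMIN, "-status"],
    ]


def run_plan(plan, dry_run=True) -> list:
    """Print each step, or run it when dry_run is False; return the exit codes."""
    results = []
    for cmd in plan:
        if dry_run:
            print("would run:", " ".join(cmd))
            results.append(None)
            continue
        # check=False: pkill exits with 1 when nothing matched, which is fine here.
        done = subprocess.run(cmd, check=False)
        results.append(done.returncode)
    return results
```

Run `run_plan(build_plan())` to review the steps before committing to them. With `dry_run=False` it needs root on the affected server.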

Problem solved! There are ways to automate this process; I did so using SCORCH and Runbooks. I will post that solution soon. Stay tuned.

Happy SCOM’ing! =)

Worth mentioning: I found this post about the CIM.Socket repair for the “RETURN CODE: 1” error at this link HERE. Thanks, phakesley.

 
