We have two SCOM management servers. For the past three days, one of them has been getting stuck in a grey state. If we flush the Health Service state and cache on that management server, it returns to green, but within 15-20 minutes it goes grey again.
Looking into the cause, we see that the number of state changes, which is normally between 100 and 3,000 per day, has averaged 35,000 per day over the last three days. When we query the database to identify which monitor is responsible, we don't find any suspect: the top 50 noisiest monitors show normal values and are the usual ones that generate noise (ping monitor, etc.).
We are very worried because the flood doesn't stop and the DWH keeps growing. We are also seeing high disk latency on the LUN that hosts the SCOM SQL database.
Any idea how to resolve this?
I have attached the top 50 noisiest monitors, but they look fine. The first is Microsoft.SystemCenter Ping with 3328 changes (normal in a big environment).
I would suggest that you open a case with Microsoft support immediately and not wait for an answer here. I had a similar problem last year where constant state changes were causing my data warehouse to grow very quickly. The trouble is that once you fix the cause, you are left with a data warehouse that is HUGE, and you have no way of reclaiming that space once the data is purged. Microsoft does not recommend doing a DB shrink of the DW. I even tried backing up the DW and restoring it from the backup, but the whitespace was still there, so I am now stuck with a DW that is 1.5 TB in size, over half of which is just empty whitespace and a waste of storage.
After many hours going through logs and Event Viewer entries, I was able to resolve the problem. The flood on the DWH was being caused by some old systems: old Windows 2003 machines with an agent installed, plus one agentless machine (W2K).
Microsoft support confirmed my solution after I had resolved it through my own research.
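Since the top monitors looked normal in this case and the cause turned out to be a few old machines, one way to look at the same data is to count state changes per source computer instead of per monitor. Here is a minimal sketch in Python using made-up rows. The computer names, monitor names, and the `count_by` helper are all hypothetical and not part of SCOM; in practice the rows would come from whatever query you already run against the database.

```python
from collections import Counter

# Hypothetical sample: one (monitor_name, source_computer) tuple per state change.
# These names are invented for illustration only.
rows = []
for monitor in ["Ping", "Disk Free Space", "CPU Utilization", "Service Health"]:
    rows += [(monitor, "legacy-w2003")] * 5  # one old machine flapping on every monitor
    rows += [(monitor, "srv-app01")] * 1
    rows += [(monitor, "srv-db01")] * 1


def count_by(rows, index, n=5):
    """Return the n most frequent values found at position `index` of each row."""
    return Counter(row[index] for row in rows).most_common(n)


by_monitor = count_by(rows, 0)   # grouped by monitor: nothing stands out
by_computer = count_by(rows, 1)  # grouped by computer: one source dominates

print("Per monitor: ", by_monitor)
print("Per computer:", by_computer)
```

With this sample data every monitor shows the same modest count, while grouping by computer puts legacy-w2003 clearly on top, which is the kind of pattern a per-monitor top 50 can hide.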