We have a Citrix farm with around 100 servers that are all monitored by SCOM 2012. There are always around 75 servers up and running at any one time. The remaining 25 servers stay powered down until required. If we have a problem with any of the live session servers, they are taken out of the loop and one of the spare 25 is brought online to cover.
The problem I have is when these 25 servers are offline SCOM shouts about it (as it should if any server is offline).
Does anyone have a suggestion on how I can monitor all session servers without having 25 Heartbeat failure alerts clogging up my Alerts view?
Maintenance mode won’t work as I don’t know if and when these servers will be brought back online, so I can’t specify an end time.
The only other thing I can think of is to remove the agent from these servers and reinstall when they are brought online.
Create a script on your Citrix servers that writes a custom event ID to the event log when a server is shut down. Then have SCOM monitor the log for that event ID and kick off maintenance mode on that server via a PowerShell script. Set the maintenance period to a year. You could then log a second event at startup and have SCOM end maintenance mode when it sees that one.
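A rough sketch of both halves, in case it helps. Treat it as a starting point rather than a tested solution: the event source name `CitrixStandby`, the event IDs 9001/9002, and the function name `Set-StandbyMaintenance` are all made up for illustration, and it assumes the second part runs on a management server where the OperationsManager module is installed and already connected to the management group.

```powershell
# --- Part 1: runs on each Citrix server, from a shutdown script ---
# Hypothetical source name and event IDs; pick values not used in your environment.
$source = 'CitrixStandby'
if (-not [System.Diagnostics.EventLog]::SourceExists($source)) {
    # One-time registration of the event source (needs admin rights)
    New-EventLog -LogName Application -Source $source
}
# 9001 = going to standby; the startup script would write 9002 instead
Write-EventLog -LogName Application -Source $source -EventId 9001 `
    -EntryType Information -Message 'Server entering standby'

# --- Part 2: runs on a management server, triggered by the SCOM rule ---
Import-Module OperationsManager

function Set-StandbyMaintenance {
    param(
        [Parameter(Mandatory = $true)] [string] $ComputerName,  # FQDN as shown in SCOM
        [Parameter(Mandatory = $true)] [ValidateSet('Start', 'Stop')] [string] $Action
    )

    # Find the Windows Computer object for this server
    $class    = Get-SCOMClass -Name 'Microsoft.Windows.Computer'
    $instance = Get-SCOMClassInstance -Class $class |
                Where-Object { $_.DisplayName -eq $ComputerName }
    if (-not $instance) { throw "No Windows Computer object found for $ComputerName" }

    if ($Action -eq 'Start') {
        # Skip if the server is already in maintenance mode
        if (-not $instance.InMaintenanceMode) {
            Start-SCOMMaintenanceMode -Instance $instance `
                -EndTime (Get-Date).AddYears(1) `
                -Reason 'PlannedOther' -Comment 'Citrix standby server'
        }
    }
    else {
        # End maintenance by moving the end time of the active entry to now
        $entry = Get-SCOMMaintenanceMode -Instance $instance
        if ($entry) {
            Set-SCOMMaintenanceMode -MaintenanceModeEntry $entry `
                -EndTime (Get-Date) -Comment 'Citrix server back online'
        }
    }
}
```

The rule that catches event 9001 would call something like `Set-StandbyMaintenance -ComputerName 'ctx042.example.local' -Action Start`, and the rule for 9002 would call it with `-Action Stop` (the server name here is only a placeholder). How the computer name is passed from the alert to the script depends on how you set up the rule or notification channel, so that part is left out.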