Requirements for a SCOM 2007 Health Check Script?
Posted: Sun, Jun 21, 2009 10:54 AM :: Rank: 71
Author
Points: 65622
Level: System Center Expert

Simon and I were discussing the need for a freely available SCOM Health Check Script to make troubleshooting end user issues an easier task. Certainly this could be accomplished through PowerShell with enough effort, and some pieces of such a script already exist in the public domain.

To do this, we would first need to establish the requirements the script has to address. It need not strictly be a health check script; it could be more of a data collection tool that lets one assess health where programmatic diagnosis may not be practical.

SCOM Health Check Script Requirements:

  • Check proper database sizing based on monitored server / device count
  • Check SQL configuration against supported configurations / best practices 
  • Look for database latency - (event 2115) in the RMS / MS OpsMgr Event Log 
  • Collect agent count reporting to each MS & RMS
  • Retrieve recent warning and critical events from RMS OpsMgr Event Log 
  • What else????

Please add your thoughts to the list!
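
To make the event 2115 item a little more concrete, here is a minimal sketch in Python (the thread leans toward PowerShell; Python is used here only for illustration). It assumes the OpsMgr event log has already been exported into a list of records with hypothetical `event_id`, `computer` and `time` fields. The one-hour window and the threshold of 10 are placeholder assumptions, not recommended values.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_2115_events(events, now, window=timedelta(hours=1)):
    """Count event 2115 occurrences per computer inside the time window."""
    cutoff = now - window
    return Counter(
        e["computer"]
        for e in events
        if e["event_id"] == 2115 and e["time"] >= cutoff
    )


def servers_over_threshold(counts, threshold=10):
    """Return the computers whose 2115 count meets or exceeds the threshold."""
    return sorted(name for name, n in counts.items() if n >= threshold)


# Hypothetical exported events; field names and server names are assumptions.
now = datetime(2009, 6, 21, 12, 0)
events = (
    [{"event_id": 2115, "computer": "RMS01", "time": now - timedelta(minutes=m)}
     for m in range(12)]
    + [{"event_id": 2115, "computer": "MS02", "time": now - timedelta(hours=3)}]
    + [{"event_id": 21025, "computer": "RMS01", "time": now}]
)
counts = count_2115_events(events, now)
print(servers_over_threshold(counts))
```

Keeping the counting separate from the threshold test means the same counts could also be printed as raw data for a human to judge, which fits the "data collection tool" idea.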

Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Sun, Jun 21, 2009 12:11 PM :: Rank: 51
Author
Points: 950
Level: System Center Hero

* locate "grey" hosts

* verify fixes for OpsMgr / Agents

Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Sun, Jun 21, 2009 3:24 PM :: Rank: 56
Author
Points: 1183
Level: System Center Specialist

An "OpsMgr 2007 Health Check" is already the name of a service offered to Microsoft enterprise customers who have a Premier support contract, and it is delivered by accredited Microsoft Premier Field Engineers worldwide. As you are suggesting, we developed it using PowerShell to gather the data. I have already ranted about this and mentioned it on my blog.

Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Sun, Jun 21, 2009 11:38 PM :: Rank: 38
Author
Points: 65622
Level: System Center Expert

 Daniele, I have no doubt the MS-delivered OpsMgr Health Check is a great service for Premier customers. I think we were envisioning something a bit less complex that would be available to other customers (often in smaller environments) as an ad-hoc tool for assessing health.

This would be great for community support scenarios and for customers outside the Premier support plans who have no budget for professional services engagements to assess the health of their SCOM deployment.

Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Mon, Jun 22, 2009 5:53 AM :: Rank: 68
Author
Points: 1183
Level: System Center Specialist
True, I suppose I felt sorry that I cannot share the internals of what we do in the MS "health check" beyond a certain extent... but, as you mentioned, a lot of the information is already out there in blogs... :-)
Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Mon, Jun 22, 2009 7:23 AM :: Rank: 89
Author
Points: 42748
Level: System Center Expert

 An OpsMgr Health Check Script should perhaps also include a check to find 1) any targeting mistakes (groups targeted instead of classes) and 2) If possible, any object discoveries created with a dangerously low  interval.  

Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Mon, Jun 22, 2009 10:42 AM :: Rank: 89
Author
Points: 27734
Level: System Center Expert
Query for all scripts, collect them, run them, and catch the errors. The event log doesn't give you much information about script errors.
Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Mon, Jun 22, 2009 11:04 AM :: Rank: 86
Author
Points: 6866
Level: System Center Specialist
- Could check that the MPs are getting regularly updated from the DB to the RMS and from the RMS to the Management Servers.
Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Mon, Jun 22, 2009 1:01 PM :: Rank: 68
Author
Points: 40804
Level: System Center Expert
My view is that people on this site are in the process of purchasing, or have in fact already purchased, a System Center product. The hope is that communities like this one remove as many obstacles as possible so that the System Center experience stays focused on its primary function. We welcome as much talent sharing as each individual wants to offer, knowing there is no reward (other than a warm glow). The input from MS personnel so far has been great and welcome, and I hope it will continue. Everybody has something to offer.
Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Mon, Jun 22, 2009 7:06 PM :: Rank: 44
Author
Points: 40804
Level: System Center Expert

I have been giving more thought to the Health Check tool and have the following to add:

• We can check for a failed backup as well as a successful backup, but we never monitor whether a backup is even being attempted.

• We could also check whether the transaction logs are getting close to full.

• We could check that the SQL Server is set up correctly, for example that Auto Grow is set to False.
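
The first two points can be sketched as follows, assuming the backup history and log sizes have already been collected by some other means. The tuple layout, the 24-hour window and the 80% warning ratio are all assumptions made for this illustration, not values taken from any guidance.

```python
from datetime import datetime, timedelta


def backup_status(attempts, now, max_age=timedelta(hours=24)):
    """Classify backup health from a list of (finished_at, succeeded) tuples.

    Returns "not attempted" when no attempt of any kind falls inside max_age,
    "failed" when the latest recent attempt failed, otherwise "ok".
    """
    recent = [a for a in attempts if now - a[0] <= max_age]
    if not recent:
        return "not attempted"
    latest = max(recent, key=lambda a: a[0])
    return "ok" if latest[1] else "failed"


def log_nearly_full(used_mb, size_mb, warn_ratio=0.8):
    """True when the transaction log usage ratio reaches warn_ratio."""
    return size_mb > 0 and used_mb / size_mb >= warn_ratio


# Hypothetical data: the only backup on record is three days old.
now = datetime(2009, 6, 22, 19, 0)
print(backup_status([(now - timedelta(days=3), True)], now))
print(log_nearly_full(850, 1000))
```

The point of the separate "not attempted" result is that an absence of failures is not evidence that backups are running at all.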

Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Tue, Jun 23, 2009 9:17 AM :: Rank: 12
Author
Points: 151
Level: System Center Hero
Could check whether the deployed MPs are up to date.
Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Wed, Jun 24, 2009 10:02 PM :: Rank: 13
Author
Points: 6007
Level: System Center Specialist

 An OpsMgr Health Check Script should perhaps also include a check to find 1) any targeting mistakes (groups targeted instead of classes) and 2) If possible, any object discoveries created with a dangerously low  interval. 

The second suggestion would really help.  Again today, I found yet another 3rd party MP running discoveries every 5 minutes on every server we manage.  It would actually be a great monitor or something.
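
As a rough sketch of such a check, assuming the discovery names, management packs and intervals have already been pulled out of the management group into simple tuples (all names below are hypothetical, and the one-hour minimum is an arbitrary assumption):

```python
def noisy_discoveries(discoveries, min_interval_seconds=3600):
    """Return discoveries whose interval is below the minimum, shortest first.

    discoveries: iterable of (name, management_pack, interval_seconds).
    """
    flagged = [d for d in discoveries if d[2] < min_interval_seconds]
    return sorted(flagged, key=lambda d: d[2])


# Hypothetical discoveries; the first one runs every 5 minutes.
discoveries = [
    ("Contoso.App.Discovery", "Contoso.App", 300),
    ("Contoso.Db.Discovery", "Contoso.Db", 86400),
    ("Contoso.Web.Discovery", "Contoso.Web", 900),
]
for name, mp, seconds in noisy_discoveries(discoveries):
    print(f"{name} ({mp}): every {seconds} seconds")
```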

Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Wed, Jun 24, 2009 10:25 PM :: Rank: 38
Author
Points: 65622
Level: System Center Expert

 Funny you should mention these. I am actually sitting here right now working on the PowerShell to loop through and find rules targeting singleton classes, which I think will catch the bulk of those. 2) is a big one for me too. I was thinking of it for in-house MPs, as the MPBPA does not catch those, but I suppose it's equally viable for 3rd party MPs.

Nice suggestions on the v10 Wish List too. 

Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Fri, Jun 26, 2009 5:37 AM :: Rank: 59
Author
Points: 1183
Level: System Center Specialist

for 2), Daniele Grandini's blog lists a few good queries for this

http://nocentdocent.wordpress.com/2009/05/23/how-to-get-noisy-discovery-rules/

Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Fri, Jun 26, 2009 11:13 AM :: Rank: 86
Author
Points: 295
Level: System Center Hero
Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Fri, Jun 26, 2009 12:55 PM :: Rank: 114
Author
Points: 295
Level: System Center Hero
We have had a similar need in our environment, so I created an "OpsMgr DB Health Report" that we get daily. It is a collection of queries that I wrote or grabbed from other sites.
 
Below are some of the items that we check. These are not in Pete's original list, but each one of them has been used in troubleshooting issues in production environments.
 
Check Ops DB Grooming History
Remember all of the grooming fun with MOM 2005? OpsMgr is much cleaner and essentially does a truncate to remove old data instead of having to transfer it to the Warehouse. If grooming isn’t run, too much data can build up in the OpsDB and affect performance. There are two easy ways to spot this:
1. Query the internal job history table. You should see a successful job for each day. Note that the times are in UTC.
SELECT     InternalJobHistoryId, TimeStarted, TimeFinished, StatusCode, Command, Comment
FROM         InternalJobHistory
WHERE     (DATEDIFF(day, TimeStarted, GETDATE()) < 7) AND (NOT (Command = N'Exec dbo.p_DataPurging'))
ORDER BY TimeStarted DESC
 
2. In PartitionTables, check the partition start and end columns and make sure they are current, with each partition spanning one day.
select top 10 * from partitiontables Order by partitionendtime desc
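
The "one successful job per day" test from the first query can be automated roughly as follows. This is only a sketch: it assumes the TimeStarted and StatusCode columns have been fetched into tuples, and the value that means "success" (`success_code=1` here) is an assumption that should be confirmed against your own job history.

```python
from datetime import date, datetime, timedelta


def days_without_successful_job(rows, today, days=7, success_code=1):
    """List the UTC dates in the previous `days` days with no successful job.

    rows: iterable of (time_started, status_code), time_started in UTC.
    success_code is an assumption; confirm what StatusCode means in your DB.
    """
    ok_days = {started.date() for started, status in rows if status == success_code}
    wanted = [today - timedelta(days=n) for n in range(1, days + 1)]
    return sorted(d for d in wanted if d not in ok_days)


# Hypothetical history: the job failed on Jun 22 and never ran on Jun 23.
rows = [(datetime(2009, 6, d, 4, 0), 1) for d in (19, 20, 21, 24, 25)]
rows.append((datetime(2009, 6, 22, 4, 0), 2))
print(days_without_successful_job(rows, date(2009, 6, 26)))
```

Today is deliberately left out of the window, since the job for the current day may simply not have run yet.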
 
For more info on grooming check out a couple of Steve Rachui's blogs:
 
Large Table Query
 
This is useful for spotting a myriad of issues, including localized text problems (these can still happen in R2, see Steve’s post), grooming problems and Perf/Event storms. I recommend creating a report from this query and saving it daily for comparison.
 
SELECT TOP 15 so.name, si.rowcnt AS row_count,
    8 * SUM(CASE WHEN si.indid IN (0, 1) THEN si.reserved END) AS data_kb,
    COALESCE(8 * SUM(CASE WHEN si.indid NOT IN (0, 1, 255) THEN si.reserved END), 0) AS index_kb,
    COALESCE(8 * SUM(CASE WHEN si.indid IN (255) THEN si.reserved END), 0) AS blob_kb
FROM dbo.sysobjects AS so
JOIN dbo.sysindexes AS si ON (si.id = so.id)
WHERE 'U' = so.type
GROUP BY so.name, si.rowcnt
ORDER BY data_kb DESC
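
For the daily comparison, a minimal sketch could look like this. It assumes each day's query output has been saved as a simple mapping of table name to row count; the table names below are placeholders, not real OpsDB tables.

```python
def table_growth(yesterday, today, top=5):
    """Compare two {table_name: row_count} snapshots and rank by growth.

    Tables that are new today are treated as having had 0 rows yesterday.
    """
    growth = {
        name: rows - yesterday.get(name, 0)
        for name, rows in today.items()
    }
    ranked = sorted(growth.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Placeholder snapshots from two consecutive daily reports.
yesterday = {"table_a": 1000, "table_b": 500}
today = {"table_a": 1500, "table_b": 400, "table_c": 50}
for name, delta in table_growth(yesterday, today, top=2):
    print(f"{name}: {delta:+d} rows")
```

Ranking by change rather than by absolute size helps a storm stand out even when the affected table is not yet among the largest.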
 
Grey Agents Report
 
Agents that are Grey in the console:
 
SELECT     ManagedEntityGenericView.DisplayName, ManagedEntityGenericView.AvailabilityLastModified
FROM         ManagedEntityGenericView
INNER JOIN  ManagedTypeView ON ManagedEntityGenericView.MonitoringClassId = ManagedTypeView.Id
WHERE     (ManagedTypeView.Name = 'microsoft.systemCenter.agent') AND (ManagedEntityGenericView.IsAvailable = 0)
ORDER BY ManagedEntityGenericView.DisplayName
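
Once the rows from this query are available, a small post-processing step can separate agents that have been grey for a long time from ones that only just dropped. The sketch below assumes the two returned columns have been loaded into tuples, that the timestamps are in the same time zone as `now`, and that one day is a sensible cut-off; the server names are hypothetical.

```python
from datetime import datetime, timedelta


def long_grey_agents(rows, now, min_age=timedelta(days=1)):
    """Return (name, age) for agents unavailable at least min_age, oldest first.

    rows: iterable of (display_name, availability_last_modified).
    """
    aged = [(name, now - changed) for name, changed in rows]
    stale = [item for item in aged if item[1] >= min_age]
    return sorted(stale, key=lambda item: item[1], reverse=True)


# Hypothetical query results.
now = datetime(2009, 6, 26, 13, 0)
rows = [
    ("srv-a.contoso.local", now - timedelta(hours=3)),
    ("srv-b.contoso.local", now - timedelta(days=5)),
    ("srv-c.contoso.local", now - timedelta(days=2)),
]
for name, age in long_grey_agents(rows, now):
    print(f"{name}: grey for {age.days} day(s)")
```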
 
Check for Overrides in Default MP
 
select aov.name, parenttype, overrideableparametername, value, overridetype, aov.lastmodified from AllOverrideView aov
inner join ManagementPackView mpv on aov.managementpackID = mpv.Id
where mpv.name = 'Microsoft.SystemCenter.OperationsManager.DefaultUser'
order by aov.lastmodified DESC
 
SQL Broker Enabled
The most obvious sign of SQL Broker not being enabled is discovery failing. R2 will actually display a warning message during discovery, but the fix often requires stopping the services on the RMS and taking the database into single user mode to enable SQL Broker. It would be nice to get a critical alert when this occurs, but it can be spotted easily enough by running this query:

SELECT is_broker_enabled FROM sys.databases WHERE name = 'OperationsManager'

 I also have some queries I developed that take the typical Perf and Event count reports and add a breakdown by management pack, but I think that is out of scope for this discussion. I will post some more in the near future and hopefully get some time to create a supplemental report set.

 

RE: Requirements for a SCOM 2007 Health Check Script?
Posted: Mon, Jun 29, 2009 10:03 PM :: Rank: 59
Author
Points: 65622
Level: System Center Expert

 Here is some T-SQL for identifying Rules and Unit Monitors targeted to a group in error. This doesn't get us to an absolute list of culprits, but it very quickly returns a very small result set (including name, target and MP) in which we can easily identify mistakes. You will find a couple of singleton classes in some internal MPs that are not groups.

If we were to query a level deeper to identify base class, we could be 100% accurate (assuming we correctly identified all base classes for a group).

I tried to identify the same information with PowerShell, but found it took much longer to execute, thus the move to T-SQL to minimize the impact on MG resources (PowerShell took a couple of minutes, T-SQL only a couple of seconds).

For Unit Monitors

SELECT TOP (100) PERCENT dbo.ManagedTypeView.DisplayName AS Target, dbo.MonitorView.DisplayName AS MonitorName, 
dbo.ManagementPackView.DisplayName AS MP, dbo.ManagedTypeView.Singleton
FROM dbo.MonitorView INNER JOIN
dbo.ManagementPackView ON dbo.MonitorView.ManagementPackId = dbo.ManagementPackView.Id INNER JOIN
dbo.ManagedTypeView ON dbo.MonitorView.TargetMonitoringClassId = dbo.ManagedTypeView.Id
WHERE (dbo.ManagedTypeView.Singleton = 1) and dbo.MonitorView.IsUnitMonitor = 1
ORDER BY Target

 For Rules

SELECT TOP (100) PERCENT  dbo.ManagedTypeView.DisplayName AS Target, dbo.RuleView.DisplayName AS RuleName,
dbo.ManagementPackView.DisplayName AS MP, dbo.ManagedTypeView.Singleton
FROM dbo.RuleView INNER JOIN
dbo.ManagementPackView ON dbo.RuleView.ManagementPackId = dbo.ManagementPackView.Id INNER JOIN
dbo.ManagedTypeView ON dbo.RuleView.TargetMonitoringClassId = dbo.ManagedTypeView.Id
WHERE(dbo.ManagedTypeView.Singleton = 1)
ORDER BY Target
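
The "one level deeper" idea, walking up the base classes to decide whether a singleton really is a group, can be sketched like this. It assumes the class-to-base-class relationships have already been extracted into a dictionary. The class names and the choice of `System.Group` as the common group ancestor are assumptions for illustration, and the hierarchy is cut short for the example.

```python
def derives_from(class_name, base_of, ancestor="System.Group"):
    """Walk the base-class chain and report whether ancestor appears in it.

    base_of maps each class name to its base class name (None at the root).
    The `seen` set guards against a malformed, circular mapping.
    """
    seen = set()
    current = class_name
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)
        current = base_of.get(current)
    return False


# Illustrative hierarchy only; chains are truncated for the sketch.
base_of = {
    "Contoso.WebServers.Group": "Microsoft.SystemCenter.InstanceGroup",
    "Microsoft.SystemCenter.InstanceGroup": "System.Group",
    "System.Group": None,
    "Contoso.Singleton.Service": "Contoso.Base.Service",
    "Contoso.Base.Service": None,
}
for target in ("Contoso.WebServers.Group", "Contoso.Singleton.Service"):
    print(target, derives_from(target, base_of))
```

Applied to the targets returned by the two queries, this would drop the singleton classes that are not groups from the result set.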

Let me know if you have another path to this info.

 

RE: Requirements for a SCOM 2007 Health Check Script?
Posted: Thu, Jul 09, 2009 3:38 AM :: Rank: 99
Author
Points: 65622
Level: System Center Expert

OpsMgr Health Check Requirements Summary and Next Steps

Here's a categorized summary of the suggestions you mentioned as important elements of an OpsMgr Health Check. Many of you already included quite a few scripts and queries (or links to them) in your posts. I've left the pointers out, as we still have an additional round of discussion on the "what to check" before we talk about the "how to check".

I've put our names by your suggestions so we can ask one another for clarification or further justification if needed.

Next Steps

As a next step, I would suggest we:

  • Take a look at the list below and identify any critical checks we think are missing.
  • Ask the owner for clarification / justification if necessary
  • Throw out ideas on how this should be delivered (PowerShell script, Management Pack, etc.)

The result should be a refined list for which we can begin to review data collection options and discuss how best to deliver a report in an easily consumable format. PowerShell seems a likely choice, but I don't want to make any assumptions at this stage.

Respond to this thread with your additional thoughts, questions and suggestions.

Categorized List of Your Suggestions

1. RMS / MS / Mgmt Group Health, Configuration and Connectivity

   1. Look for database latency: high count of event 2115 in the RMS / MS OpsMgr Event Log (Pete)

   2. Collect agent count reporting to each MS & RMS (Pete)

   3. Retrieve recent warning and critical events from RMS OpsMgr Event Log (Pete)

   4. Verify patch levels on RMS / MS (Ziemek)

   5. Check that the MPs are getting regularly updated from DB to RMS and from RMS to Management Servers (Sameer)

   6. Check if deployed MPs are up to date (Holger)

   7. Check for overrides in the Default MP (Matthew)

2. Management Pack Configuration and Versioning

   1. Check for targeting mistakes (rules and monitors targeted to groups) (Tommy)

   2. Check for object discoveries created with dangerously low intervals (Tommy)

   3. Query for all scripts, collect them, run them, catch the errors. The event log doesn't provide much information regarding script errors. (Tenchuu)

3. SQL Database Health, Configuration, Maintenance  and Grooming

Configuration

   1. Check proper database sizing based on monitored server / device count (Pete)

   2. Check SQL configuration against supported configurations / best practices (Pete)

   3. Verify SQL Broker is enabled (Matthew)

   4. Check that the SQL Server is set up correctly, for example that Auto Grow is False (Simon)

Operational Health

   1. Check for failed as well as successful backups, and also whether a backup is even being attempted (Simon)

   2. Check whether the transaction logs are getting close to full (Simon)

   3. Check Operational DB grooming history (Matthew)

   4. Large Table Query (Matthew)

4. Agent Health and Configuration

   1. Locate "grey" hosts (Ziemek, Matthew)

   2. Verify patch levels on agents (Ziemek)
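
Whatever the delivery mechanism turns out to be, the categories above suggest a simple shape for the output. The following Python sketch is only one possible layout: the `CheckResult` type, the status names and the sample results are all invented for illustration, and the same structure could just as well be produced from PowerShell.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CheckResult:
    category: str
    name: str
    status: str      # "ok", "warning" or "critical" in this sketch
    detail: str = ""


def render_report(results):
    """Group results by category and return a plain-text report."""
    grouped = defaultdict(list)
    for r in results:
        grouped[r.category].append(r)
    lines = []
    for category in sorted(grouped):
        lines.append(category)
        for r in grouped[category]:
            suffix = f" ({r.detail})" if r.detail else ""
            lines.append(f"  [{r.status.upper()}] {r.name}{suffix}")
    return "\n".join(lines)


# Invented sample results, loosely following the categories above.
results = [
    CheckResult("SQL Database", "SQL Broker enabled", "ok"),
    CheckResult("Agent Health", "Grey agents", "warning", "3 agents"),
    CheckResult("SQL Database", "Grooming history", "critical", "no job for 2 days"),
]
report = render_report(results)
print(report)
```

Having every check return the same small record keeps the individual checks independent of the report format, so the output could later be switched to HTML or CSV without touching them.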

 

 

 

 

Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Fri, Jul 10, 2009 9:37 AM :: Rank: 63
Author
Points: 40804
Level: System Center Expert
This is too much effort not to move forward into production, so let's get going on this. I am ready to take on my bit and whatever else. So here's the roll call: Pete and I will put the bulk of this together. Would anyone else like to help?
RE: Requirements for a SCOM 2007 Health Check Script?
Posted: Fri, Jul 10, 2009 10:00 AM :: Rank: 66
Author
Points: 482
Level: System Center Hero

can help you guys out for sure. let me know.

Re: Requirements for a SCOM 2007 Health Check Script?
Posted: Fri, Jul 10, 2009 11:20 AM :: Rank: 64
Author
Points: 6866
Level: System Center Specialist
Sure, I am in. Let me know how I can help on this.
