In the previous series I showed how to create the discoveries for the different instances that make up a NetApp Filer.
This part covers what is still missing when we look at the properties of the classes. We are going to “discover” these properties, which in turn will make it possible to collect and monitor the performance of the NetApp Filer.
As I stated in the first post, I wanted to be able to monitor performance, and not just alert on it but calculate health status based on the collected information. The big improvement over telling the device to send a trap when it is running out of disk space is that the health status will warn if there are problems and will return to healthy once they are resolved. Besides this, it also makes it possible to do forecasting and trend analysis using reporting and the collected performance data.
Discovered Properties by Now
By using the previously created datasources and discoveries we are now able to discover the following (picture right) classes and their properties.
With the Lun discovery we didn’t go through the process of creating another datasource to discover the Index property. This is because we can reuse the same datasource we used for discovering the Index property of the Aggregate class. Just create another discovery rule, use the datasource NetApp.Management.Pack.Datasource.Discovery.ObjectIndex and target it at a NetApp Lun to discover the Lun property Index!
This is again one of the benefits of creating your own datasources. You create one datasource and reuse it in your discoveries or monitors!
Below are the properties we are still missing. To be able to discover these we are going to create a couple more datasources.
We are going to use the same principles we used in our earlier discoveries, and some more. These will give you a more detailed view of why the datasource principle is so beneficial if you are designing your own management packs.
Because of the design of the NetApp MIB we need to discover an index value again.
The reason for this is that we run our initial discovery on a different index than the one we are going to use when collecting performance data.
This index contains all counters for the disk space performance collection. These are the ones we are interested in. Now how are we going to discover these and make sure we are collecting information for the affected file system?
When browsing the MIB deeper (picture right) under dfTable we notice an OID called dfFileSys. This OID contains the name of the referenced file system, which we are going to use!
We have now managed to create a couple of datasources which in their turn use different approaches for data collection. Combining the knowledge we should be able to come up with a solution!
The solution for collecting performance data consists of two parts. The first part is discovering the dfIndex OID. The second part is using this dfIndex OID to target the correct OIDs for collecting performance data.
First part: the dfIndex OID discovery, which provides the FSIndex property for the aggregate and volume classes.
Let’s start by creating the datasource. Give it the name NetApp.Management.Pack.DataSource.Discovery.FSIndex
And use the following modules:
The datasource is a combination of previously created datasources.
The Scheduler and Mapper Modules are the same as in the datasource NetApp.Management.Pack.Datasource.Discovery.ObjectIndex.
The SNMPProbe is basically the same as NetApp.Management.Pack.Class.NetApp.Aggregate
Now let’s walk through the datasource in the order in which it runs.
Notice that the OID we are going to run an SNMP walk on is dfFileSys, which is the file system name.
The value filter is used to filter the results, like we did with the datasource NetApp.Management.Pack.DataSource.Discovery.Lun
If the value belongs to the corresponding volume or aggregate, the data is passed to the script. Otherwise the data is dropped.
Once the filter has made sure the correct data is collected, the OID is passed to the script probe, which retrieves the last number of the OID. This is the index number of the corresponding aggregate or volume.
Notice there are actually two components discovered: FSIndex, which is the index number of the file system, and SNIndex, which is the index number of the snapshots hosted on the affected volume or aggregate.
When browsing the MIB and retrieving the data for dfFileSys, the following values are returned:
Notice there are actually two file system values for each volume or aggregate. The reason is that the first one is always the file system itself and the second one is always the snapshot usage.
By retrieving the first index, FSIndex, we are able to tell the snapshot index as well!
If the FSIndex is 5 then the snapshot index of this volume will be 6, and so on. Although I haven’t used the SNIndex in the management pack yet, I have plans to add the snapshots for performance monitoring as well.
Remember the approach we took when designing the datasource NetApp.Management.Pack.Datasource.Discovery.ObjectIndex. This mapper is exactly the same so when designing the mapper use the same approach, save as default and edit afterwards in xml editor.
Finally, for this datasource, the configuration tab:
Now we can use this datasource for discovering the FSIndex and SNIndex of aggregates and volumes! Simply create a discovery rule, target it at the aggregate or volume class, use the newly created datasource and fill in the missing pieces! (right picture)
Now that we have created the datasource and discovery, let’s wrap up.
Let’s say we want to discover the FSIndex of an aggregate named Aggregate1.
The discovery runs on Aggregate1 and does a walk on the file system name OID dfFileSys.
All collected values from the walk are passed through the filter, which filters on the value Aggregate1. Only the value for Aggregate1 and its OID are returned.
Now the OID is passed to the script, which trims it down to the index number and adds that number to a property bag as FSIndex. For SNIndex the FSIndex is increased by one.
Both values are now mapped to the Aggregate1 class properties FSIndex and SNIndex!
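The wrap-up above can be sketched in a few lines of Python. This is only an illustration of the logic, not the script from the management pack: the sample walk results, the base OID in them and the function name discover_indexes are assumptions made for the example, and the filter is modelled as a simple exact match on the name.

```python
# Made-up walk results for dfFileSys as (OID, value) pairs.
# The base OID and the names are illustrative assumptions.
SAMPLE_WALK = [
    (".1.3.6.1.4.1.789.1.5.4.1.2.1", "Aggregate0"),
    (".1.3.6.1.4.1.789.1.5.4.1.2.2", "Aggregate0/.snapshot"),
    (".1.3.6.1.4.1.789.1.5.4.1.2.3", "Aggregate1"),
    (".1.3.6.1.4.1.789.1.5.4.1.2.4", "Aggregate1/.snapshot"),
]


def discover_indexes(walk_results, object_name):
    """Filter the walk on the object name and derive FSIndex and SNIndex."""
    for oid, value in walk_results:
        if value != object_name:
            continue  # the filter drops everything that is not our object
        fs_index = int(oid.rsplit(".", 1)[-1])  # last number of the OID
        # The snapshot entry is assumed to follow the file system entry.
        return {"FSIndex": fs_index, "SNIndex": fs_index + 1}
    return None  # nothing matched, so nothing is discovered


print(discover_indexes(SAMPLE_WALK, "Aggregate1"))
# {'FSIndex': 3, 'SNIndex': 4}
```

The returned dictionary plays the role of the property bag whose values are mapped to the class properties.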
Now we are able to discover the different classes and their main properties.
There are some more discoveries which we didn’t cover, but with the information provided in this series you should be able to figure them out yourself 😉
Now that all the properties are discovered we can start with the next part, which is the performance counters. Although you are free to design your own, I have narrowed it down to the main counters.
Basically we are going to create the following performance counters
To create these counters we are going to use two datasources
The first one, NetApp.Management.Pack.DataSource.Performance.Percentage.Used, is the easiest one. It consists of three modules: a schedule, a filter and an SNMPProbe.
The SNMPProbe sends an SNMP Get to the OID named dfPercentKBytesCapacity, or .1.3.6.1.4.1.789.1.5.4.1.6.$Config/Index$, where $Config/Index$ is the FSIndex property of the Aggregate or Volume class.
That’s all there is to it!
In more detail: we needed to discover the FSIndex for the aggregate and volume classes to be able to pass this index along with the correct OID and retrieve any value in the df table, which is shown on the right side. Notice all the OIDs shown in the table and their corresponding MIB names.
Using the FSIndex property we can send an SNMP Get to any OID listed in this table! For percentage used this is dfPercentKBytesCapacity... although the name is not really clear 😉
We are going to use this approach in our next datasources as well.
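To make the idea of "column OID plus FSIndex" concrete, here is a minimal Python sketch. The helper name oid_for_index is hypothetical, and the base OID used for the percentage column is my reading of the NetApp MIB, so treat it as an assumption and verify it against your own MIB browser.

```python
# Assumed base OID of the "percentage used" column in the df table.
DF_PERCENT_USED_OID = ".1.3.6.1.4.1.789.1.5.4.1.6"


def oid_for_index(column_oid, fs_index):
    """Append the discovered FSIndex to a column OID, like $Config/Index$ does."""
    return "{}.{}".format(column_oid.rstrip("."), int(fs_index))


# An aggregate or volume whose FSIndex property was discovered as 5:
print(oid_for_index(DF_PERCENT_USED_OID, 5))
# .1.3.6.1.4.1.789.1.5.4.1.6.5
```

The same helper works for any other column of the table, which is why one discovered index is enough for all counters.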
Creating the above datasource is optional, though, because you could just as well use the one from the system.performance.library MP named System.Performance.SNMPPerformanceProvider!
There is one difference: since I created the datasource myself, I have the option to declare which $Config/ values I want to be able to set overrides on!
By default you can only enable or disable the one from System.Performance.SNMPPerformanceProvider.
Because I wanted the most flexibility in the MP, I created the datasource so that the value for the interval can be overridden. This way, when sealing the MP, I don’t have to worry about flexibility!
The next datasource we are going to design is NetApp.Management.Pack.Datasource.Performance.KBytesToGB, which collects the value and translates it to GB. With today’s disk space consumption it is easier to read GB than KBytes 😉
If you want to use KBytes you are free to do so, and then you don’t need this datasource. In fact you can rely on the good old System.Performance.SNMPPerformanceProvider again, but without overrides on the interval!
First I needed to find the values for UsedBytes. This was a real journey! Why?
If you look, they are right there in dfKBytesUsed, so just use the same approach as with the percentage used, one would say. This works great as long as your values stay below 2 TB.
But when you go above the magic number of 2 TB (2147483647 KBytes) something magical happens. The value of the OID changes to a negative value!
Instead of sending 2.1 TB it will show -2 TB (or something), and using the calculator in any possible way I know (and don’t know) I couldn’t come up with the correct value when collecting this OID.
This called for further investigation!
The reason for this magic number is not so magic when explained. These SNMP OID values are 32-bit signed integers, which can only hold values up to 2147483647 (2 TB when counting in KBytes). If the value gets above this it wraps around into a negative value.
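The wrap-around is easy to reproduce. The Python sketch below is only an illustration: the function name as_signed_32 and the sample KByte count (roughly 2.1 TB) are made up for the example, and it simulates how a number is read back when only 32 signed bits are available.

```python
def as_signed_32(value):
    """Interpret the lowest 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF  # keep only 32 bits
    if value >= 0x80000000:  # the sign bit is set
        return value - 0x100000000
    return value


print(as_signed_32(2147483647))  # 2147483647, the largest value that fits
print(as_signed_32(2147483648))  # -2147483648, one KByte more and it wraps
print(as_signed_32(2254857830))  # -2040109466, about 2.1 TB shown as about -1.9 TB
```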
Well, that explained the problem but didn’t resolve it. Looking further at the index, there are actually 64-bit OIDs. Great! These are the counters we are looking for.
There are also counters for High and Low. Since we are using SNMPv2c, which supports 64-bit values, we have no need for them, but just for the record: back in the (old) days when SNMPv1 was (and is) used and the limitations of 32 bits were discovered, a clever solution was to write the data to two OID values, High and Low. You would need to do the math in a script to come up with the correct value, like below:
if (Low >= 0) x = High * 2^32 + Low
if (Low < 0) x = (High + 1) * 2^32 + Low
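As a runnable version of the two lines above, here is a small Python sketch. The function name combine_high_low is hypothetical, and both inputs are assumed to arrive as signed 32-bit integers, which is why a negative Low needs the extra 2^32.

```python
def combine_high_low(high, low):
    """Rebuild a 64-bit value from signed 32-bit High and Low parts."""
    if low >= 0:
        return high * 2**32 + low
    # A negative Low means its top bit was set, so add 2^32 back via High + 1.
    return (high + 1) * 2**32 + low


print(combine_high_low(0, 100))          # 100
print(combine_high_low(0, -2040109466))  # 2254857830
print(combine_high_low(1, 0))            # 4294967296
```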
Back to the datasource. We know the OIDs we are interested in and we have the FSIndex number, so we are able to do our trick again.
The Scheduler and Filter are familiar, so take a look at the SNMPProbe.
Notice the OID value is actually two parameters combined into one. The reason for this is that we can reuse the datasource in the following performance collection rules: Used, Total and Available!
A simple script calculates GB from KBytes and passes the result to a property bag called SizeGB.
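The conversion itself is a one-liner. The Python sketch below is not the script from the management pack; it assumes binary units (1 GB = 1024 * 1024 KBytes) and rounding to two decimals, and the function name kbytes_to_gb is made up. Only the property bag name SizeGB comes from the text above.

```python
KBYTES_PER_GB = 1024 * 1024  # assumption: binary units


def kbytes_to_gb(kbytes, digits=2):
    """Convert a KBytes counter to GB, rounded for readability."""
    return round(kbytes / KBYTES_PER_GB, digits)


# The result would end up in the property bag under the name SizeGB.
property_bag = {"SizeGB": kbytes_to_gb(1048576)}
print(property_bag)
# {'SizeGB': 1.0}
```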
Now just create your custom Performance Collection rules and you are able to collect performance on your Aggregates and Volumes on disk usage. For the full information here are the basic steps:
Create a custom rule and target it at the Aggregate or Volume class. Create the following modules on the modules tab:
A Module for writing data to the Operations Database (PerfWriteOpDB) and one for writing data to the Reporting Database (PerfWriteRepDB)
Now that we have created the performance collection rules, I also want to create a monitor type and explain how to do it. This way you can reuse this approach in your own management pack journeys 😉
Before I start: I have also created a datasource to do a basic SNMP probe, called NetApp.Management.Pack.Datasource.Basic.SNMPProbe. It is a basic datasource consisting of a schedule, a filter and an SNMP probe module. For details see the management pack itself.
Create a new monitor type and call it NetApp.Management.Pack.MonitorType.PercentageUsedSpace
Make it a 3 state monitor, since we want to monitor the percentage used and raise first a warning state and then an error state.
Create the 3 IDs for the three different states.
Member Modules Tab
Here we are telling which datasource to use to collect the value and what condition the data needs to have.
UtilizationOK UtilizationLow UtilizationHigh
Regular Tab – Configuration Schema – Overridable Parameters
Combine condition detection and status of the monitor.
Fill in the parameters to be used in the monitor.
Fill in the overridable parameters.
We have created our monitor type. As with creating datasources, you can change or configure it as you like.
Now just create a custom monitor, target it at the class you want (aggregate or volume), select this monitor type as its base and fill in the missing information.
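The decision a three state monitor makes can be sketched in Python as follows. This only illustrates the idea: the function name utilization_state, the state labels, the example thresholds of 80 and 90 percent and the choice of "greater than or equal" comparisons are all assumptions, not values taken from the management pack.

```python
def utilization_state(percent_used, warning_threshold, error_threshold):
    """Map a percentage used value to one of three health states."""
    if percent_used >= error_threshold:
        return "Error"
    if percent_used >= warning_threshold:
        return "Warning"
    return "Healthy"


# Hypothetical thresholds: warn at 80 percent, error at 90 percent.
for sample in (42, 85, 97):
    print(sample, utilization_state(sample, 80, 90))
# 42 Healthy
# 85 Warning
# 97 Error
```

Because the state is recalculated on every sample, the monitor returns to healthy by itself once the percentage drops below the warning threshold again.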
We have covered the main areas in designing this management pack. I hope you found the information useful and will use the approaches explained in your own MPs.
As promised I will upload the Management Pack NetApp Management Pack for SCOM 2007 R2 to the site for everyone to use and have fun with it!
Be aware though that using the MP is at your own risk. Because of the way SCOM and SNMP work, the workload on your SCOM server can increase, as well as on the filer you want to monitor.
I have tested the MP in different areas, and the bigger the filers get, the more probes are sent and the heavier the workload becomes on both sides! Although the MP is more tuned than described in this series, always be careful when importing management packs into your environments, especially when it is production. If available, always use a test environment to be sure you understand the management pack and its impact! Closely monitor your environment before, during and after to be sure no problems arise.
If you want to play around first, try the NetApp Simulator like I did when developing this management pack.
I have created this Management Pack for fun and brains! To play around with SNMP and SCOM and to learn. I hope you do as well 😉
The release of Kris’s XSNMP management pack has brought a new approach for dividing the workload of SNMP monitoring! Great job! I will look into it as soon as I have time and tune this MP based on the approach used. I will also be working on a SCOM 2007 SP1 version as soon as I have time, probably within the next weeks.