Monday, June 09, 2008

Monitor Dell Servers Running Windows Server Using Nagios and SNMP

Things I use in this entry:

Nagios 2.9
perl:net-snmp
Dell OpenManage Server Administrator
Nagios plugin check_omsa_snmp.pl
Windows Server 2003

At present, the servers that I am responsible for are monitored from a 30,000' view. That view is getting progressively closer to the ground: services or, more precisely, groups of services are now being monitored. For instance, our Exchange server depends on several services, but I don't want to report on each one directly and don't think it is necessary. A better approach, IMO, is to use nc_net on each Windows server to be monitored and then have one Nagios service that treats the group as a single point of failure.

I had been asked to look at monitoring the physical hardware of each server and the only real way to do this would be using SNMP and the Dell OpenManage Server Administrator tools.

Installation Instructions

1. Windows Server has SNMP available as an additional built-in component. It can be installed using Add/Remove Windows Components, under Management and Monitoring Tools.
Feel free to install all of these, as most of them are useful, but for our purposes only the Simple Network Management Protocol component is required.

2. Install the Dell OpenManage Server Administrator software, available from Dell's support site.

3. Open services.msc and bring up the properties for the SNMP Service. Click on the Security tab, then untick "Send authentication trap".
In the upper section, add an SNMP community with read-only rights. Remember this name, as you will need it later.
In the lower section, add the address of your Nagios server as one of the accepted hosts. Make sure to leave localhost in the list. If you aren't paranoid (AND WHY NOT????), you can instead tick the box to accept SNMP packets from any host.

4. Go to Nagios Exchange and download check_omsa_snmp.pl into /usr/local/nagios/libexec on your Nagios server (default install location assumed).

5. chmod +x that file and test your connection to the SNMP-enabled server using
snmpwalk -v 2c -c COMMUNITYNAME HOSTNAME .1.3.6.1.4.1.674.10892.1.700.20.1.8.1
It should return something like
SNMPv2-SMI::enterprises.674.10892.1.700.20.1.8.1.1 = STRING: "CPU Planar"
SNMPv2-SMI::enterprises.674.10892.1.700.20.1.8.1.2 = STRING: "Ambient"
SNMPv2-SMI::enterprises.674.10892.1.700.20.1.8.1.3 = STRING: "BP Bottom Temp"
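
If you only want to eyeball the probe names from that walk, the output can be piped through sed. This is just a small sketch: probe_names is a helper name I made up, and the pattern assumes the = STRING: "..." line format shown in the sample output.

```shell
# Print only the quoted probe names from snmpwalk output read on stdin.
# probe_names is a hypothetical helper, not part of net-snmp or the plugin.
probe_names() {
    sed -n 's/.*= STRING: "\(.*\)"$/\1/p'
}

# Example (uncomment and fill in your own community and host):
# snmpwalk -v 2c -c COMMUNITYNAME HOSTNAME .1.3.6.1.4.1.674.10892.1.700.20.1.8.1 | probe_names
```

An empty result most likely means the community name, the accepted-hosts list, or the OpenManage install needs another look.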

If that is successful, run this command from the same directory:
./check_omsa_snmp.pl -H HOSTNAME -C COMMUNITYNAME
This returns any critical errors on your server, for example
Power Supply 2 is critical
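
Nagios takes the state of a check from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), so when testing by hand it is worth looking at the exit code as well as the text. Below is a minimal sketch for doing that. The nagios_state helper is my own invention, and I am assuming check_omsa_snmp.pl follows the standard plugin exit-code convention.

```shell
#!/bin/sh
# Translate a Nagios plugin exit code into its state name.
# nagios_state is a hypothetical helper, not part of check_omsa_snmp.pl.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        3) echo "UNKNOWN" ;;
        *) echo "out of range: $1" ;;
    esac
}

# Example (run from the libexec directory, uncomment to use):
# ./check_omsa_snmp.pl -H HOSTNAME -C COMMUNITYNAME
# nagios_state $?
```

An out-of-range code usually means the script itself failed to run (wrong path, missing Perl module) rather than that the hardware is unhealthy.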

You can then set this up as a Nagios check command with the SNMP community already filled in, and attach it to each server as a service.
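
As a rough sketch of what that could look like in the Nagios object config: the command name, the host name dell-server-01 and the service description are placeholders of mine, and I am assuming $USER1$ points at the libexec directory (as in the sample resource.cfg) and that a generic-service template exists, as in the sample configs.

```
# Command definition with the community hard-coded (placeholder value)
define command{
        command_name    check_omsa_snmp
        command_line    $USER1$/check_omsa_snmp.pl -H $HOSTADDRESS$ -C COMMUNITYNAME
        }

# One hardware service per Dell server; host_name must match an existing host definition
define service{
        use                     generic-service
        host_name               dell-server-01
        service_description     Dell Hardware
        check_command           check_omsa_snmp
        }
```

If different servers use different communities, the community could be passed as $ARG1$ from the service instead of being hard-coded in the command.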

4 comments:

IanClancy said...

Nice post. I have used your post to implement hardware monitoring on our Windows Dell server. Just one note:
I had to edit check_omsa_snmp.pl and change the following line

from :
#use lib "/usr/lib/nagios/plugins";

to :
use lib "/usr/local/nagios/libexec";


Just something to watch out for if you install Nagios from source.

Daniel said...

Hi Ian,
I've just taken a look at our Nagios installation and my check scripts run fine with the use lib set "as is"

Which is peculiar!

Good catch though

Unknown said...

Thanks for the post. Helped us get critical data to our nagios install for a server room with a flakey AC unit.

I also had to use IanClancy's path modification. We're running Nagios 3.0.6 on Ubuntu 8.10 server...if that helps anyone else out in the future.

Anonymous said...

New address for domain
( Our domain name has changed: NagiosExchange.org is now MonitoringExchange.org )

http://www.monitoringexchange.org/cgi-bin/page.cgi?g=Detailed%2F2268.html;d=1