Certified Devices

Cisco PIX Firewalls
Cisco ASA Firewalls
Cisco Catalyst Switch 2950
Cisco Catalyst Switch 2960
Cisco Catalyst Switch 3560
Cisco Catalyst Switch 3750
Cisco Catalyst Switch 4500
Cisco Catalyst Switch 6500
Cisco 1800 Series Routers
Cisco 1900 Series Routers
Cisco 2800 Series Routers
Cisco 2900 Series Routers
Cisco 3800 Series Routers
Cisco 3900 Series Routers
Cisco 7200 Series Routers
Cisco Wireless Aironet 1231

Why You Need Depth in Your Monitoring
Written by Stephen Hull   
Tuesday, 19 April 2011 22:35

When I talk to people about the depth of the management packs JaxMP offers, one question I often get is "Do you think when SCOM 2012 comes out with built-in network monitoring that your management packs will still be necessary?"  My response is always, "Absolutely!"  I usually follow up with "SCOM 2012 may have network monitoring, but it is still very 'thin'.  If you need more information about errors or other specific counters, you need something more than what SCOM 2012 has to offer and no management pack has the depth that JaxMP has."

I think there is a great misconception that all you need is basic interface counters to know how a network device is performing.  This may be true in some instances, but in most, you need the whole picture.  I like to say that when Microsoft released SCOM, they didn't release only a single "Microsoft Management Pack" to cover all operating systems and applications.  There is a reason that there is a SQL 2005 Management Pack, an Exchange 2010 Management Pack, a Windows 2008 Server Management Pack, etc.  You get the picture.  Each product Microsoft has is unique and can't be defined with a few counters and monitors.  The problem, when it comes to network devices, is that they are all slightly different.  Yes, they have basic counters, but they also have much more to look at.  Just as you can't tell how an Exchange 2007 server is performing just by looking at incoming and outgoing bytes on the interface, you also can't tell how your Cisco Catalyst 6500 switch is performing by looking at interface counters.

Here is a very simple scenario that I usually talk about.  If you are in a multi-site environment with WAN links (Serial, either DS1 or DS3), you can watch in/out bytes all day long and it can help you get an idea about capacity planning, but it does nothing for you when you start having intermittent connectivity problems.  This can sometimes present itself as slowness, even though your in/out counters may tell you a different story.  Where do you start to look?  Since you don't have the whole picture of the network device, you have to start elsewhere, such as a server or an application.  If you had more visibility into the network device, maybe you could eliminate it completely, or maybe you could pinpoint the device as the problem.  Basic interface counters will not give you the answer.  In this situation, you need specific counters that pertain to the serial link itself and not some counter or monitor that won't help you much.  SCOM 2012 and all other management pack solutions do not have this depth.  Only JaxMP NPM Management Packs can do this.

If the device exposes it, our NPM Management Packs can get it and monitor it.  In the case of a Cisco Catalyst 6500 switch, you can monitor the actual modules and their statuses, the fans, power supplies, and temperatures on not only the modules, but the chassis itself.  In SCOM 2012, you will only be able to discover some of these objects and you definitely won't be able to monitor them without some custom management pack authoring.  In the case of a Cisco 2800 series router, you will be able to monitor specifics of the DS-3 (T3) such as line status, slips, errored seconds, delayed seconds, loopback status, and much more.  Everything a telco sees on their end of the circuit, you can see too.  You can watch and track the detailed counters to know not only that it happened, but when it happened.  I can't even begin to talk about how important this is.
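To make the "when it happened" point concrete, here is a minimal Python sketch.  It is not JaxMP code.  It assumes you already have timestamped samples of a cumulative error counter (for example, errored seconds on a serial link) and that the counter wraps at 32 bits; the sample values and the function names are hypothetical.

```python
from typing import List, Tuple

COUNTER32_MAX = 2 ** 32  # assumption: the counter behaves like a 32-bit SNMP counter


def counter_delta(previous: int, current: int) -> int:
    """Return the increase between two samples, allowing for a single wrap."""
    if current >= previous:
        return current - previous
    return COUNTER32_MAX - previous + current


def find_error_intervals(samples: List[Tuple[str, int]]) -> List[Tuple[str, str, int]]:
    """Given (timestamp, counter_value) samples in time order, return
    (start, end, increase) for every polling interval where the counter grew."""
    intervals = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        delta = counter_delta(v0, v1)
        if delta > 0:
            intervals.append((t0, t1, delta))
    return intervals


# Hypothetical five-minute polls of an errored-seconds counter.
samples = [("09:00", 120), ("09:05", 120), ("09:10", 134), ("09:15", 134)]
print(find_error_intervals(samples))
```

In this made-up data, the in/out byte counters could look perfectly healthy while the error counter shows that the link took errors between 09:05 and 09:10.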

The bottom line is that while basic monitoring is good, it is far from a complete set of tools you need to truly troubleshoot network problems.  Sometimes, you need the flexibility and depth to see as much as possible.  This simply isn't an option, even with specialized network monitoring tools like Solarwinds and WhatsUp Gold, without extensive customization.

 
What is the SNMP Trap Enhancer
Written by Stephen Hull   
Friday, 05 February 2010 01:32

If you have ever struggled to understand the cryptic codes within an SNMP Trap message, you know it is a challenge that needs something similar to a “Rosetta Stone” to decipher them.  Worse yet, once you have decoded the message, you might find out it was just a simple notification telling you something like a log file is starting to get full.

While SNMP Traps can be very useful, they need to be understood correctly to be of any use at all.  Some NMS products can help with this, and SCOM has the potential to do so as well.  Out of the box, SCOM can receive SNMP Traps, but what you get is still the cryptic message that we are familiar with.  You also have the ability to build Monitors and Alerts around these messages, but it can be very time-intensive to create them for all the Traps you want, much less the ones you really need.

This is what prompted us to start thinking about a solution to this problem.  The result: the JaxMP SNMP Trap Enhancer.

What is the SNMP Trap Enhancer?

The SNMP Trap Enhancer is our solution to the cryptic SNMP Trap message problem.  After many ideas, we came up with a nifty little application (Windows Service) and a Management Pack to help integrate it into SCOM.  The Trap Enhancer can actually be a stand-alone application, if you wish, or it can be integrated into System Center Operations Manager with the Management Pack.

In a nutshell, the Trap Enhancer is a Windows Service that uses the SNMP Trap Service on a Windows Server to read incoming Trap messages, translate them based upon a translation file, and then write the message to the EventLog, a Syslog server, or both.  If you are also using the Management Pack, it will monitor the EventLog and will generate an Alert if it sees a message from the Trap Enhancer.  The Alert description will contain the EventLog Description, so you are able to easily read the incoming SNMP Trap message and control the Alerts through SCOM.



How does it work?

First, the Trap Enhancer is a Windows Service.  The reason for this is that it needs to run all the time, no matter who is logged in, and it can be controlled very easily.  The Trap Enhancer uses the Windows SNMP Trap Service to listen for incoming Trap messages.  You have to configure the Windows SNMP Trap Service first before the JaxMP Trap Enhancer will work correctly.

The Trap Enhancer will actually read the values in the incoming Trap message and store information about the SNMP Trap.  It uses the OID value of the incoming Trap message to perform a lookup on a translation file.  It then reads the translation file and starts to generate a new message based upon the values it finds.
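The lookup step can be sketched in a few lines of Python.  This is only an illustration of the idea, not the Trap Enhancer's implementation and not the real TIF format: the in-memory table, the template syntax, and the function name are assumptions.  The two OIDs are the standard linkDown and linkUp traps.

```python
from typing import Dict

# Hypothetical translation table keyed by trap OID.  The real TIF format is
# not shown in this article; a dict of message templates stands in for it.
TRANSLATIONS: Dict[str, str] = {
    "1.3.6.1.6.3.1.1.5.3": "linkDown: interface {ifDescr} (index {ifIndex}) went down",
    "1.3.6.1.6.3.1.1.5.4": "linkUp: interface {ifDescr} (index {ifIndex}) came up",
}


class _Missing(dict):
    """Dict that renders absent values as <name> instead of raising KeyError."""

    def __missing__(self, key: str) -> str:
        return "<" + key + ">"


def translate_trap(trap_oid: str, varbinds: Dict[str, object]) -> str:
    """Look up the trap OID and build a readable message from its values."""
    template = TRANSLATIONS.get(trap_oid)
    if template is None:
        # No translation available: fall back to the raw values.
        pairs = ", ".join(f"{k}={v}" for k, v in sorted(varbinds.items()))
        return f"Untranslated trap {trap_oid}: {pairs}"
    return template.format_map(_Missing(varbinds))


print(translate_trap("1.3.6.1.6.3.1.1.5.3", {"ifIndex": 2, "ifDescr": "Serial0/0"}))
```

Falling back to the raw values for an unknown OID means a trap without a translation still produces a message instead of being silently dropped.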

The translation file, or TIF as we call it, is completely customizable.  We have generated over 1000 TIF files for all Cisco generated SNMP Traps and as many Standard RFC SNMP Traps as we could find.  The contents of the TIF are directly from the vendor’s MIB file.  Since the Trap messages are defined in the MIB, we used it to help us create the content.  The TIF files are text, so they are relatively easy to read and change.  We have also added functionality that allows customers to create their own TIF for any specific device or application they may have.

Once the new message is created, we have several options.  By default, we write the message to the local Event Log.  This allows the Management Pack to read the message and generate an Alert or clear an existing Alert.  Optionally, you can send the message over to a Syslog server if it is important to track the Trap message for other applications or security purposes.
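For the Syslog option, a translated message could be forwarded roughly as follows.  This is a sketch under stated assumptions, not how the Trap Enhancer does it: it builds a BSD-style (RFC 3164) syslog line by hand and sends it over UDP.  The facility, severity, host name, and the "TrapEnhancer" tag are all hypothetical choices.

```python
import socket
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def format_syslog(message: str, hostname: str, tag: str, when: datetime,
                  facility: int = 16, severity: int = 5) -> str:
    """Build a BSD-style syslog line.  PRI is facility * 8 + severity;
    the defaults are local0 (16) and notice (5), chosen for illustration."""
    pri = facility * 8 + severity
    # The BSD timestamp pads a single-digit day with a space, e.g. "Feb  5".
    stamp = f"{MONTHS[when.month - 1]} {when.day:2d} {when:%H:%M:%S}"
    return f"<{pri}>{stamp} {hostname} {tag}: {message}"


def send_syslog(line: str, server: str, port: int = 514) -> None:
    """Send one syslog line to the server over UDP (fire and forget)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (server, port))


line = format_syslog("linkDown: interface Serial0/0 went down",
                     hostname="trapsrv", tag="TrapEnhancer",
                     when=datetime(2010, 2, 5, 1, 32, 0))
print(line)
# send_syslog(line, "192.0.2.10")  # uncomment and point at your own Syslog server
```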



Enter the Management Pack

Now that we have stored this information in the local Windows Event Log, we have the message in the perfect place for SCOM to use it.  The Management Pack is nothing but Monitors and nearly all of the Monitors do nothing but read the Event Log.

When the Monitor sees a message from the Trap Enhancer, it generates an Alert.  The Alert actually contains the Event Description, so you can read exactly what the Trap Enhancer generated in the Event Log without looking at the actual event on the server.

All Monitors are disabled by default, but we built an Override MP that enables a few of them to help users get started.  Examples include the Monitor for the SNMP Trap that gets sent when someone saves the configuration on a Cisco device and the Monitor for the notification that a network device has been reloaded.

What About SNMP Notifications?

When an SNMP Trap is generated, there may not be a corresponding SNMP Trap to clear the state.  This is usually true for Notifications.  A notification is a message that tells you something happened, not necessarily some GOOD/BAD event.  An example of this is when there is a reload of a device.  A reload message is generated, but there is no corresponding message that will clear it.  What typically happens in SCOM 2007 is that an SNMP Trap is sent for the device and an Alert is generated.  The Alert must then be manually cleared because there is no corresponding event you can use to clear it.

To fix that, the Trap Enhancer Management Pack uses a timer to clear the Alert after a set amount of time.  These timers are individual for each SNMP Trap and can be overridden for each device.  Our default is 24 hours, but this will usually be changed depending upon the SNMP Trap that is generated and the device it is generated from.
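The timer logic can be pictured with a small Python sketch.  In the actual product this is handled through Management Pack timers and SCOM overrides, so the class below is purely illustrative: the class name, the trap names, and the device names are hypothetical, and only the 24-hour default comes from the text above.

```python
from typing import Dict, Tuple

DEFAULT_CLEAR_SECONDS = 24 * 60 * 60  # the 24-hour default described above


class AutoClearPolicy:
    """Decide when an Alert with no matching 'clear' trap should expire."""

    def __init__(self, default_seconds: int = DEFAULT_CLEAR_SECONDS) -> None:
        self.default_seconds = default_seconds
        self.per_trap: Dict[str, int] = {}                # timer for each trap
        self.per_device: Dict[Tuple[str, str], int] = {}  # (trap, device) override

    def timeout_for(self, trap: str, device: str) -> int:
        """Device override wins, then the per-trap timer, then the default."""
        if (trap, device) in self.per_device:
            return self.per_device[(trap, device)]
        return self.per_trap.get(trap, self.default_seconds)

    def is_expired(self, trap: str, device: str, raised_at: float, now: float) -> bool:
        """True once the Alert has been open for at least its timeout."""
        return now - raised_at >= self.timeout_for(trap, device)


policy = AutoClearPolicy()
policy.per_trap["reload"] = 60 * 60                    # clear reload alerts after 1 hour
policy.per_device[("reload", "core-sw1")] = 10 * 60    # but after 10 minutes on core-sw1
print(policy.timeout_for("reload", "edge-rtr1"), policy.timeout_for("reload", "core-sw1"))
```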

Performance:

I am sure a pressing question is: What kind of impact will this have on my server?  Well, we have performed some stress tests on the Trap Enhancer Service and the results shocked us.  The highest level we pushed the Trap Enhancer was about 30 Traps/Second.  This is an extremely high rate of messages and would probably only be found in very large networks, if ever.

In all cases, the most interesting finding we made was that we could push the Trap Enhancer to the brink of failure and we could make it fail.  What is interesting about it is that it actually wasn’t the Trap Enhancer Service that ended up failing.  It was the Windows SNMP Trap Service.  We were pounding the SNMP Trap Service so hard, it would crash and stop receiving trap messages.  All we would do is restart the Windows SNMP Trap Service and the Trap Enhancer Service would continue translating all incoming Trap messages.

What Lies Ahead

Even before the Trap Enhancer Service and Management Pack were finished, we started looking to the future and the enhancements we wanted to make.  Some enhancements on the horizon are:

  • Make TIF file creation and integration cleaner and easier
  • TIF design and testing tools
  • Convert TIF file format to XML
  • Build APIs into the Trap Enhancer Service
  • Add Performance Counters to the Trap Enhancer Service
  • Allow for non-TIF content to be used for translations
  • And many more.

The SNMP Trap Enhancer Service was built with a very specific purpose in mind: to take the messy, hard-to-read, cumbersome information from our network and make it presentable, readable, understandable, and easy to use.  Even though the Trap Enhancer Service is a major component of the solution, the complete user experience would not be possible without System Center Operations Manager.  Now we can take this information and do something useful with it, like manage networks efficiently and effectively.