Network Management: a forgotten art?
I’ve worked with many network engineering departments at many different companies, and one of the most common trends I see is that network management capabilities are lacking, usually for one of the following reasons:
1. A complete lack of management tools. This is the rarest issue, but some places don’t have or rely on any network management tools at all; instead you find an Excel spreadsheet or a network share containing copies of device configurations. There is nothing wrong with this if you run a really small environment, but it is far from ideal for larger environments and should be avoided there.
2. Outdated network management tools. This is only somewhat better than having no management tools at all. It means relying on tools that have been EoL for years, to the point where you either have to maintain the network management application yourself or worry about it failing. As with any network device, the network management software needs to evolve with the network: as more technologies are rolled out, you need to ensure that the management of those technologies scales just as well.
3. Too many network management applications. You wouldn’t think this is a bad thing, but it is very easy to get carried away with network management. Look at Cisco, for example: they practically have a flavor of ‘Prime’ for everything (CX modules, wireless networks, wired networks, voice/video), which can get overwhelming in itself. On top of those platforms there are usually additional platforms for configuration or performance management (whether it be SolarWinds, PRTG or WhatsUp Gold), so your management ends up very decentralized, which sometimes leads to confusion and in some cases causes companies to purchase duplicate licensing they don’t need.
4. Not knowing what to actually monitor. Granted, efficient management techniques come with time and experience. To be honest, the first time many people set up any type of NMS they are so ‘wowed’ by the sheer amount of information they get by default (typically historical performance information, NetFlow stats, configuration management) that they do not realize what they can’t see until they find themselves in a troubleshooting situation or outage and begin wishing they had just a little more information. Take SolarWinds NPM, for example: only recently did it add support for viewing routing tables and routing neighbors; in the past, custom pollers had to be set up to see this type of information. You still need to rely on custom pollers to pull specific MIBs for FHRP status, which in my mind is just as important as monitoring a routing protocol.
Now, we do have a very large arsenal of tools to choose from when designing our network management environment, and it can be intimidating at first, but the important thing is to understand what we ‘should look at’ depending on the situation we are troubleshooting. A few great tools are:
Historical performance records are always great, since these types of tools passively (and automatically) establish a baseline for us, allowing us to quickly determine whether a network device or segment is experiencing abnormal performance.
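To make the baseline idea a bit more concrete, here is a minimal Python sketch. It assumes you already have a list of past utilization samples for an interface; the sample values, the function name `is_abnormal` and the three-standard-deviation threshold are all made up for illustration and are not how any particular NMS does it.

```python
from statistics import mean, stdev

def is_abnormal(history, current, threshold=3.0):
    """Return True if `current` is more than `threshold` standard
    deviations away from the mean of the historical samples."""
    if len(history) < 2:
        return False  # not enough history to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # Perfectly flat history: any change at all is a deviation
        return current != baseline
    return abs(current - baseline) / spread > threshold

# Hypothetical interface utilization samples (percent), one per poll
utilization_history = [22, 25, 24, 23, 26, 24, 25, 23]

print(is_abnormal(utilization_history, 24))
print(is_abnormal(utilization_history, 85))
```

A real tool would typically keep separate baselines per time of day or day of week, since a backup window at 2am looks very different from a Monday morning.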
Syslog/traps. Remember, syslogs and traps are basically the equivalent of the error logs in the Windows Event Viewer and can quickly tell us if the router is experiencing any type of issue. Of course, logging needs to be properly configured and possibly filtered to ensure the logs give us the information we need quickly, without having to sift through thousands of events!
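As a rough illustration of that filtering, the sketch below keeps only messages at or above a chosen severity. It assumes the logs use the Cisco IOS style `%FACILITY-SEVERITY-MNEMONIC:` tag, where severity runs from 0 (emergencies) to 7 (debugging); the sample log lines and the function name `filter_by_severity` are made up for the example.

```python
import re

# Matches the %FACILITY-SEVERITY-MNEMONIC: tag of a Cisco IOS style message
CISCO_MSG = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):"
)

def filter_by_severity(lines, max_severity=4):
    """Keep lines whose severity number is <= max_severity.
    Lower numbers are more severe, so 4 keeps warnings and worse."""
    kept = []
    for line in lines:
        match = CISCO_MSG.search(line)
        if match and int(match.group("severity")) <= max_severity:
            kept.append(line)
    return kept

# Hypothetical log lines, as they might arrive at a syslog server
sample_logs = [
    "Mar  1 00:01:10: %LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down",
    "Mar  1 00:01:11: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet0/1, changed state to down",
    "Mar  1 00:02:30: %SYS-5-CONFIG_I: Configured from console by admin on vty0",
    "Mar  1 00:03:05: %OSPF-5-ADJCHG: Process 1, Nbr 10.0.0.2 on GigabitEthernet0/1 from FULL to DOWN",
]

for line in filter_by_severity(sample_logs):
    print(line)
```

Note that a blanket severity cut-off like this also hides the OSPF adjacency change, which is logged at severity 5. In practice you would probably whitelist a few important mnemonics on top of the severity filter.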
NetFlow data is an amazing resource, especially when teamed up with NBAR. Together they can quickly tell us what traffic types and patterns are going through our router. So let’s say a particular remote site is experiencing performance issues: NetFlow can easily tell us whether some specific traffic is over-utilizing the bandwidth or flooding the interface.
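The "who is eating the bandwidth" question boils down to summing bytes per application across the flow records. Here is a small sketch of that aggregation. It assumes the flows have already been exported and parsed into dictionaries; the field names (`src`, `dst`, `app`, `bytes`), the sample records and the function name `top_applications` are all hypothetical, with `app` standing in for whatever label NBAR assigned.

```python
from collections import Counter

def top_applications(flows, count=3):
    """Sum bytes per application and return the heaviest ones first."""
    totals = Counter()
    for flow in flows:
        totals[flow["app"]] += flow["bytes"]
    return totals.most_common(count)

# Hypothetical, already-parsed flow records from a remote site
flows = [
    {"src": "10.1.1.10", "dst": "10.0.0.5", "app": "backup", "bytes": 900_000_000},
    {"src": "10.1.1.22", "dst": "10.0.0.8", "app": "http", "bytes": 40_000_000},
    {"src": "10.1.1.10", "dst": "10.0.0.5", "app": "backup", "bytes": 600_000_000},
    {"src": "10.1.1.31", "dst": "10.0.0.9", "app": "voice", "bytes": 5_000_000},
]

for app, total_bytes in top_applications(flows):
    print(f"{app}: {total_bytes / 1_000_000:.0f} MB")
```

Swapping `flow["app"]` for `flow["src"]` gives you top talkers by host instead of by application.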
Configuration management. While this one is a given for any large network, it can also be used to quickly identify network changes that could be causing a negative impact, and pretty much all configuration management tools out there today can automatically compare previous configuration sets and highlight the differences.
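If you only have configuration backups on a file share, you can get a basic version of that comparison with Python’s standard `difflib` module. This is just a sketch: the two configuration snippets and the function name `config_diff` are made up for the example, and a real tool would also strip out noise such as timestamps before comparing.

```python
import difflib

def config_diff(old_config, new_config):
    """Return a unified diff between two configuration snapshots
    as a list of lines (empty if nothing changed)."""
    return list(difflib.unified_diff(
        old_config.splitlines(),
        new_config.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))

# Hypothetical snapshots of the same interface, before and after a change
old_config = """interface GigabitEthernet0/1
 description Uplink to core
 ip address 10.0.0.1 255.255.255.0
 no shutdown"""

new_config = """interface GigabitEthernet0/1
 description Uplink to core
 ip address 10.0.0.1 255.255.255.0
 shutdown"""

for line in config_diff(old_config, new_config):
    print(line)
```

Lines starting with `-` were removed and lines starting with `+` were added, so here the diff points straight at someone shutting down the uplink.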
Software management. You might not consider this one at first, but knowing what type and version of software is running in your network is very important, especially if you are unlucky enough to stumble upon a bug in the software. In that event you want to be able to quickly identify which other devices in your network are affected by the bug, and you will also want a simple and manageable way to upgrade and replace that software.
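The "which other devices are affected" question is easy to answer once you have an inventory of platform and version per device. The sketch below assumes such an inventory already exists as a list of dictionaries; the hostnames, platforms, version strings and function names are invented for the example and do not refer to any real bug.

```python
from collections import defaultdict

def group_by_version(inventory):
    """Map (platform, version) to the list of hostnames running it."""
    groups = defaultdict(list)
    for device in inventory:
        key = (device["platform"], device["version"])
        groups[key].append(device["hostname"])
    return dict(groups)

def affected_devices(inventory, platform, bad_versions):
    """Return the sorted hostnames on `platform` running a version
    listed in `bad_versions`."""
    return sorted(
        device["hostname"]
        for device in inventory
        if device["platform"] == platform and device["version"] in bad_versions
    )

# Hypothetical inventory, e.g. exported from an NMS or built from 'show version'
inventory = [
    {"hostname": "branch-rtr-01", "platform": "ISR", "version": "15.2(4)M6"},
    {"hostname": "branch-rtr-02", "platform": "ISR", "version": "15.4(3)M2"},
    {"hostname": "core-sw-01", "platform": "Catalyst", "version": "15.2(4)M6"},
    {"hostname": "branch-rtr-03", "platform": "ISR", "version": "15.2(4)M6"},
]

# Suppose a bug is found in this (made-up) ISR version
print(affected_devices(inventory, "ISR", {"15.2(4)M6"}))
```

The same inventory grouped with `group_by_version` also shows how many different software versions you are running, which is a good starting point for standardizing them.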