Archive for July 2013
I’ve worked with many different network engineering departments at many different companies, and one of the biggest trends I see is that network management capabilities are almost always lacking, usually for one of the following reasons:
1. A complete lack of management tools. This is the rarest issue out there, but there are some places that don’t have or rely on any type of network management tools, and instead you see some type of Excel spreadsheet or network share containing copies of device configurations. There is nothing wrong with this if you are a really small environment; however, it is definitely not ideal for larger environments and should be avoided.
2. Outdated network management tools. This is only somewhat better than not having any management tools at all. It means relying on tools that have been EoL for years, to the point that you either need to maintain the network management application yourself or worry about it failing. As with any type of network device, the network management software needs to evolve with the network: as more and more technologies are rolled out, you need to ensure the management of those technologies scales just as well.
3. Too many network management applications. You wouldn’t think this is a bad thing, but it can be very easy to get carried away with network management. For example, look at Cisco: they practically have a flavor of ‘Prime’ for everything (CX modules, wireless networks, wired networks, voice/video). That in itself can get overwhelming, because on top of those platforms there are usually additional platforms for configuration or performance management (whether it be SolarWinds, PRTG, or WhatsUp Gold). Your management turns out to be very decentralized, sometimes leading to confusion and in some cases causing companies to purchase duplicate licensing that they don’t need.
4. Not knowing what to actually monitor. Granted, efficient management techniques come with time and experience. To be honest, the first time many people set up any type of NMS they are so instantly ‘wowed’ by the sheer amount of information they get by default (typically historical performance information, NetFlow stats, configuration management) that they do not realize what they don’t see until they find themselves in a troubleshooting situation or outage and begin wishing they had just a little bit more information. For example, look at SolarWinds NPM: only recently did it add support for viewing routing tables and routing neighbors; in the past, custom pollers had to be set up to see this type of information. However, you still need to rely on custom pollers to pull specific MIBs for FHRP status, which in my mind is just as important as monitoring a routing protocol.
Now, we do have a very large arsenal of tools to choose from when designing our network management environment and it can be intimidating at first, but the important thing is to understand what we ‘should look at’ depending on the situation we are attempting to troubleshoot. A few great tools are:
Historical performance records are always great, since those types of tools will passively (and automatically) establish a baseline for us, allowing us to quickly determine if a network device or segment is experiencing any abnormal performance.
Syslog/traps. Remember, syslogs and traps are basically the equivalent of error logs in the Windows Event Viewer and are able to quickly tell us if the router is experiencing any type of issue. Of course, logging needs to be properly configured and possibly filtered to ensure the logs give us the information we need to see quickly without having to sift through thousands of events!
NetFlow data is an amazing resource, especially when teamed up with NBAR; together these can quickly tell us what traffic types and patterns are going through our router. So let’s say a particular remote site is experiencing performance issues: NetFlow can easily tell us if some specific traffic is over-utilizing the bandwidth or flooding the interface.
Configuration management, while this one is a given for any large network it can also be used to quickly identify any network changes that could be causing any negative impact, and pretty much all of the configuration management tools out there today include the ability to automatically compare previous configuration sets highlighting the differences.
Software management. You might not consider this one at first, but knowing what type and version of software is running in your network is very important, especially if you are unlucky enough to stumble upon a bug within the software. In those events you want to be able to quickly identify what other devices in your network will be affected by the bug, and in turn you will want a simple and manageable way to upgrade and replace that software.
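For the syslog and NetFlow/NBAR items in the list above, a minimal sketch on a Cisco IOS router might look something like the following. This assumes classic (non-Flexible) NetFlow, and the server addresses, UDP port and interface name are placeholders of mine, not values from any real network.

```
! Sketch only: 192.0.2.10 (syslog server), 192.0.2.20 (NetFlow collector),
! port 2055 and GigabitEthernet0/0 are all placeholder assumptions.

! Timestamp log messages and keep a local buffer as well
service timestamps log datetime msec
logging buffered 16384 informational

! Send only warnings and worse to the syslog server to cut down the noise
logging trap warnings
logging host 192.0.2.10

! Export flow records to the collector
ip flow-export version 9
ip flow-export destination 192.0.2.20 2055

interface GigabitEthernet0/0
 ! Collect flows for traffic arriving on this interface
 ip flow ingress
 ! Keep per-protocol NBAR statistics on this interface
 ip nbar protocol-discovery
```

The NBAR statistics can then be checked locally with show ip nbar protocol-discovery, while the flow data is viewed in whatever collector you point the export at.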
Yep, you read that correctly. You can now route on the low-end layer 2 Catalyst 2960 switches (sounds like one bad oxymoron, right?). This feature was introduced in IOS 12.2(55) and requires LAN Base. It has been around since late last year, yet it is not a very well-known feature, which shocks me! I figured the addition of routing (albeit limited routing functionality) on a 2960 switch would have been great news!
Now don’t go expecting to run OSPF or EIGRP on a 2960; in fact it does not support any routing protocol, so your natural reaction is going to be: then what is the point? Well, it supports inter-VLAN routing and up to 16 static routes. Remember, one of those static routes can be a default route up to a distribution switch’s HSRP address (or VSS core), allowing you to implement a routed access layer for cheap!
Now in regards to the configuration: once you have 12.2(55) or newer loaded on your 2960, you will need to make sure the switch is running the proper SDM template. (No, not Security Device Manager, for those unfortunate enough to remember it; this SDM is Switch Database Management.) If you are not familiar with the SDM templates on Catalyst switches, they are definitely worth a look, especially since the SDM template instructs the switch how to carve up TCAM resources (e.g., MAC tables, routing tables, unicast/multicast, QoS, etc.; obviously not all of those pertain to the 2960). The Catalyst 2960 now has the option for ‘lanbase-routing’, which is the SDM template we need to enable.
Note: When we change the SDM template the switch requires a reboot for the new template to take effect, because it changes how the TCAM resources are allocated.
If you change the SDM template and do not perform a reload, your changes will not take effect; if you issue sh sdm prefer again, the switch will tell you which template it will load upon the next reload.
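As a rough sketch of the sequence described above (the hostname "Switch" is just a placeholder prompt):

```
Switch# configure terminal
! Select the template that makes room for routes in the TCAM
Switch(config)# sdm prefer lanbase-routing
Switch(config)# end
! Before the reload, this still reports the old template in use
Switch# show sdm prefer
! The new template only takes effect after a reload
Switch# reload
```

Running show sdm prefer once more after the switch comes back up is a quick way to confirm that lanbase-routing is actually in use.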
Ok, now that we have the proper SDM loaded on the switch (lanbase-routing) we need to enable ‘ip routing’ on the switch:
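A minimal sketch of what this could look like, including the default route idea mentioned earlier. Only the 172.16.1.0/24 subnet comes from my lab; the VLAN numbers, the second subnet and the 10.0.99.x uplink addressing (with 10.0.99.1 standing in for a distribution-layer HSRP address) are assumptions for illustration. As far as I know, routing on the 2960 works on SVIs only, so each routed VLAN needs a VLAN interface that is up.

```
Switch# configure terminal
! Turn on IPv4 routing
Switch(config)# ip routing

! One SVI per VLAN we want to route between
Switch(config)# interface Vlan10
Switch(config-if)# ip address 172.16.1.1 255.255.255.0
Switch(config-if)# no shutdown
Switch(config-if)# interface Vlan20
Switch(config-if)# ip address 172.16.2.1 255.255.255.0
Switch(config-if)# no shutdown

! Uplink VLAN toward the distribution layer (assumed addressing)
Switch(config-if)# interface Vlan99
Switch(config-if)# ip address 10.0.99.2 255.255.255.0
Switch(config-if)# no shutdown
Switch(config-if)# exit

! Default route to the (assumed) distribution HSRP address;
! this counts as one of the 16 static routes
Switch(config)# ip route 0.0.0.0 0.0.0.0 10.0.99.1
Switch(config)# end
```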
Now that ‘ip routing’ is enabled, we can go ahead and view the routing table of the Catalyst 2960!
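The commands for this are the usual ones; I am leaving the output out here since it depends entirely on your own addressing:

```
! Connected routes for each SVI that is up, plus any static routes
Switch# show ip route
! Handy for checking which VLAN interfaces are actually up/up
Switch# show ip interface brief
```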
Now, there you have it: routing on a Catalyst 2960. The important thing is to remember the limit of 16 static routes. So I put this to the test and added more than 20 static routes:
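A test along these lines can be sketched as follows. The 192.168.x.0/24 prefixes and the 172.16.1.254 next hop are made-up values for illustration, not the exact routes I entered.

```
Switch# configure terminal
Switch(config)# ip route 192.168.1.0 255.255.255.0 172.16.1.254
Switch(config)# ip route 192.168.2.0 255.255.255.0 172.16.1.254
! ... keep going with 192.168.3.0, 192.168.4.0 and so on, past 16 routes ...
Switch(config)# end

! See which static routes actually made it into the routing table
Switch# show ip route static
! Compare against what was accepted into the configuration
Switch# show running-config | include ^ip route
```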
I entered 22 routes in configuration mode, and after #16 the switch silently discarded the rest. Something else I found pretty interesting is that my other VLAN interface disappeared from the routing table (172.16.1.0/24, a connected route that appears in the previous screen capture!). So this feature really is limited, but it is there nonetheless.
In this post I was running 12.2(58) on one of the 2960 switches in my lab (a C2960TT-L, I believe). I was able to place a client in one VLAN and ping across to two other VLANs attached to the 2960 with no other routing device in the path.
NOTE: WordPress is distorting my images, so until I figure out why, all the screenshots in this post are medium/thumbnail size and can be viewed in full size when clicked on.
Well, I just realized the SolarWinds certification test was available free of charge, so the other day I decided to give it a shot; I figured it would be a nice small break from my CCIE studies. I never thought I would bother getting a certification for a management platform, but considering I’ve been working with it for years, I figured why not.
Just to give you a brief overview of my experience with SolarWinds:
- I’ve been working with SolarWinds hands-on for at least 5 years now.
- I’ve done at least 3-4 installations from the ground up. Not just install and hit next, but planning out and designing the system to manage a few thousand nodes.
- Deployed and managed various different SolarWinds modules, along with performing the upgrades – NPM, NCM, IPAM, NTA, APM/SAM, Fail Over Engine. (And if you have had to plan for an upgrade for an outdated SolarWinds environment running more than 3 modules, it’s fun)
- Created countless user accounts, custom dashboards, custom reports, custom pollers, views, limitations, and so forth.
- Basically, you name it, I’ve done it within SolarWinds. (Well, not really, considering how quickly SolarWinds expands their platform, but you know what I mean)
Now to talk about the exam itself: (Note, I am not going to give away any details that can’t be found on SolarWinds’ own website)
- Free of charge (for now).
- It’s online, meaning you can take this from the comfort of your couch.
- Around 80 or so questions. So it’s not that short.
- Covers a wide array of topics from:
- How to perform NPM tasks
- What tools to utilize when troubleshooting
- Some basic troubleshooting steps
- and more.
Now, for my thoughts on the exam. All around, the exam was not that bad. Even with as long as I have been a network engineer and as long as I have worked with SolarWinds, there were a few questions that had me stumped, which honestly surprised me. I didn’t think I was going to miss 10-12 questions, so it just goes to show you: even though the test is about a management platform, and even though it is free, it is not what I would call a pushover. Now don’t get me wrong, there were some questions and some answer choices that were just gimmes, but you can usually find a few of those on every test. Since the test is online and you do not have to go into a testing center, it is considered ‘open book’, meaning you can have the test open in one window and the admin guide open in another, which may hurt the value of the exam. As for myself, I didn’t even bother putting forth the effort to read the admin guide (again); I figured if I couldn’t pass the exam with my SolarWinds experience, either there is something wrong with me or with the exam.
Now I’d venture to say this exam is a good measure for those who have been doing network administration for at least three years with SolarWinds exposure; whether or not this exam/credential gains popularity is another story. Just remember this is centered around managing/monitoring a network, not how to troubleshoot and diagnose SolarWinds application/DB/web issues. I will say the whole niche of network management (all aspects) is usually the most overlooked function of many networking departments. I would not mind seeing that change, just as I would not mind seeing this SolarWinds test gain in popularity and become a test that requires you to sit in a testing center to take it. It will be interesting to see where SolarWinds takes their certification. Surely they have the potential to expand it to their other modules, and even to a ‘design’ designation, given how the architecture can change when you start involving EoC and splitting up modules/roles, but time will tell.
For years I’ve been a fan of the reload in command; it has always been a useful safety net when making changes that could remove my ability to manage the router. Only recently have I found a feature that will actually roll back the configuration changes I make during a session without the need to reload the router! I don’t know about you, but I think this is an awesome feature, because it is much less intrusive than the old way of reloading a router and waiting for it to boot back up. Let’s quickly review this feature:
First we will need to configure a configuration archive; this is actually a prerequisite of the feature we want to use for reverting our configuration.
The above configuration simply does the following:
- Keeps a backup copy of the configuration on the local flash card in the ‘Archives’ directory
- Keeps the last 8 copies of the configuration
- Takes a copy of the configuration when it is saved (Either using the wr mem command or copy run start)
- The configuration will also be saved automatically every 525600 minutes, i.e. once a year. (This is entirely optional, I just included it)
And I used the following commands:
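A sketch of an archive configuration matching the list above could look like this. The exact path is an assumption: I am using flash: with $h (the hostname) as a filename prefix, and your flash device name may differ (for example disk0:). The directory has to exist first, and not every flash filesystem supports mkdir.

```
! Create the directory once, if it does not exist yet
Router# mkdir flash:Archives

Router# configure terminal
Router(config)# archive
! Where to store the archived configs (assumed path, $h = hostname)
Router(config-archive)# path flash:/Archives/$h
! Keep the last 8 copies
Router(config-archive)# maximum 8
! Archive a copy whenever the config is saved
Router(config-archive)# write-memory
! Also archive automatically every 525600 minutes (optional)
Router(config-archive)# time-period 525600
Router(config-archive)# end

! List the archived configurations
Router# show archive
```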
Now that the archive is configured, we can start using this configuration revert feature.
To use this feature, all you have to do is use the following command when entering configuration mode: config t revert timer x (where x is the timer in minutes). Once you enter this command, it takes a backup of the current configuration of the router and places you in configuration mode:
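For example, with a 10 minute timer (the value 10 is just an example, pick whatever gives you enough time to make and verify the change):

```
! Enter configuration mode with a 10 minute rollback timer
Router# configure terminal revert timer 10
! ... make your changes, then leave configuration mode ...
Router(config)# end
```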
If you try to utilize this feature without first configuring your config archive:
Now you can make any changes that you need. If you do not confirm your changes before the timer expires, the configuration will be rolled back to the snapshot taken.
You will want to enter the command config confirm to keep the router from rolling the configuration back assuming the change was implemented successfully:
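In its simplest form this is a single command from privileged EXEC mode. The two configure revert variations shown alongside it are related commands that, as far as I know, exist on the same IOS releases, but I have not covered them above, so treat them as a pointer rather than something I have beaten up in the lab.

```
! Keep the changes and cancel the pending rollback
Router# configure confirm

! Related options while the timer is still running:
! roll back right away instead of waiting for the timer
Router# configure revert now
! or reset the timer to 15 minutes (example value) if you need more time
Router# configure revert timer 15
```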
I would like to add that it is possible to enter config mode with the revert feature, make your changes, save the configuration to the startup-config, and then not confirm your changes. This will cause the running-config to revert to its previous state, but the startup-config will contain any changes made. So you have to be careful with this feature.
Now, I’ve been trying to beat this feature up in my lab all day, and so far it has not been perfect; I’ve seen some errors rolling back some Frame Relay configurations. I’ve gone as far as to upload a router config from one of INE’s labs, enter configuration mode with the revert feature enabled, and then paste an entirely different router’s configuration over the existing configuration just to see how the revert feature works. So far, with the exception of some Frame Relay features, it has been solid. It looks like I will start incorporating this new method into my normal day-to-day operations. Depending on the change, I might still put in a reload in, since you can’t be too careful, especially if you are working on devices that are in remote datacenters or located physically across an ocean. At least until I start feeling more confident with this feature.
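For reference, the older reload in safety net mentioned above looks roughly like this (15 minutes is just an example value). The idea is that unsaved changes are lost when the router reloads, so it comes back up on the last saved startup-config.

```
! Schedule a reload 15 minutes from now, before touching anything risky
Router# reload in 15
! Check the pending reload
Router# show reload
! ... make the change and confirm you can still reach the router ...
! Then cancel the scheduled reload
Router# reload cancel
```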