Wireshark tid-bit: Looking for the smoking gun

It’s not always the network that is at fault, but the network is usually the assumed culprit. Wireshark can be a vital tool when troubleshooting difficult scenarios, or for proving that the network is not the root cause. The purpose of this post is to ask questions and promote out-of-the-box thinking when troubleshooting with Wireshark. Some common problems:

  • The network is slow / This file transfer is slow
  • My application responds after X seconds
  • This application or agent is not working

These are very broad problem statements, and depending on how familiar you are with the configurations and infrastructure, they can sound like daunting tasks to troubleshoot. At least until we start asking more questions and begin gathering the information we need to start troubleshooting. On occasion it is easier to capture packets on the source and destination endpoints and then see what the packets tell us: will the packet capture align with the problem, or will the packets point us in another direction?

When using Wireshark to troubleshoot such a broad and undefined problem, where do you start? Wireshark is great at collecting details, and it presents all that information to us in an organized and digestible fashion. It’s up to us as network professionals to solve the mystery. Next up we’ll discuss some key items that will help expedite the troubleshooting process and make you more comfortable performing and reading packet captures. We will also ask some thought-provoking questions that involve more than just network-related technologies and issues.

Capture points and filtering options

The capture point is the foundation of your upcoming analysis. If you are not capturing at the correct spot (or spots), any further protocol analysis will either make the troubleshooting process more difficult and time consuming or send you down the wrong path. It is always a good idea to review and understand the traffic flow you are troubleshooting. Then ask yourself: will traffic be passing through my capture point? Could this traffic flow take another network path? Could another technology such as Virtual Port-Channel (vPC), Virtual Chassis, load balancers, load sharing, or ACLs (it can become quite a large list) be causing traffic to flow in another direction or drop the traffic flow altogether? Depending on the size of the infrastructure, is it possible to walk the path of the packet, following the MAC, ARP, and routing tables?

Capturing at the right point is half of the battle; the next big question is what kind of capture filter you need. Should you just capture everything passing through the interface in question? Some items to consider:

  • If you perform an unfiltered capture and capture all the details:
    • Will you create a performance bottleneck? Packet capturing can be resource intensive, impeding network performance for production traffic flows.
    • Will the packet capture be usable? The larger the packet capture, the more time it takes to filter and analyze. You need to start filtering the conversation at some point, either pre- or post-capture. Having all the data is great, but will it be helpful?
  • If you do decide to filter your capture, what are your criteria?
    • Source and destination are always valuable, but will that show you the entire picture? Will you see DNS resolution? Do you need to be aware of any HTTP redirects? (A capture-filter sketch follows this list.)
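As a loose illustration, a capture filter can scope the capture to the conversation in question plus its name resolution. This is only a sketch: the interface name, file name, addresses, and ports below are placeholders, not values from any specific environment.

  # Hypothetical example: capture the client/server conversation plus DNS,
  # writing to a file for later analysis (interface, IPs, and ports are placeholders)
  tshark -i eth0 -w client_side.pcapng -f "host 192.0.2.10 and (host 198.51.100.20 or port 53)"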

Timestamps

Be mindful of the timestamps logged in a packet capture; this information can help identify a wide array of issues. Wireshark captures multiple timestamps, and like almost every other field in the packet, a timestamp can be turned into a column, which allows you to quickly identify gaps in time. You can take this a step further by marking a ‘reference’ packet, allowing you to see the time delta from the reference packet. This can be useful when determining how long a request takes or how long it takes a TCP handshake to complete.
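The same timing details can also be pulled out of an existing capture with tshark. This is just a sketch; the capture file name is a placeholder.

  # Hypothetical example: print per-packet timing fields from an existing capture
  # (frame.time_relative = time since the first packet, frame.time_delta_displayed = gap since the previous displayed packet)
  tshark -r slow_transfer.pcapng -T fields -e frame.number -e frame.time_relative -e frame.time_delta_displayed -e ip.src -e ip.dst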

e.g.: Why did the application server take 32 seconds to start transferring data? What was occurring during that 32-second gap? Perhaps there was a delay in receiving traffic from a backend database? Did you capture the backend flow too? Typically, when a database is waiting on a transaction to complete, you may see ‘keep-alive’ packets allowing the TCP connection to remain open/active while backend tasks are being performed. If you are troubleshooting a performance issue and see ‘keep-alive’ packets, your next step may be to understand why the request is taking so long. Continuing with the database example, does a query need to be updated? Is the database fragmented? Does the server need additional resources (CPU/memory/disk/etc.)?
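If you suspect that pattern, Wireshark’s built-in TCP analysis flags will surface the keep-alives. As a sketch (the capture file name is a placeholder), and the same expression also works in the display filter bar:

  # Hypothetical example: isolate TCP keep-alives and their acknowledgements
  tshark -r app_server.pcapng -Y "tcp.analysis.keep_alive || tcp.analysis.keep_alive_ack"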

Source and destination details

Don’t discount the source and destination details of the conversations. These details allow you to confirm the traffic is flowing the way you expect.

  • MAC address – Do the source and destination MAC addresses line up with the expected network interfaces? Could the traffic be taking a different path? Is there a misconfiguration causing traffic to flow over a different interface?
    • I have seen colorful packet captures when LAG is enabled on the server side but not on the network side. (This was not 802.3ad LACP.)
  • Dot1Q and IP details – Are you seeing the expected VLAN tags? Are any devices dual-homed on multiple networks, and could traffic be routed over an unexpected or different interface?

A packet capture can clearly show you what is happening; once you see what is happening, you can then ask the question: why is this happening? e.g.: A device receives a request on one VLAN but its response shows up on a different VLAN; why is the response appearing on a different network than the one the request arrived on?

Firewalls, for instance, allow you to see packets getting ‘dropped’ or hitting the ‘bit bucket’. If you capture on both the rx and tx sides, you expect to see the same Dot1Q information; however, if you find packets in the drop log with a different Dot1Q header than expected, you can suspect asymmetric routing, and that may lead you to your root cause.
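A couple of display filters make these source and destination checks quick. As a sketch, with a placeholder MAC address, VLAN IDs, and file name:

  # Hypothetical examples: confirm which interface and VLAN the frames are really using
  tshark -r fw_inside.pcapng -Y "eth.src == 00:11:22:33:44:55"    # frames sourced from the expected NIC?
  tshark -r fw_inside.pcapng -Y "vlan.id == 10 || vlan.id == 20"  # which Dot1Q tag is actually on the wire?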

IP and TCP Options:

IP and TCP options provide an abundance of information about both the endpoints and the conversation:

  • IP Length – The size of the packet can help identify packet loss involving MTU- and fragmentation-related issues.
  • Fragmentation – What is the status of the DF-bit? Are overlays in play? How about jumbo frames? Do you see a certain packet size being transmitted from the source but not arriving at the destination? Do you need to move your capture point to identify the source of packet loss?
  • Window Size and MSS – Much less of a problem these days, but does the TCP session have the appropriate TCP MSS and window size? Are these settings being manipulated by a load balancer?
  • IP Identifier – The IP ID field can be invaluable in the right scenario, especially when troubleshooting switching loops, routing loops, or even software bugs. I’ve done a previous write-up of the IP ID field, so I won’t be going over the details again, but I recommend giving it a quick read before finishing up here. (A few filter sketches for these fields follow this list.)
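These fields all have display-filter equivalents. As a rough sketch, with a placeholder file name, sizes, and IP ID value:

  # Hypothetical examples for the items above
  tshark -r capture.pcapng -Y "ip.flags.df == 1 && ip.len > 1400"   # large packets with the DF-bit set (MTU suspects)
  tshark -r capture.pcapng -Y "tcp.flags.syn == 1" -T fields -e ip.src -e tcp.options.mss_val -e tcp.window_size_value   # MSS and window size offered in the handshake
  tshark -r capture.pcapng -Y "ip.id == 0x1234"   # follow a single IP ID value across multiple capture points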

TCP Flags:

How is TCP behaving? Are you seeing the entire 3-way handshake? Besides capturing at the proper locations, you also want to make sure your timing is correct; starting your capture before the handshake occurs is vital to understanding the whole picture. The TCP handshake will include items discussed in the previous section, such as the Window Size and MSS, and following the window size of a conversation may provide insight into packet loss or buffer issues on one of the endpoints. How about RST packets? It’s important to understand the application: some applications may not be coded properly and rely on RST packets to end connections as opposed to graceful closures with a FIN / FIN+ACK handshake. The last thing you want to do is go down the wrong path and assume something is wrong when it is indeed working as intended. (It may not be optimal, but it may be as intended.)
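A rough sketch of filters that surface setup and teardown behavior (the capture file name is a placeholder):

  # Hypothetical examples: isolate connection setup and teardown
  tshark -r capture.pcapng -Y "tcp.flags.syn == 1"                          # SYN and SYN/ACK - did the 3-way handshake complete?
  tshark -r capture.pcapng -Y "tcp.flags.reset == 1 || tcp.flags.fin == 1"  # how are connections being torn down, RST or FIN?
  tshark -r capture.pcapng -Y "tcp.analysis.retransmission"                 # retransmissions hinting at packet loss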

Protocol Dependencies

I briefly touched on this earlier in the post: it is good to see the full picture and follow the application / process in its entirety, for example (a few filter sketches follow this list):

  • When troubleshooting a connection issue, are you capturing the DNS resolution too?
  • If you are troubleshooting IP storage or IP file access, are you seeing any Kerberos or ticket-based authentication?
  • Are you seeing a successful TLS handshake if you are working with a secure connection?
  • In a security-conscious world, do you see the RADIUS communication when troubleshooting device onboarding issues?
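To confirm those supporting conversations were captured at all, protocol-name filters are usually enough. As a sketch, with a placeholder file name:

  # Hypothetical examples: quickly confirm the supporting protocols are present in the capture
  tshark -r capture.pcapng -Y "dns"                       # name resolution
  tshark -r capture.pcapng -Y "kerberos"                  # ticket / authentication traffic
  tshark -r capture.pcapng -Y "tls.handshake.type == 1"   # TLS Client Hello - did the secure handshake start?
  tshark -r capture.pcapng -Y "radius"                    # device onboarding / AAA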

This post asked a lot of questions; as mentioned at the start, it was aimed at provoking that out-of-the-box thinking: Am I troubleshooting this properly? Am I looking at the right pieces? Is there more to this issue than what I am considering? Hopefully some of these questions and examples did just that. I have other Wireshark posts on this blog, many of which are still valid after all these years, some stand-out ones to assist with troubleshooting:

As always, happy hunti…I mean troubleshooting!

As you can tell by my typos, grammar, sprinkled in references, and humor this post was written without the help of Artificial Intelligence (AI). These are my loosely organized ramblings.

Written by Stephen J. Occhiogrosso

September 5, 2024 at 11:15 AM