Saturday, April 13, 2013

Broadcasts Only?

This last week I was called in on an issue involving a specialized device that had lost communication with the network.  I am going to write this up in the order in which we uncovered each piece of information and ultimately came to the solution.

But first, a little background on the device.  The device is on a secured network behind a firewall and uses a statically assigned IP address (no DHCP).  It uses both broadcast packets and unicast packets to communicate with other devices on this network (sorry that I can't give more details about the devices themselves, but that's how it has to be).

So, let's start with our network topology, drawn with dia (http://dia-installer.de/).  This is just the physical topology using my home lab equipment, but it is enough to show the issue and the solution.  There are three switches connected in succession.




On this network we have three VLANs:  one for the switch management, one for the typical end users and one for these devices.  We will use VLAN 1 for the switch management, VLAN 52 for the end users and VLAN 57 for the devices.  The device having the issue will be located on VLAN 57.

So, one of my first steps is always to check the interface (#show int fa 0/8), the port security (#show port-security int fa 0/8) and the port's configuration (#show run int fa 0/8).


The port interface is up and no port security violations are occurring.  Typically at this point, I clear the interface counters to see if packets are flowing in both directions (#clear counters fa 0/8).  Then, another look at the interface statistics (#show int fa 0/8).


Notice that the only packets coming from the device to the switch are broadcasts: the input packet counter and the received broadcast counter are identical (13 packets input).  There is also some traffic leaving the switch going to the device (9 packets output).
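To make that counter comparison explicit, here is a minimal Python sketch.  The PortCounters class and the classify_inbound function are my own hypothetical names, and the numbers are typed in by hand from the write-up rather than parsed from the switch.

```python
from dataclasses import dataclass


@dataclass
class PortCounters:
    """Counters read by hand from #show int after #clear counters (hypothetical holder)."""
    packets_input: int
    broadcasts_received: int
    packets_output: int


def classify_inbound(c: PortCounters) -> str:
    """Describe what kind of traffic the attached device is sending to the switch."""
    if c.packets_input == 0:
        return "silent"
    # Every received packet was counted as a broadcast: no unicast is arriving.
    if c.broadcasts_received >= c.packets_input:
        return "broadcast-only"
    return "mixed"


# Values from the write-up: 13 packets in, all 13 broadcasts, 9 packets out.
print(classify_inbound(PortCounters(13, 13, 9)))  # broadcast-only
```

A "broadcast-only" result only says the device is talking and getting no useful answers.  It does not say whether the fault is in the device or somewhere upstream.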

So at this point, I'm thinking that the device is misconfigured.  Since these devices use static IP addresses, maybe the device has the wrong subnet mask or default gateway configured.  Unfortunately, I forgot the saying about assumptions and I made an 'ass out of me'.  Since I spend most of my work time at the access level (Layer 1 and Layer 2 with a sprinkling of Layer 3), I was really focused on the local connection, not the upstream connection.

The end user then put another known good device on the same port, and it had the same problem (it could not communicate with other devices on the same subnet).  At that point, I started looking at the trunk links.

I ran #show int trunk to look at the allowed VLANs, and VLAN 57 was allowed:


Next, a look at the spanning-tree instance for VLAN 57:


No problems there.  Fa0/2 is the trunk link up to the next switch and is the root port (the path to the root switch which is BackboneSwitch).

Time to move up to the next switch to see what is going on there.

First, the same #show int trunk command to make sure the correct VLANs are allowed:


Same deal here.  VLAN 57 is allowed in both directions (down to Switch2 and up to BackboneSwitch).

Next, a look at the spanning-tree instance for VLAN 57:


Now, are you seeing what I'm seeing?


Switch1 should not be the spanning-tree root for any VLAN; BackboneSwitch should be the root.  However, the VLAN is allowed on the trunk up to BackboneSwitch.  So, what is the problem?

Time to go to BackboneSwitch and check the configuration there.

So we run the #show int trunk command one more time and find:


VLAN 57 is not allowed on the trunk link.  Somehow (and after checking the logs, we still don't know how or why) VLAN 57 was removed from the trunk going to the two downstream switches, effectively cutting the VLAN off from the rest of the network.

For a VLAN to be active through the whole topology, the VLAN number must be allowed on both ends of every trunk link along the path.
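That rule is easy to check by hand on three switches, but a small Python sketch shows the idea.  The ALLOWED table and the first_broken_trunk function are hypothetical names of mine, and the allowed lists are typed in from the lab rather than pulled from the switches.  I am also assuming the BackboneSwitch trunk was carrying only VLANs 1 and 52 before the fix.

```python
from typing import Dict, List, Optional, Set, Tuple

# Allowed-VLAN list for each trunk end, keyed by (switch, neighbor).
# Hypothetical lab values for the state before the fix.
ALLOWED: Dict[Tuple[str, str], Set[int]] = {
    ("BackboneSwitch", "Switch1"): {1, 52},       # assumption: VLAN 57 missing here
    ("Switch1", "BackboneSwitch"): {1, 52, 57},
    ("Switch1", "Switch2"): {1, 52, 57},
    ("Switch2", "Switch1"): {1, 52, 57},
}


def first_broken_trunk(
    path: List[str], vlan: int, allowed: Dict[Tuple[str, str], Set[int]]
) -> Optional[Tuple[str, str]]:
    """Return the first (switch, neighbor) trunk end that does not allow the VLAN, or None."""
    for a, b in zip(path, path[1:]):
        # Check both ends of the link: the VLAN must be allowed on each side.
        for end in ((a, b), (b, a)):
            if vlan not in allowed.get(end, set()):
                return end
    return None


# Walk the path from the access switch up to the backbone.
path = ["Switch2", "Switch1", "BackboneSwitch"]
print(first_broken_trunk(path, 57, ALLOWED))  # ('BackboneSwitch', 'Switch1')
print(first_broken_trunk(path, 52, ALLOWED))  # None
```

The point of checking both directions is that each switch only shows its own side in #show int trunk, so one end can look perfectly healthy while the other end is dropping the VLAN.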

So, a quick change to the trunk interface on BackboneSwitch: (config-if)#switchport trunk allowed vlan 1,52,57

The VLAN was reconnected to the rest of the network, spanning-tree converged and the device was back on the network communicating with the other devices on the subnet.
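The odd spanning-tree root we saw earlier falls out of the same rule.  Here is a rough Python model of it, not how the switches actually implement spanning-tree: a trunk carries a VLAN only if both ends allow it, and each isolated island of switches elects its own root (lowest bridge ID wins).  The bridge priorities, MAC addresses and the roots_for_vlan function are all hypothetical, as is the assumption that the BackboneSwitch trunk allowed only VLANs 1 and 52 before the fix.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical bridge IDs (priority, MAC). The lowest bridge ID wins the root election.
BRIDGE_ID: Dict[str, Tuple[int, str]] = {
    "BackboneSwitch": (4096, "0000.0000.000a"),
    "Switch1": (32768, "0000.0000.000b"),
    "Switch2": (32768, "0000.0000.000c"),
}

# A trunk is (switch_a, allowed_on_a, switch_b, allowed_on_b).
Trunk = Tuple[str, Set[int], str, Set[int]]

BEFORE: List[Trunk] = [
    ("BackboneSwitch", {1, 52}, "Switch1", {1, 52, 57}),  # assumption: 57 missing
    ("Switch1", {1, 52, 57}, "Switch2", {1, 52, 57}),
]
AFTER: List[Trunk] = [
    ("BackboneSwitch", {1, 52, 57}, "Switch1", {1, 52, 57}),
    ("Switch1", {1, 52, 57}, "Switch2", {1, 52, 57}),
]


def roots_for_vlan(vlan: int, trunks: List[Trunk]) -> Dict[str, str]:
    """Map each switch to the root it would elect for this VLAN."""
    neighbors: Dict[str, Set[str]] = {name: set() for name in BRIDGE_ID}
    for a, allowed_a, b, allowed_b in trunks:
        # The link only carries the VLAN when both ends allow it.
        if vlan in allowed_a and vlan in allowed_b:
            neighbors[a].add(b)
            neighbors[b].add(a)

    root_of: Dict[str, str] = {}
    for start in BRIDGE_ID:
        if start in root_of:
            continue
        # Collect every switch reachable from here on this VLAN.
        island: Set[str] = set()
        stack = [start]
        while stack:
            sw = stack.pop()
            if sw in island:
                continue
            island.add(sw)
            stack.extend(neighbors[sw] - island)
        root = min(island, key=BRIDGE_ID.get)
        for sw in island:
            root_of[sw] = root
    return root_of


print(roots_for_vlan(57, BEFORE)["Switch1"])  # Switch1
print(roots_for_vlan(57, AFTER)["Switch1"])   # BackboneSwitch
```

With VLAN 57 cut at BackboneSwitch, the two downstream switches form their own island and Switch1 takes over as root for that VLAN, which is what the spanning-tree output on Switch1 was telling us.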

This same type of packet flow (all broadcasts, with the device unable to grab a DHCP lease or communicate with other devices) will also happen if the port is assigned to a VLAN that either doesn't exist on the switch or is not allowed out the trunk.

So, the moral of this story:

Connectivity issues are not always at the device end.  The cause may be two or three switches upstream.

Unfortunately for me (since I spend most of my troubleshooting time at the device end), it took me way too long to start checking the entire path.  I was convinced that the problem was with the device, not the switch configuration.  My customer was frustrated with me, and I am still kicking myself for wasting my customer's time and for not figuring out the issue more quickly.