Saturday, October 4, 2014

Always Know Where the Default Gateway Resides

The other day I was assigned the task of replacing two switches on our wireless network. Our wireless network is completely separate from our wired network and we typically set up our network in this style of topology:


We use a backbone area that is a flat Layer 2 topology to the wireless controller.  We use Layer 3 switches at the access layer to create a separate VLAN and IP address space for the wireless access points.  The access layer switches run OSPF and include a default route.  The route to the access points is advertised via OSPF.

However, in the building where I was replacing switches, the topology was closer to this:


VLAN 44 used Layer 2 back to the core Layer 3 switch while VLAN 55 used a Layer 3 switch like our usual design.  At least, that's what I thought.  As we'll see, the two assumptions I made carried some consequences.

-

I've duplicated the behavior I witnessed on my home lab equipment.  The actual topology where I work was larger, but this setup will show what I ran into.  I used 2) 3550s, 2) 2950s and 2) 1721s.

The second 1721 router in the lower left corner is being used as an end-user device in order to keep the VLAN in an 'up' state.


-

VLAN 11 is used as a backbone VLAN for the switches to connect back to the R1 router.

VLAN 22 is used for the access point VLAN, but I made the assumption that R1 was the default gateway.  I was tasked with replacing SW1 and SW3 with newer models.  I configured SW1 to behave per our usual design (only the backbone VLAN is allowed out of the switch, the access point VLAN stays local to the switch, and OSPF is used to advertise the access point network).  SW3 was unique in that it used two VLANs for the access points [I've only set up one on this topology].  I never looked at the config on SW2 before replacing the switches.

VLAN 33 is a different backbone VLAN that is used only between R1 and Dist-Switch.

I replaced SW3 and all of the connected access points came back online and everything was working as expected.

I replaced SW1 and those access points came online.  However, the access points connected on SW2 went offline!  But the access points on SW3 were still online.

So, I logged into SW2 and tried pinging the VLAN 22 gateway of 22.22.22.1:


And the pings failed, which surprised me since I assumed that R1 was the gateway.

I logged into SW3 and tried the same thing:


And the pings succeeded!  At this point I was really confused.  How could pings fail from one switch but succeed from another?

After much digging around (nearly an hour!), I was able to get hold of the engineer, and he found some documentation that said that SW1 was actually the gateway for VLAN 22!

When I installed SW1, I configured its trunk to allow only VLAN 11 (#switchport trunk allowed vlan 11).  This configuration cut off Layer 2 access to the VLAN 22 gateway.



I added VLAN 22 to the trunk and the rest of the access points immediately came back online and service was restored.

---

But how were the access points on SW3 able to reach the VLAN 22 gateway?

I was able to replicate the configurations and recreate the problem here:

On R1, we have a router-on-a-stick configuration with the three VLANs.


Note that the address on fa0.22 is not .1 (the usual default gateway address).

-

Dist-Switch is a pure Layer 2 switch (no routing).  It has an SVI on VLAN 33 and uses R1's fa 0.3 subinterface as its default gateway.


-

SW1 is still configured incorrectly (only VLAN 11 is allowed on the trunk link to Dist-Switch).  SW1 is running OSPF with R1 and uses VLAN 11 to reach R1.  Access point VLAN 22 is advertised to R1, and we'll see that this plays a big role in the weird behavior.


-

SW2 is configured as a Layer 2 switch (even though it is a 3550).  This switch uses 22.22.22.1 on VLAN 22 (the access point VLAN) as its default gateway.  At this point, 22.22.22.1 (located on SW1) is not accessible at Layer 2 since VLAN 22 is not allowed on the trunk link between SW1 and Dist-Switch.


-

SW3 is a Layer 2 only switch as well.  However, it was configured to use VLAN 11 (the backbone VLAN) and 11.11.11.1 (R1's fa 0.11 subinterface) as its default gateway.


The difference in the default-gateway configuration was the reason why SW2's access points lost connectivity while SW3's access points remained online.

-

On SW2, VLAN 22 and 22.22.22.1 were the exit point out of the VLAN.  Since I configured SW1 to not allow VLAN 22 on the trunk, 22.22.22.1 was not available from a Layer 2 perspective.



On SW3, it was using VLAN 11 (the backbone VLAN) and was able to reach R1.  When I sent pings from SW3 to 22.22.22.1 (SW1), the traffic went from SW3 to R1 to SW1.  




Since SW1 advertised the 22.22.22.0 network to R1 via OSPF on VLAN 11, R1 is able to forward the packets and receive a response from SW1.


Note that 22.22.22.0 is reached via 11.11.11.11 (SW1) on fa 0.11.

That explained why I was able to ping SW1 from SW3 but not from SW2.  From SW3, Layer 2 worked as far as R1, and R1 then routed the packets at Layer 3 using the OSPF route to SW1.  SW2's default gateway was on VLAN 22 and only reachable at Layer 2, and that gateway was cut off from the rest of the VLAN because VLAN 22 was removed from the trunk on the SW1 end.
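
To make the reasoning concrete, here is a minimal Python sketch of the lab (not something I ran against the switches).  It only models one rule: a Layer 2 switch can reach its default gateway if the gateway's VLAN is allowed on both ends of every trunk on the path.  The class and function names are made up for illustration, and apart from SW1 allowing only VLAN 11, the allowed-VLAN sets are assumptions.  The SW2-to-Dist-Switch and SW3-to-Dist-Switch trunks are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Layer2Switch:
    name: str
    gateway_ip: str     # configured default gateway
    gateway_vlan: int   # VLAN that must reach the gateway at Layer 2
    path: tuple         # names of the trunks between the switch and its gateway


def vlan_carried(vlan, trunk_ends):
    """A VLAN crosses a trunk only if both ends of the link allow it."""
    return all(vlan in allowed for allowed in trunk_ends)


def gateway_reachable(switch, trunks):
    """True if the gateway VLAN is allowed on every trunk along the path."""
    return all(vlan_carried(switch.gateway_vlan, trunks[t]) for t in switch.path)


# Allowed-VLAN sets per trunk end.  Only SW1's {11} comes from the post;
# the other sets are assumptions for illustration.
trunks = {
    "SW1-Dist": ({11}, {11, 22}),
    "R1-Dist": ({11, 22, 33}, {11, 22, 33}),
}

sw2 = Layer2Switch("SW2", "22.22.22.1", 22, ("SW1-Dist",))
sw3 = Layer2Switch("SW3", "11.11.11.1", 11, ("R1-Dist",))

print(gateway_reachable(sw2, trunks))  # False: VLAN 22 is missing on the SW1 end
print(gateway_reachable(sw3, trunks))  # True: VLAN 11 reaches R1

# The fix: allow VLAN 22 on the SW1 end of the trunk.
trunks["SW1-Dist"] = ({11, 22}, {11, 22})
print(gateway_reachable(sw2, trunks))  # True
```

The model ignores Layer 3 entirely, which is the point: SW2 had no routed path to fall back on, while SW3's gateway sat on a VLAN that was never touched.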

--- 

The lessons learned:

Make sure that you know which device is the default gateway for each VLAN before replacing hardware or changing configurations.

If a VLAN is used by multiple devices via trunking, make sure the VLAN is allowed on the trunk at both ends of the link.

Ultimately, we'll have to re-engineer these access point VLANs so that the same access point VLAN is not used on multiple access switches (per our usual design).

Sunday, July 27, 2014

OSPF Stub versus Totally Stubby Area Design Considerations

Recently, I couldn't sleep, so (naturally) my mind started wandering and I started thinking about how OSPF areas would converge if there are multiple links out of the area.  So, I drew up a quick lab and I'd like to share it with you.

Here is my hand-drawn diagram using my Wipebook:

On the link 10.2.3.0 /30, R3 should be .2, not .3.  Sorry for the typo.


And here is a much more legible version using Dia:


I chose to run this lab on my home lab instead of GNS3 since I wanted to add a few Layer 2 switches.

The equipment in this lab:

2) 1721 routers (R1 and R2)
1) 2621 router (R3)
2) 3550 switches (SW1 and SW2)
2) 2950 switches (AS1 and AS2)

I'm using serial connections between all of the routers (10.1.2.0, 10.1.3.0 and 10.2.3.0) and Ethernet connections on all the links connected to SW1 and SW2.

As you can see from the diagrams, Area 11 extends to the routers while Area 22 extends only to the first Layer 3 switch.  The reason I did this was to make the Link State Databases different on the Layer 3 switches.  From here we are going to change the area types and watch how the routing tables change on SW1 and SW2.  We'll also consider how traffic flows are going to change as we change the area types.

I cranked up the speed on the serial connections to 1,000,000 Kbits and left the Ethernet connections at 100,000 Kbits (FastEthernet) so that the links between the routers are the faster paths.  I also changed the OSPF auto-cost reference-bandwidth to 1000 Mbits.  This way the preferred links will be the serial links.  Additionally, I've set R3 as the designated router for both of the Ethernet networks (10.13.1.0 and 10.23.2.0).
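
As a quick check of why the reference bandwidth matters, here is a small Python sketch of the usual Cisco cost rule (reference bandwidth divided by interface bandwidth, truncated to an integer, with a minimum of 1).  The function name is my own, and the output shown is calculated, not captured from the lab.

```python
def ospf_cost(reference_bandwidth_mbps, interface_bandwidth_kbps):
    """OSPF interface cost: reference / interface bandwidth, truncated, minimum 1."""
    reference_kbps = reference_bandwidth_mbps * 1000
    return max(1, reference_kbps // interface_bandwidth_kbps)


SERIAL_KBPS = 1_000_000    # serial links as configured in this lab
ETHERNET_KBPS = 100_000    # FastEthernet

# With the reference bandwidth raised to 1000 Mbits, the serial links win.
print(ospf_cost(1000, SERIAL_KBPS))    # 1
print(ospf_cost(1000, ETHERNET_KBPS))  # 10

# With the default reference bandwidth of 100 Mbits, both links cost 1.
print(ospf_cost(100, SERIAL_KBPS))     # 1
print(ospf_cost(100, ETHERNET_KBPS))   # 1
```

With the default reference bandwidth, OSPF could not tell the two link speeds apart, which is why I raised it.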

Since most of the changes we make will affect SW1, let's start with a baseline of SW1's OSPF database and SW1's IP routing table.

SW1's OSPF database:


SW1's IP routing table:


SW1 has all of the links and is load balancing when possible (to 2.2.2.2 and 10.1.3.0).  Also, note that there is no gateway of last resort (default gateway) on this topology.

Let's head over to R3 since that is at the center of the topology.

R3's OSPF database:


R3's IP routing table:


So, what are the different types of areas that Areas 11 and 22 can be?

Stub Area
Totally Stubby Area
Not-So-Stubby Area (NSSA)
Totally Not-So-Stubby Area (Totally NSSA)

Since I won't be adding an Autonomous System Boundary Router (ASBR) to this lab, let's focus on stub area versus totally stubby area.

Stub Area

Let's start with a stub area.  In a stub area, the Area Border Routers (ABRs) inject a default route into the area.  

Under the OSPF configuration on R1, R3, and SW1, we add the line: area 11 stub

Let's look at the IP routing table on R3:


No changes are made to R3's table.  Let's take a look at R3's OSPF database:


The main item to notice is the addition of 0.0.0.0 from 1.1.1.1 and 3.3.3.3 into Area 11.  Let's see how SW1 is viewing Area 11 now.


Here we can see the addition of 0.0.0.0 to SW1's OSPF database as well.

SW1's IP routing table:


Here we see that the gateway of last resort has changed to 10.13.1.2 and that 0.0.0.0/0 is load balancing between 10.13.1.2 and 10.13.1.1.  SW1 still has all of the links in the routing table because a stub area only blocks external routes; the interarea routes are still advertised into the area.  By changing Area 11 into a stub area, all we have done is add a load balanced default route from SW1 to R1 and to R3.

Totally Stubby Area

A totally stubby area removes all of the interarea entries from the area and relies on a default route for traffic leaving the area.

To configure this in this lab, we add the line: area 11 stub no-summary to R1 and R3.  We can add the line to SW1 as well (just for consistency), but this command only needs to be added at the ABRs since they are the devices that will be injecting the information into the area.
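
The difference between the two area types can be sketched in a few lines of Python.  This is only a model of what the ABRs advertise into the area as Summary Net Link States, not how IOS implements it.  The function name is made up, and the sample prefixes are placeholders taken from addresses used in this lab.

```python
DEFAULT_ROUTE = "0.0.0.0"


def summaries_seen_in_area(area_type, interarea_prefixes):
    """Summary (Type 3) entries a router inside the area would see."""
    if area_type == "normal":
        # Every interarea prefix is advertised; no default is injected.
        return sorted(interarea_prefixes)
    if area_type == "stub":
        # Interarea prefixes are kept and the ABRs add a default route.
        return sorted(interarea_prefixes) + [DEFAULT_ROUTE]
    if area_type == "totally_stubby":
        # Interarea prefixes are suppressed; only the default remains.
        return [DEFAULT_ROUTE]
    raise ValueError(f"unknown area type: {area_type}")


# Placeholder prefixes for illustration only.
prefixes = {"2.2.2.2", "5.5.5.5", "10.2.3.0"}

for area_type in ("normal", "stub", "totally_stubby"):
    print(area_type, summaries_seen_in_area(area_type, prefixes))
```

Both stub types also block external routes, but since this lab has no ASBR, the summary entries are the only place where the difference shows up.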

Starting with R3's OSPF database again:


The Summary Net Link States (Area 11) is reduced to only a default route.

R3's routing table is unchanged.

Over to SW1 now:


Just as R3 reported, the Summary Net Link States (Area 11) is reduced to a default route.

SW1's routing table:


SW1 now has a dramatically reduced routing table.  The table now holds the three directly connected networks and then the load balanced default route to R1 and R3.

But before we do a dance of joy, what does having a load balanced default route really do for Area 11?

In this case, it means that all traffic leaving SW1 to anywhere outside of Area 11 will be load balanced (alternating packets) to R1 and to R3.  Is that a problem?  It sort of is, in my opinion.

Let's do a traceroute from SW1 to 1.1.1.1:


Load balanced between R1 and R3 as we expected.  However, 1.1.1.1 is directly connected to R1.  So, packet 2 needed to take an extra hop from R3 to R1 via 10.1.3.0 /30.

Going to 2.2.2.2, this is not as big of a deal since R1 and R3 have a direct link of equal bandwidth to R2.

Things get a little weird when we try to send data from SW1 to SW2.

First, traffic is load balanced to R1 and to R3 just as before.  But when we look at the routing table of R1 to 5.5.5.5, we have load balancing there as well since the link costs are the same from R1 to SW2 (via R3 and via R2).


If we had a small flow of 16 packets that needed to be sent from SW1 to SW2, 8 packets would be sent to R1 and 8 packets would be sent to R3.  Of the 8 packets sent to R1, 4 would go to R2 and 4 would go to R3 (and those 4 packets would have extra latency having gone to R1 first).  SW2 would receive 4 packets from R2 and 12 from R3.
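
The arithmetic above can be reproduced with a short Python simulation.  It assumes strict per-packet round robin at every hop and uses only the next hops described in this post; the names in the code are my own.

```python
from collections import Counter
from itertools import cycle

# Equal-cost next hops toward SW2, as described above.
NEXT_HOPS = {
    "SW1": ["R1", "R3"],   # load balanced default route
    "R1": ["R2", "R3"],    # equal-cost paths to 5.5.5.5
    "R2": ["SW2"],
    "R3": ["SW2"],
}


def deliver(packet_count, source="SW1", dest="SW2"):
    """Count which router hands each packet to the destination."""
    round_robin = {node: cycle(hops) for node, hops in NEXT_HOPS.items()}
    last_hop = Counter()
    for _ in range(packet_count):
        node = source
        while True:
            next_node = next(round_robin[node])
            if next_node == dest:
                last_hop[node] += 1
                break
            node = next_node
    return last_hop


result = deliver(16)
print(result["R2"], result["R3"])  # 4 12
```

Real routers may balance per destination or per flow instead of per packet, so treat this as an illustration of the worst case rather than a prediction.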

If Area 11 were left as a stub area, SW1 would have an entry to 5.5.5.5 via R3.  R3 would then send the traffic directly to SW2.  Simple and deterministic.  Easy to troubleshoot and easy to follow the path.

But now that this is a totally stubby area, there is no entry for 5.5.5.5 in SW1's routing table, so the traffic is balanced to R1 and to R3.  R1 then load balances its share between R2 and R3.  Not very efficient, in my opinion.

However, totally stubby areas are great when there is a single entry and single exit point for an area.  If we were to convert Area 22 into a totally stubby area, we would not see an impact on SW2's routing table since it is an ABR, but we can reduce the size of SW2's OSPF database by making Area 22 a totally stubby area.

SW2's OSPF database as a normal area:


SW2's OSPF database as a totally stubby area:


Notice that the Summary Net Link States (Area 22) shrinks to a single entry of 0.0.0.0.

If there were another router downstream from SW2, the routing table would have the links within the area (such as 192.168.22.0 /24) and a default route to SW2.

Let's add a router to Area 22 and see how the OSPF database and IP routing table look from within a totally stubby area.

Here is the topology with R4 added via Dia:


R4's OSPF database:


Super small OSPF database.  Router Link States for the two routers (SW2 and R4), a Net Link State for the broadcast network of 192.168.22.0, and a Summary Net Link State with a default network out of the area.


And a super small routing table.  One entry for the connected network and a default route to leave the area.  If there was a loopback and a network statement to include that loopback, there would be three entries.

To summarize this post:

With multiple exit points from an area, a stub area might be a good idea to help load balance traffic out of the area.  But it also depends on how the backbone area is engineered.  A stub area will shrink the OSPF database if there are a lot of external routes (networks redistributed into OSPF).

Additionally, making the area a totally stubby area might cause traffic to become even more diffused.  This also depends on how the backbone area is engineered.  But for an area with only a single exit point, totally stubby areas can dramatically reduce the size of both the OSPF database and the IP routing table.

Please contact me if you have any questions on this.

Thank you for reading!

Friday, March 7, 2014

CCNP TSHOOT Takeaways and Strategy

Yesterday, I took and passed the CCNP TSHOOT exam.  I already passed the SWITCH and ROUTE exams, so I've completed the CCNP R&S Certification!

As for the TSHOOT exam, I found it to be almost fun and not nearly as intimidating as the SWITCH and ROUTE exams.  However, I did run into a very frustrating item.

Lack of available commands.

I was thinking that all of the typical Cisco IOS commands would be available, but I was blindsided and had to regroup on the first two tickets.

All of the following commands are NOT available:

show interface status
show ip interface brief
show run interface ***
show run | section (eigrp | ospf | bgp)
traceroute (or 'tracert' from the clients)

Therefore, the only way to see a section of the running configuration is to do a 'show run'.  That's it.  You are forced to scroll up and down as needed.  On a few of the trouble tickets, I found it best to write a shorthand version of a section of the running config (like access list entries) on the laminated sheet and then scroll up to see how the access list entries are applied.  Not having the 'show run interface ***' command was the most frustrating.

Here was my strategy for studying for the CCNP TSHOOT exam:

1)  Build the topology on a home lab.  Cisco provides the entire topology, which can be downloaded from Cisco's site:

https://learningnetwork.cisco.com/servlet/JiveServlet/previewBody/6741-102-1-23100/TSHOOT%20Exam%20Topology.pdf

I was one router short from my CCNP ROUTE lab, so I acquired a fourth 1721 to finish it off.  Then, I went through the .pdf and set up the topology exactly.  There are a few items that are not given on the map: the loopbacks and the OSPF router-ids on the routers.  In fact, there is a second loopback on R4 (and it does tie into one of the tickets).  I built and rebuilt the topology three times.

2)  Print out a copy of each running config and spend some time memorizing it.  It won't be identical to what you will see on the exam, but it will help you to spot differences.

3)  Break a section of the topology, hook up a computer to ASW1, and start troubleshooting from the computer first.  For example, break the trunk configuration between ASW1 and DSW1.  Then perform an 'ipconfig /release' and 'ipconfig /renew' and observe how the computer responds (no IP address since the DHCP server is located at R4).  Break one of the links between R3 and R4, break OSPF somewhere, break EIGRP, etc. and then repair the problem.

4)  Read the following books:

  CCNP TSHOOT Official Certification Guide (I went through this great book three times and took extensive notes)
  CCNP TSHOOT Foundation Learning Guide
  CCNP TSHOOT Quick Reference

Since this exam covers topics from the SWITCH and ROUTE exams, I refreshed my memory with the notes I took for those exams.

5)  Practice what you want to write down on the laminated card.  I used this strategy after not passing my first attempt at the CCNA.  Having a plan on what you want to have notes on before the exam is crucial.

I found the following strategy to work really well for the exam itself:

1)  Set up your notes on the laminated card during the tutorial time.  Here is what I wrote on the card and then followed them in order on each ticket:

Client 1
  ipconfig
  ping

Switches
  show interface status (not available on the exam!  Use 'show ip interface ***' instead.)
  show interface trunk
  show vlan ('show vlan brief' is not available)
  show spanning-tree (by the way, 'show spanning-tree summary' is not available)

Routers (including DSW1 and DSW2)
  show ip interface ***
  show ip protocols
  show ip eigrp interfaces (only applicable on DSW1, DSW2 and R4)
  show ip eigrp neighbors (only applicable on DSW1, DSW2 and R4)
  show ip eigrp topology (only applicable on DSW1, DSW2 and R4)
  show ip ospf interface (only applicable on R1, R2, R3, and R4)
  show ip ospf neighbor (only applicable on R1, R2, R3, and R4)
  show ip ospf database (only applicable on R1, R2, R3, and R4)
  show ip bgp neighbors (only applicable on R1)
  show ip bgp (only applicable on R1)
  show ip route
 
  show run ('show run | section' and 'show run interface ***' are not available)

With those commands you will have a great place to start troubleshooting.  Be sure to follow the commands in that order.  If EIGRP or OSPF are not running on the interface, then there is no way a neighborship can form.  If the neighborship has not formed, then the routes cannot be added to the topology or database.  If the routes are not in the topology or database, then they cannot be added to the routing table ('show ip route').
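
The order matters because each step depends on the one before it.  Here is a minimal Python sketch of that dependency chain for EIGRP.  The check results are entered by hand (True or False) after reading the output of each command; nothing here talks to a device, and the function name is made up for illustration.

```python
# Each command only makes sense once the previous condition holds.
EIGRP_CHAIN = [
    ("show ip eigrp interfaces", "EIGRP is running on the interface"),
    ("show ip eigrp neighbors", "the neighborship has formed"),
    ("show ip eigrp topology", "the route is in the topology table"),
    ("show ip route", "the route is in the routing table"),
]


def first_failure(chain, passed):
    """Return the first (command, condition) whose check did not pass."""
    for command, condition in chain:
        if not passed.get(command, False):
            return command, condition
    return None


# Example: the interface check passed, but no neighbor is listed.
observed = {
    "show ip eigrp interfaces": True,
    "show ip eigrp neighbors": False,
}
print(first_failure(EIGRP_CHAIN, observed))
```

Everything after the first failed check is a symptom, so that is where I would start reading the running config.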

2)  Start each ticket from Client 1 and issue the 'ipconfig' command.  If you have a 169.254.x.x (self-assigned) address, the client did not get an address from DHCP, so start looking at the ASW1 configuration, then DSW1 and further.  If it has a 10.2.1.3 address, start pinging each IP address in order.  I also wrote these IP addresses on the laminated card so I didn't have to jump back and forth to the maps.

  10.2.1.254 (default gateway per the HSRP configuration)
  10.2.1.1 (DSW1)
  10.2.1.2 (DSW2; probably not necessary to ping, but I did)
  10.1.4.6 (DSW1)
  10.1.4.5 (R4)
  10.1.1.10 (R4)
  10.1.1.9 (R3)
  10.1.1.6 (R3)
  10.1.1.5 (R2)
  10.1.1.2 (R2)
  10.1.1.1 (R1)
  209.65.200.225 (R1)
  209.65.200.226 (ISP router)
  209.65.200.241 ('web server' on most of the tickets)

All of these addresses are on the Layer 2/3 map given on the above .pdf.  Since 'traceroute' is not available, 'ping' is your best friend on this exam.  Where the ping stops responding is the best place to start.
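
The ping-in-order approach amounts to finding the first address that does not answer.  A rough Python sketch, using an abbreviated version of the list above: the ping function is passed in by the caller, so the example uses a stand-in that pretends everything past DSW1 is down.  The names are my own.

```python
# Abbreviated ping order from Client 1 outward.
PING_ORDER = [
    ("10.2.1.254", "HSRP default gateway"),
    ("10.2.1.1", "DSW1"),
    ("10.1.4.6", "DSW1"),
    ("10.1.4.5", "R4"),
    ("10.1.1.10", "R4"),
    ("10.1.1.9", "R3"),
]


def first_unreachable(targets, ping):
    """Return the first (ip, device) that fails to answer, or None."""
    for ip, device in targets:
        if not ping(ip):
            return ip, device
    return None


# Stand-in for a real ping: only these addresses respond.
responding = {"10.2.1.254", "10.2.1.1", "10.1.4.6"}


def fake_ping(ip):
    return ip in responding


print(first_unreachable(PING_ORDER, fake_ping))  # ('10.1.4.5', 'R4')
```

In this made-up case the last good hop is DSW1 and the first bad one is R4, so those are the two consoles to open side by side.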

I also brought up both devices on the same link to compare configurations.  For example, if the ping stopped at R3, I brought up the console window for R3 and R4.  I then ran through my 'routers' script ('show ip interface ***', 'show ip protocols', and so on) on both devices until I found the problem.

With all the reading, all the preparation on the SWITCH and ROUTE exams, being very familiar with the topology and my lab configurations, and having my 'scripts', I not only passed the exam, I scored a perfect 1000!

If you have any questions or comments, please submit them below.

Thank you for reading!

On to the CCIE next!