
Finally! A Novel Solution for STP in Arista VXLAN Deployments

Overview

Recently, I had the opportunity to help review a customer's Arista HER-VXLAN deployment after a spanning-tree loop affected their campus network. The customer was able to quickly identify the issue and put a fix in place. The question back to our team at iVi was simply, "How can we prevent this from happening again?"

The conventional response might have been to simply acknowledge that the issue was caused by a lack of STP controls on the end-user-facing ports, but that deflects from the underlying issue: Spanning-Tree doesn't function across the underlay of VXLAN-enabled networks.

While looking for alternatives, I found the command spanning-tree root super in Arista's EVPN multi-homing guide. The document references this feature as a way for the fabric to behave as a single Spanning-Tree bridge to downstream Layer 2 switches in an EVPN All-Active deployment - but how does it actually do this?

Arista Networks published a TOI document for this functionality: Spanning-Tree Root Super. The document states that the feature is supported in EVPN-VXLAN and EVPN-MPLS deployments, but it does not list EVPN as a requirement. That makes sense, because the command only changes the Layer 2 behavior of the device.

Shouldn't the functionality work in even the most basic HER VXLAN deployments? Time for some testing!

Lab Scenario

We will be using a simplified Arista HER-VXLAN (Head-End Replication) implementation for this scenario. Since Spanning-Tree does not function on the Layer 3 network that makes up the Spine/Leaf connectivity, that connectivity is represented by the 'HER VXLAN Cloud' in my diagram.

This is the simplest form of VXLAN implementation allowing us to highlight the key issue.

In this example, the legacy device is connected to vEOS01 and vEOS02 on access ports in VLAN 10; no other port configurations are in place. In this vanilla configuration, both interfaces of the legacy device participate in STP. Because STP BPDUs are not forwarded across the HER VXLAN underlay, no loop is detected and all interfaces are in a forwarding state.

Very quickly, we will start to see messages like these in the logs of the fabric switches:

%ETH-4-HOST_FLAPPING: Host XX.00A1  in VLAN 10 is flapping between interface Vlan10 and interface Eth1
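The message above is what a looped frame looks like from the MAC table's point of view: the same source MAC keeps being learned on alternating interfaces. The following is a minimal sketch of that idea, not EOS's actual implementation. The `MacTable` class, the `flap_threshold` value, and the MAC address are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class MacTable:
    """Toy MAC table that counts how often a host moves between interfaces."""

    def __init__(self, flap_threshold=3):
        # flap_threshold is a hypothetical value, not an EOS default.
        self.location = {}             # (vlan, mac) -> last interface seen
        self.moves = defaultdict(int)  # (vlan, mac) -> number of moves so far
        self.flap_threshold = flap_threshold

    def learn(self, vlan, mac, interface):
        """Record a source MAC; return True once the host counts as flapping."""
        key = (vlan, mac)
        previous = self.location.get(key)
        self.location[key] = interface
        if previous is not None and previous != interface:
            self.moves[key] += 1
        return self.moves[key] >= self.flap_threshold


table = MacTable()
# With a loop in place, the same frame keeps arriving on alternating interfaces.
sightings = ["Eth1", "Vlan10", "Eth1", "Vlan10"]
flapping = [table.learn(10, "aaaa.bbbb.00a1", port) for port in sightings]
print(flapping)  # [False, False, False, True]
```

The fourth sighting is the third move, which crosses the assumed threshold.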

In an Arista EVPN deployment, this would result in the MAC being blacklisted. The network would be protected, but the workload would be offline.

So How Do We Prevent This?

The Solution: Spanning-Tree Root Super

Arista TOI: Spanning-Tree Root Super

When the command spanning-tree root super is used, EOS does a couple of things:

  1. The Spanning-Tree Priority is set to zero
  2. The Bridge ID is set to 0000.0000.0001. This is key!

Here's the result of the changes:

The root bridge is now elected as 0000.0000.0001. The Legacy device sees that both Eth1 and Eth2 are connected to the root bridge and elects to block Eth2. With just this single command, STP actually prevented an outage.
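To see why an identical bridge ID changes the outcome, here is a simplified sketch of how a downstream switch could assign port roles by comparing BPDU priority vectors (root ID, root path cost, sender bridge ID, sender port). It is a toy model of standard spanning-tree comparison, not Arista's code. The `port_roles` function, the port cost of 2000, and the MAC addresses for the legacy switch, vEOS01 and vEOS02 are assumptions made up for the example; only priority 0 and 0000.0000.0001 come from the TOI.

```python
from typing import NamedTuple


class Bpdu(NamedTuple):
    root_id: tuple    # (priority, mac) of the bridge the sender believes is root
    root_cost: int    # sender's path cost to that root
    sender_id: tuple  # (priority, mac) of the sending bridge
    sender_port: int  # port number on the sender


def port_roles(own_id, port_cost, received):
    """Return {local_port: role} for a switch given the BPDU heard on each port."""
    # Path to the root through each port; lower tuples win, local port breaks ties.
    candidates = {
        port: (b.root_id, b.root_cost + port_cost, b.sender_id, b.sender_port, port)
        for port, b in received.items()
    }
    root_port = min(candidates, key=candidates.get)
    root_id, root_cost = candidates[root_port][:2]
    if own_id < root_id:
        # This switch would be root itself, so every port is designated.
        return {port: "designated" for port in received}
    roles = {}
    for port, b in received.items():
        if port == root_port:
            roles[port] = "root"
            continue
        ours = (root_id, root_cost, own_id, port)  # what we would advertise
        theirs = (b.root_id, b.root_cost, b.sender_id, b.sender_port)
        # If the neighbor's BPDU is better than ours, this port must block.
        roles[port] = "designated" if ours < theirs else "alternate"
    return roles


LEGACY = (32768, "aaaa.aaaa.0003")  # hypothetical
VEOS01 = (32768, "aaaa.aaaa.0001")  # hypothetical
VEOS02 = (32768, "aaaa.aaaa.0002")  # hypothetical
SUPER = (0, "0000.0000.0001")       # priority 0 and bridge ID from the TOI

# Before: each leaf claims to be root with its own ID; BPDUs never cross VXLAN.
before = port_roles(LEGACY, 2000, {1: Bpdu(VEOS01, 0, VEOS01, 1),
                                   2: Bpdu(VEOS02, 0, VEOS02, 1)})
# After: both leaves advertise the same super root.
after = port_roles(LEGACY, 2000, {1: Bpdu(SUPER, 0, SUPER, 1),
                                  2: Bpdu(SUPER, 0, SUPER, 1)})
print(before)  # {1: 'root', 2: 'designated'}
print(after)   # {1: 'root', 2: 'alternate'}
```

In the "before" case both uplinks end up forwarding, which is the loop. In the "after" case the second uplink hears a BPDU that beats anything the legacy switch could advertise, so it becomes an alternate (discarding) port.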

Further Improvements

The Arista documents referenced in this post are worth the read. There's one section in the TOI that's worth calling out:

Limitations:

      • The user needs to configure and ensure identical STP configuration on all switches configured as STP super root.
      • All the port-channels across multihoming VTEPs should have the same port-channel numbers.
      • There are no Layer-2 links between devices with super root configuration.
      • Super Root Bridge-ID 0000.0000.0001 can't be changed.

Before Spanning-Tree Root Super

Connecting VXLAN leaves with unique VTEP IPs directly to one another is a BAD IDEA. Since this is an exercise in protecting the fabric from mistakes, it makes a great example of this tech in action.

In the scenario below, vEOS01 would be elected root bridge, and both devices would flood MACs across Eth3 as well as over VXLAN.

There's one additional command that negates this limitation: spanning-tree guard loop default

Spanning-Tree Root Super with Loop Guard

Going back to the original lab running spanning-tree root super, the Legacy device is connected with Eth2 in a blocking state, but this time vEOS01 and vEOS02 are running spanning-tree guard loop default.  

What happens when a connection is now made between Eth3 on both devices?

The following messages are logged in the console:

vEOS1 Stp: %SPANTREE-4-INTERFACE_SELF_LOOPED: Interface Eth3 received its own bpdu: blocking interface (bridge mac 00:00:00:00:00:01 port id 3 Vl10)

vEOS1 Stp: %SPANTREE-6-INTERFACE_STATE: Interface Eth3 instance Vl10 moving from forwarding to discarding

Since both vEOS01 and vEOS02 have the same bridge ID thanks to spanning-tree root super, STP now sees what looks like its own BPDU on Eth3.

With the addition of the spanning-tree guard loop default command, vEOS1 now reacts by placing Eth3 in a discarding state. Another infrastructure impact prevented by Spanning-Tree!
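If you want to alert on this event, the log line shown above is easy to parse. The sketch below assumes the message format is exactly as it appears in my lab output. The `parse_self_loop` and `normalize_mac` helpers are hypothetical names, and the regular expression has only been written against that one sample line.

```python
import re

SUPER_ROOT_MAC = "0000.0000.0001"  # super root bridge ID from the TOI

# Pattern written against the single sample line from the lab (an assumption).
SELF_LOOP_RE = re.compile(
    r"%SPANTREE-4-INTERFACE_SELF_LOOPED: Interface (?P<interface>\S+) "
    r"received its own bpdu: blocking interface "
    r"\(bridge mac (?P<mac>[0-9A-Fa-f:.]+) port id (?P<port_id>\d+) "
    r"(?P<instance>[^)\s]+)\)"
)


def normalize_mac(mac):
    """Strip separators so 00:00:00:00:00:01 and 0000.0000.0001 compare equal."""
    return re.sub(r"[^0-9a-f]", "", mac.lower())


def parse_self_loop(line):
    """Return a dict describing a self-looped event, or None if no match."""
    match = SELF_LOOP_RE.search(line)
    if match is None:
        return None
    event = match.groupdict()
    event["port_id"] = int(event["port_id"])
    event["is_super_root"] = (
        normalize_mac(event["mac"]) == normalize_mac(SUPER_ROOT_MAC)
    )
    return event


line = ("vEOS1 Stp: %SPANTREE-4-INTERFACE_SELF_LOOPED: Interface Eth3 received its own "
        "bpdu: blocking interface (bridge mac 00:00:00:00:00:01 port id 3 Vl10)")
event = parse_self_loop(line)
print(event["interface"], event["instance"], event["is_super_root"])  # Eth3 Vl10 True
```

The `is_super_root` flag simply confirms that the bridge MAC in the log is the shared super root ID, which is why the switch treats the neighbor's BPDU as its own.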

Am I Really Making a Case for STP?

Surprisingly - YES.

In tightly controlled data center environments, where every minor change is closely scrutinized, some organizations (and vendors) have chosen to disable Spanning-Tree.

As VXLAN makes its way into campus deployments, we must also acknowledge how these networks are operated. Instead of high SLAs for critical data center-hosted services, the priority is serving end users (just behind security, right?!).

In the end, mistakes happen. This could run the gamut from a well-meaning network tech misreading a switch/cable label, to someone grabbing an old switch from the backroom to react to an unplanned event. Our goal as network architects is to build as resilient an infrastructure as possible.

As companies look to adopt best-in-breed vendors and technologies, a level of interoperability with legacy devices is still needed. A well-designed and well-implemented STP strategy is still valid.

Why is This Solution Different?

Other vendors have solutions for dealing with Layer 2 loops in VXLAN deployments; these solutions vary by vendor/operating system and rely on detection after the loop is in place.

The implementation of each of these loop detection methods is slightly different, but each relies on the network fabric detecting traffic it generated arriving on another fabric port. When this happens, the fabric's port is put into an errored state to break the loop.

The following is a screenshot of my EVE-NG lab used for validation testing:

  • vEOS 1/2 and vEOS 4/5 are in MLAG pairs, each pair with their own VTEP
  • Limited by my lab capacity, vEOS2 and vEOS4 provide underlay connectivity
  • Only vEOS 1/2 and vEOS 4/5 are running Spanning-Tree Root Super
  • vEOS3 and vEOS6 are connected to the ToR with a Port Channel

Notable Differences in This Approach:

  • All ToR VXLAN switch ports are in a forwarding state
  • vEOS3 and vEOS6, acting as simple Layer 2 switches, have all northbound ports in a forwarding state
  • vEOS3 detected the root bridge through vEOS6, so STP put Eth4 in a discarding state
  • vEOS7 sees the root bridge through both Eth1 and Eth2 and chooses Eth2 to reach it
  • No additional proprietary protocols relying on detection / mitigation
  • Feature Consistency - it works on ALL Arista network operating systems; there's just the one.