The Problem
One issue I ran into a few months ago at a customer site was upgrading Cisco Nexus 5500 series switches that had dual-homed Nexus 2000 series FEXs. Anyone who has looked into upgrading Cisco’s Nexus series switches will have found that there are two ways to do it: In-Service Software Upgrade (ISSU) or the old disruptive upgrade.
If you have tried upgrading a Nexus switch, you also know that the chance of being able to use the preferred ISSU path is slim to none. Basically, ISSU keeps the data plane up and switching traffic while the supervisor upgrades. Sounds great on paper, but if you're using your Nexus as a core or aggregation layer switch (which in my case is every Nexus switch I've touched), ISSU will fail.
Why!? All because the control plane goes down during the upgrade. When Cisco says the data plane will stay up, they mean that the hardware ASICs keep switching packets. Anything that depends on the control plane is a problem, so if you have any advanced features enabled, ISSU will fail. This includes routing, SVIs, and any installed Layer 3 modules; in fact, if you have one of these features enabled but not configured, ISSU will still fail.
Needless to say, ISSU was not an option; we had to upgrade the old disruptive way. Because the design is completely redundant, with servers dual-homed to the FEXs and the FEXs dual-homed to the two 5Ks, we felt the disruption would be minimal, if there was any at all. After reading Cisco’s upgrade guide, the plan was as follows:
- Upgrade Primary 5K, causing failover to Secondary 5K
- Image from Primary 5K is uploaded and staged on FEXs
- Upgrade Secondary 5K, causing the FEXs to upgrade in a rolling fashion and join Primary 5K
Or so we thought… The Cisco upgrade guide said the FEXs would upgrade in a rolling fashion, but they did not. They all upgraded at once, so they all restarted at once. This caused a major outage because both redundant links from each server went down at the same time.
The Solution
Cut to Cisco Live 2016: I was fortunate enough to attend a session on the Nexus line hosted by the lead developers of the Nexus. In that session they went over the architecture of the Nexus line, and the last section covered the various upgrade procedures. The following is the procedure explained to us, which would have saved me the headache I endured a month before. Note that this is the correct upgrade procedure for two single-SUP Nexus devices with dual-homed FEXs downstream:
- Upgrade the Primary Nexus. This will cause downstream traffic to flow through the Secondary Nexus. After the Primary upgrades, it will push and stage the new image on each of the FEXs.
- Before upgrading the Secondary Nexus, shut down the links from the Secondary Nexus to the FEXs, one FEX at a time, starting with the FEX you want to upgrade first. This allows for a controlled rolling upgrade of the FEXs. Note: Make sure the FEX shows a status of online on the Primary before shutting down the links to the next FEX.
- Once all FEXs are upgraded and traffic is flowing through the Primary, upgrade the Secondary Nexus.
- Perform any additional post upgrade tasks that may have come up during the upgrade.
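To make the ordering concrete, here is a minimal Python sketch that turns the steps above into a written runbook. It does not talk to any switch; it only builds the list of actions in the order described. The `Fex` class, the `rolling_fex_plan` function, and the FEX numbers and interface names in the usage example are all made up for illustration, so substitute your own.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Fex:
    """Hypothetical record for one dual-homed FEX (illustration only)."""
    fex_id: int                   # FEX number as listed on the switch
    secondary_uplinks: list[str]  # fabric interfaces on the Secondary Nexus


def rolling_fex_plan(fexes: list[Fex]) -> list[str]:
    """Return the upgrade steps in the order described above."""
    # Step 1: upgrade the Primary; it stages the new image on the FEXs.
    steps = ["primary: upgrade NX-OS (new image is staged on each FEX)"]

    # Step 2: one FEX at a time, shut its links to the Secondary and
    # wait for it to come back online on the Primary before moving on.
    for fex in fexes:
        for intf in fex.secondary_uplinks:
            steps.append(f"secondary: shutdown {intf} (FEX {fex.fex_id})")
        steps.append(
            f"primary: wait until FEX {fex.fex_id} shows a status of online"
        )

    # Steps 3 and 4: upgrade the Secondary, then do post-upgrade tasks.
    steps.append("secondary: upgrade NX-OS")
    steps.append("both: perform post-upgrade tasks")
    return steps


if __name__ == "__main__":
    # Example inventory; the IDs and interface names are assumptions.
    inventory = [
        Fex(101, ["Ethernet1/1"]),
        Fex(102, ["Ethernet1/2", "Ethernet1/3"]),
    ]
    for number, step in enumerate(rolling_fex_plan(inventory), start=1):
        print(f"{number}. {step}")
```

The point of writing it out this way is that the wait step sits inside the loop, so the plan never has two FEXs down at the same time.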
You can imagine my sense of excitement, followed immediately by disappointment that I had not thought to shut down the interfaces on the secondary Nexus.