No-NAT’s on HA ASA firewalls: How to break HA (Split Brain Active/Active)

Story Time! Last week I learned the hard way why you should not trust NAT conversion tools when dealing with No-NAT’s on an ASA. To fully understand this story, let’s back up a bit and I’ll give you the high-level details. I have been working on a project for the last month or so to replace some network equipment and redesign the network for a customer. The network was originally built 10 years ago, and in that time the site had expanded, new services were brought online, old services were decommissioned, integrations with third parties had come and gone, and yet no one bothered to clean up the network. Needless to say, there were a lot of changes that needed to be made, and a ton of moving parts… lots to screw up.

On top of the cleanup work, we were also changing the whole network design, replacing standalone switches and routers with stacked switches and firewalls in HA pairs. I had spent the last month meticulously combing through each configuration to make sure what was configured in IOS was translated properly into ASA code. I wanted to make sure the network design changes accommodated every network requirement, so that when I got onsite I felt confident that ripping out the old equipment and replacing it with the new wouldn’t break something I wasn’t aware of. For the most part my planning paid off and we had a smooth cutover with only minor issues that we quickly resolved. One issue stood out, however, on the one device I didn’t expect to have issues with.

At this site a pair of ASA 5510’s acted as an east/west firewall between different business units. We replaced them with a pair of ASA 5516-X’s running ASA code. This should have been easy: copy the configuration from the old ASA’s over to the new ASA’s, then head down to the pub in time for happy hour. But of course, when I checked the version on the 5510’s, I found they were still running version 8.2.

*audible gasp*

For those who don’t know, before ASA we had PIX, and the old Cisco ASA’s (pre-8.3 versions) borrowed their syntax from PIX. In order to copy the configuration over I needed to translate all of the NAT statements from the old syntax to the new syntax. Lucky for me, one of the business units didn’t want to follow the IP scheme, so on these ASA’s there were 500+ 1:1 NAT translations and a handful of No-NAT translations. There was no way I was about to individually translate each NAT statement, so I took to the internet and used one of the many online ASA NAT translation tools. The tool worked like a charm and I was able to finish the configuration of the new ASA’s. Now let’s cut to the day of cutover and talk about the issue.
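
For anyone who has never had to do this conversion, here is a rough sketch of what a single 1:1 translation looks like in both syntaxes. The object name and the addresses are made up for illustration (they are not from the customer’s config), and I am reusing the BU1 and CORP interface names from later in this post.

```
! Pre-8.3 (PIX-style) 1:1 static NAT:
! static (real_ifc,mapped_ifc) mapped_ip real_ip netmask mask
static (BU1,CORP) 10.1.1.50 172.16.5.50 netmask 255.255.255.255

! 8.3+ equivalent using object NAT
! (OBJ-BU1-HOST50 is a hypothetical object name)
object network OBJ-BU1-HOST50
 host 172.16.5.50
 nat (BU1,CORP) static 10.1.1.50
```

Now multiply that by 500 and you can see why a conversion tool was tempting.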

The Issue

After ripping out the old network and cabling in the new network we turn all the gear on. We all know that feeling of everything coming to life, the fans blowing full blast, link lights lighting up. Everything visually looks like it’s coming up, so I start logging into the gear to verify we have connectivity. This is when I notice something odd: the 5516 ASA’s are both showing a green HA light. This means that both firewalls think they are active, in what’s known as a split-brain active/active situation, which also means both firewalls are advertising duplicate IP’s.

I quickly log into the switch those firewalls plug into and shut down the ports leading to the secondary ASA to stop the duplicate IP’s from propagating through the network. The failover and state interfaces are cabled directly to each other, so I should be able to fix the split-brain issue without reintroducing the duplicate IP’s.

I console into the ASA’s and check the failover state of each of them using the show failover command (NOTE: I changed the names and IP’s of the output):

primary

Failover On 
 Failover unit Primary
 Failover LAN Interface: FAILOVER GigabitEthernet1/7 (up)
 Version: Ours 9.8(4)29, Mate Unknown
 Serial Number: Ours REDACTED, Mate Unknown
         This host: Primary - Active 
                 Active time: 72881 (sec)
                 slot 1: ASA5516 hw/sw rev (3.4/9.8(4)29) status (Up Sys)
                   Interface CORP (10.1.1.1): Unknown (Waiting)
                   Interface BU1 (10.1.2.1): Unknown (Waiting)
                   Interface BU2 (10.1.3.1): Unknown (Waiting)
                   Interface BU3 (10.1.4.1): Unknown (Waiting)
                   Interface BU4 (192.168.1.1): Normal (Not-Monitored)
         Other host: Secondary - Not Detected 
                 Active time: 0 (sec)
                   Interface CORP (10.1.1.2): Unknown (Waiting)
                   Interface BU1 (10.1.2.2): Unknown (Waiting)
                   Interface BU2 (10.1.3.2): Unknown (Waiting)
                   Interface BU3 (10.1.4.2): Unknown (Waiting)
                   Interface BU4 (192.168.1.2): Unknown (Not-Monitored)

secondary

Failover On 
 Failover unit Secondary
 Failover LAN Interface: FAILOVER GigabitEthernet1/7 (up)
 Version: Ours 9.8(4)29, Mate Unknown
 Serial Number: Ours REDACTED, Mate Unknown
     This host: Secondary - Active 
         Active time: 72878 (sec)
         slot 1: ASA5516 hw/sw rev (3.4/9.8(4)29) status (Up Sys)
           Interface CORP (10.1.1.1): No Link (Waiting)
           Interface BU1 (10.1.2.1): No Link (Waiting)
           Interface BU2 (10.1.3.1): No Link (Waiting)
           Interface BU3 (10.1.4.1): No Link (Waiting)
           Interface BU4 (192.168.1.1): No Link (Not-Monitored)
     Other host: Primary - Not Detected 
         Active time: 0 (sec)
           Interface CORP (10.1.1.2): Unknown (Waiting)
           Interface BU1 (10.1.2.2): Unknown (Waiting)
           Interface BU2 (10.1.3.2): Unknown (Waiting)
           Interface BU3 (10.1.4.2): Unknown (Waiting)
           Interface BU4 (192.168.1.2): Unknown (Not-Monitored)

What the heck? When I configured these firewalls, failover worked without an issue; the fact that the secondary has the full configuration is evidence that they detected each other at some point. I decided to try to ping from one failover IP to the other: NOTHING. OK, I can see link lights on the failover and state ports on each ASA, but let’s swap the cables connecting those ports for known good ones anyway. Still no ping.

At this point I was getting frustrated. How could two devices literally attached to each other not be able to ping one another when I know they are in the same subnet? After a few more minutes of troubleshooting I finally decided to see if packet tracer would show me anything:

ASA#packet-tracer input FAILOVER icmp 10.0.0.1 0 0 10.0.0.2 detailed
...
Type: NAT
 Subtype: rpf-check
 Result: ALLOW
 Config:
 nat (CORP,any) source static OBJ-10.0.0.0_8 OBJ-10.0.0.0_8 destination static OBJ-10.0.0.0_8 OBJ-10.0.0.0_8 no-proxy-arp
 Additional Information:
         src ip/id=10.0.0.0, mask=255.0.0.0, port=0, tag=any
         dst ip/id=10.0.0.0, mask=255.0.0.0, port=0, tag=any, dscp=0x0
         input_ifc=CORP, output_ifc=FAILOVER
...
Result:
 input-interface: FAILOVER
 input-status: up
 input-line-status: up
 output-interface: CORP
 output-status: up
 output-line-status: up
 Action: allow

There it was! A No-NAT statement was matching the failover traffic and sending the response out the wrong interface. The culprit: the online NAT translation tool set any as the outbound interface in the No-NAT’s, simply because the old 8.2 way of doing No-NAT’s didn’t specify an outbound interface. On top of that, I had used failover IP’s that fell within the source and destination range of the No-NAT statement.
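
To make the conversion problem concrete, here is roughly what the old and new forms of that No-NAT look like side by side. The 8.3+ lines match the packet tracer output above; the 8.2 lines are my reconstruction of the typical nat 0 form, and the ACL name NONAT is a placeholder.

```
! Pre-8.3 NAT exemption: only the source interface is named,
! so there is nothing to tell a converter where the traffic exits
access-list NONAT extended permit ip 10.0.0.0 255.0.0.0 10.0.0.0 255.0.0.0
nat (CORP) 0 access-list NONAT

! 8.3+ twice NAT as the tool generated it: the missing
! outbound interface gets filled in with "any"
object network OBJ-10.0.0.0_8
 subnet 10.0.0.0 255.0.0.0
nat (CORP,any) source static OBJ-10.0.0.0_8 OBJ-10.0.0.0_8 destination static OBJ-10.0.0.0_8 OBJ-10.0.0.0_8 no-proxy-arp
```

With a 10.0.0.0/8 match on both source and destination and any as the outbound interface, the failover addresses 10.0.0.1 and 10.0.0.2 fall squarely inside the rule.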

After I changed the No-NAT outbound interface to be more specific the ASA’s started talking to each other and the issue was resolved! I went back to the switch the secondary was plugged into and re-enabled the ports and everything returned to normal.
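
For reference, the change looks something like the sketch below. I am using BU1 as the outbound interface purely as an example; use whichever interface the exempted traffic actually leaves through, and add one statement per interface pair if you need more than one.

```
! Remove the converter-generated rule that used "any"
no nat (CORP,any) source static OBJ-10.0.0.0_8 OBJ-10.0.0.0_8 destination static OBJ-10.0.0.0_8 OBJ-10.0.0.0_8 no-proxy-arp

! Re-add it with an explicit outbound interface (BU1 is illustrative)
nat (CORP,BU1) source static OBJ-10.0.0.0_8 OBJ-10.0.0.0_8 destination static OBJ-10.0.0.0_8 OBJ-10.0.0.0_8 no-proxy-arp

! List any remaining NAT rules that still reference "any"
show running-config nat | include any
```

The last command is just a quick way to audit the rest of a converted config for the same problem.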

Lesson learned

There are two takeaways from this issue:

Never use any as the outbound interface on a No-NAT statement.
Try to use IP’s for your failover and state links that fall outside every range referenced by your NAT statements (APIPA anyone?)
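
As a sketch of the second takeaway, this is what the failover configuration on the primary unit could look like with APIPA (169.254.0.0/16) addressing. The FAILOVER name and GigabitEthernet1/7 come from the output above; the STATE name, GigabitEthernet1/8, and all of the addresses are assumptions for illustration.

```
! Failover LAN link on an address range no NAT rule will ever match
failover lan unit primary
failover lan interface FAILOVER GigabitEthernet1/7
failover interface ip FAILOVER 169.254.1.1 255.255.255.252 standby 169.254.1.2

! Stateful failover link (interface name and port are hypothetical)
failover link STATE GigabitEthernet1/8
failover interface ip STATE 169.254.2.1 255.255.255.252 standby 169.254.2.2

failover
```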

Hopefully someone having this same issue reads my long-winded post and this solves their problem. If you are that person, hopefully it didn’t take you 3 hours to get it fixed!
