
FortiGate firewall clusters group-id
A newly installed FortiGate cluster (a simple two node HA active-passive setup) and some packet loss issues…
Ping from the LAN side to the Internet (or from the firewall itself) resulted in about 20% packet loss, while the other way around (WAN to firewall’s main public IP) didn’t work at all.

I used the following command to check my MAC addresses:

FORTIGATE-PRI # diagnose hardware deviceinfo nic wan1
[..]
Current_HWaddr                  00:09:0f:09:00:08
Permanent_HWaddr                00:09:0f:d1:be:ef
[..]
Then I resorted to the switches’ “show mac” facilities (some Cisco, some ProCurve) to find out which network ports that particular MAC address was showing up on… only to discover that the cluster’s “logical” MAC address (00:09:0f:09:00:08) wasn’t really located where I expected it to be.
Well, FortiGate’s virtual MAC addresses aren’t randomly generated. They have predictable values that depend on the firewall’s port number. The eighth port (wan1, in my case) will always get a virtual MAC like the one above. What happens if you have two clusters (as we had) sitting on the same L2 network segment (that is, in the same broadcast domain)? Did you say MAC address conflict? You’re right.
The solution is simple: use the group-id directive to tweak the logical MAC address, i.e.:

config system ha
    set group-id 10
end
This changes the second byte from the right of the MAC address from 00 to 0a (10 in hexadecimal):

before  00:09:0f:09:00:08
after   00:09:0f:09:0a:08
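To make the pattern explicit, here is a minimal sketch that rebuilds the two addresses above. It assumes the layout suggested by this one example (fixed prefix 00:09:0f:09, then the group-id, then the port index); the function name and the one-byte range checks are mine, and the range of group-id values your FortiOS release accepts may be narrower.

```python
def virtual_mac(group_id: int, port_index: int) -> str:
    """Rebuild a cluster virtual MAC using the pattern seen in this post.

    Assumed layout (inferred from the example, not from a spec):
    00:09:0f:09:<group-id>:<port index>
    """
    for name, value in (("group_id", group_id), ("port_index", port_index)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in one byte, got {value}")
    octets = (0x00, 0x09, 0x0F, 0x09, group_id, port_index)
    return ":".join(f"{octet:02x}" for octet in octets)


# wan1 was the eighth port on my units, hence port_index=8.
print(virtual_mac(0, 8))   # 00:09:0f:09:00:08
print(virtual_mac(10, 8))  # 00:09:0f:09:0a:08
```

Two clusters left on the same group-id produce identical addresses for the same port, which is exactly the conflict described above.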
Point is that the “FortiOS High Availability Handbook” explains this case very thoroughly! See page 192, paragraph “Diagnosing packet loss with two FortiGate HA clusters in the same broadcast domain”. We’re so used to discardable product documentation that sometimes we don’t even try to look for clues where they should normally reside.
Instead of troubleshooting, this time, I should really have Read The (unexpectedly) Fine Manual…
IT  FortiGate  HA  High_Availability  Networking  from google
july 2010 by abeggi
Who ate all the bandwidth?
Today Internet browsing is particularly slow.
At seemingly random intervals, available bandwidth drops and people get more and more irritable.

How do you find out why this is happening?

The possible causes boil down to:

A. Router/Firewall [1] is not pleased by “something”. Could be an attack or a bug in the device firmware.
B. Too many connections. Maybe they’re not passing much traffic, but the internet gateway can’t keep up with their number. I’ve seen firewalls perform very badly in this respect. E.g.: 3 connections trying to download/upload as fast as they can get a total, aggregate b/w of 10Mbps; those same 3 plus 3000 “normal” connections get a total b/w of only 6Mbps.
C. A reasonable number of connections, effectively eating all of the available bandwidth.

I’ll skip case A, for now.
In case B you’ll likely want to know the firewall’s idea of “netstat”, meaning the complete listing of TCP/UDP/other connections. No big deal if the device has got some sort of CLI access: capture its output, import it into a spreadsheet, or use awk/sort/grep [2] to build your stats. Usually, computing the total number of connections per source IP address and sorting accordingly is enough to gain some insight into what’s going on.
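As a rough sketch of that per-source count, assume the firewall’s connection listing has been saved as one line per connection in the form “proto src_ip:src_port dst_ip:dst_port”. That layout is made up for illustration (real devices print something different, so the field positions will need adjusting), and the sample addresses are invented too.

```python
from collections import Counter


def connections_by_source(lines):
    """Count connections per source IP, busiest source first.

    Expects lines shaped like 'proto src_ip:src_port dst_ip:dst_port'
    (a hypothetical format; adapt the field index to your device).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip headers, blank lines and anything malformed
        src_ip = fields[1].rsplit(":", 1)[0]  # drop the source port
        counts[src_ip] += 1
    return counts.most_common()


sample = [
    "tcp 192.168.200.15:51234 203.0.113.10:443",
    "tcp 192.168.200.15:51235 203.0.113.10:443",
    "udp 192.168.200.40:5353 198.51.100.7:53",
]
for ip, total in connections_by_source(sample):
    print(ip, total)
```

A source sitting far above the others in this ranking is usually the first place to look.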
Case C… For long-running (days) data analysis, you could use a tool like NTOP. But if, like me today, you need to act quickly (perhaps because you know that the issue will disappear soon), iftop can hardly be beaten.
Both tools require the machine they run on to be able to “sniff” all the traffic passing through the firewall. This can be accomplished by configuring monitoring/monitored port(s) on a switch. Monitored ports get their inbound/outbound traffic copied to the monitoring one. Different vendors give the feature different names; “port mirroring” is also a good search phrase. Here are a couple of resources:

(Old) 3Com Superstack: Monitor Port on 3Com 4400
HP ProCurve, pretty straightforward to set up using the “menu” interface: How do I attach a LAN Analyzer to a Switch 208t/224t port to monitor LAN traffic for diagnostic purposes?
ProCurve switches are not limited to mirroring ports that belong to the same device/chassis: How to configure remote and intelligent mirroring on ProCurve switches

Low-end HP switches (like the ProCurve 1800 one I encountered here), though, are only manageable via a web gui:
Port Mirroring on a ProCurve 1800

Cisco: Port Mirroring, Configuring a Cisco Catalyst Switch SPAN mirroring port

(You could as well use a hub instead of a switch and get implicit mirroring of any port, to any port of the hub. Just unplug the firewall, link the hub to the switch, plug firewall and monitoring host in the hub. Kludgy but quick and easy, if you can afford the temporary cabling changes, and the bottleneck introduced by the hub…)

So:

1. Find the switch the firewall is connected to. Which side of the firewall? It depends on where you believe the issue originates. Let’s say the culprit is most likely to lie on the LAN → switch port A.
2. Connect your laptop/monitoring machine to the same switch → port B.
3. Set up monitoring: port A is monitored, port B is monitoring.
4. Run iftop, maybe telling it to also show port numbers (“-P”; without this switch, you’ll only see totals per source/destination IP address pair), not to resolve hostnames (“-n”), which interface to listen on (“-i eth0”), and a meaningful filter (“-f”; here I’m selecting packets whose source is not on the LAN [3]). The “-p” option instructs iftop to capture packets in promiscuous mode. Without it, iftop won’t lift off the wire packets that aren’t addressed to the machine on which it is running.
iftop -p -P -n -i eth0 -f 'not src net 192.168.200.0/23'
iftop will produce a realtime table of running connections, sorted by how demanding they are in terms of bandwidth (10s average, by default). See the screenshot below; the top connections are due to two running video conference streams, each stealing 1Mbit/second worth of bandwidth.
[Screenshot: iftop's output]
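If you want to double-check which hosts the “not src net 192.168.200.0/23” filter treats as LAN, Python’s ipaddress module can spell out the range. This is only a side check on the /23 arithmetic, not something iftop needs; the helper name is mine.

```python
import ipaddress

# The LAN network used in the iftop filter above.
LAN = ipaddress.ip_network("192.168.200.0/23")


def is_lan_source(address: str) -> bool:
    """True if the address falls inside the LAN range excluded by the filter."""
    return ipaddress.ip_address(address) in LAN


print(LAN[0], "-", LAN[-1])           # first and last address of the /23
print(is_lan_source("192.168.201.77"))
print(is_lan_source("192.168.202.1"))
```

A /23 spans two adjacent /24 blocks, so both 192.168.200.x and 192.168.201.x count as LAN here.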

Once everything is set up and you’re able to read iftop’s output, spotting the “top talkers” of your net becomes child’s play. Enjoy!

[1] For brevity, I’ll just say “firewall” from now on.
[2] Yuri is king at doing that. See his AWK weekly series.
[3] iftop will still show these source addresses, since its output is always made of bidirectional “connections”. Only the counters pertaining to the LAN → outside direction won’t increase.
IT  Networking  Performance_Monitoring  Switching  Troubleshooting  from google
march 2010 by abeggi
