Why tcp-mss-clamp still matters

This is blogging in anger after fighting this over the weekend. Because I'm like that, I have a backup cable ISP in case my primary fiber ISP flakes out. I work from home, so having internet at all is critical to me getting paid, and neither cell phone has good enough service to hotspot reliably. Thus, two ISPs. It's expensive, but so would be missing work for a week while I wait for a cable tech to come out and diagnose why their stuff isn't working.

The backup ISP hasn't been working well for a while, but the network card pointing to the second cable modem flaked out two weeks ago, and that meant a replacement. The replacement then refused to pick up address info (v4 or v6) over DHCP. A hard reset from the provider side fixed that, but left me with this curious circumstance:

  • I could curl from the router.
  • Nothing behind the router could curl.
  • A packet trace of the behind-the-router case showed the TCP handshake finishing, but the TLS handshake failing after the initial hello.

What the actual fuck.

What fixed the problem was the following policy added to my firewalld config in /etc/firewalld/policies/backuprouter.xml.

<rule>
  <tcp-mss-clamp value="1448"/>
</rule>

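MSS means 'maximum segment size', a TCP option each side announces in its SYN to say how much TCP payload it can accept in a single segment. For networks with a typical Maximum Transmission Unit (MTU) of 1500, MSS is typically 1460: the MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header. Tunnels and encapsulation trim the effective MTU; PPPoE takes it to 1492, with a corresponding reduction in MSS to 1452, and VPNs often cut it further. The tcp-mss-clamp setting tells firewalld to cap MSS at 1448: if something behind the router announces a higher value, the router rewrites the MSS option in the SYN as it passes through, so the other end sends segments small enough to fit. It doesn't reassemble or resegment anything; it only edits the advertised number. That also fits the symptom above: the TCP handshake and the TLS hello are small packets, while the server's reply carrying its certificates usually fills full-size segments, and those are the ones that vanish when the MSS is too big for the path.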
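For anyone who wants to check the arithmetic, here is a minimal Python sketch of the MTU-to-MSS relationship. The function names are mine, not anything from firewalld, and it assumes plain IP and TCP headers with no options.

```python
# Header sizes without options. These are the usual minimums, used here
# as assumptions for illustration.
IPV4_HEADER = 20
IPV6_HEADER = 40
TCP_HEADER = 20


def mss_for_mtu(mtu: int, ipv6: bool = False) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mtu - ip_header - TCP_HEADER


def mtu_for_mss(mss: int, ipv6: bool = False) -> int:
    """Smallest MTU that carries a full segment of the given MSS."""
    ip_header = IPV6_HEADER if ipv6 else IPV4_HEADER
    return mss + ip_header + TCP_HEADER


if __name__ == "__main__":
    for mtu in (1500, 1492):
        print(f"MTU {mtu} -> MSS {mss_for_mtu(mtu)}")
    print(f"MSS 1448 -> MTU {mtu_for_mss(1448)}")
```

Run backwards, a working MSS of 1448 over IPv4 implies the path only passes packets up to 1488 bytes, which is the 4 bytes short of 1492 that I'm grumbling about.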

The tcp-mss-clamp setting can also be set to 'pmtu', which tells firewalld to derive the clamp from the path MTU (and by extension the MSS) so you don't have to hard-code a number. And yet, here I am, hard-coding, because crossing my own router seems to cost an extra 4 bytes. I don't know why, and that angers me. Packet traces from the router itself show an MSS of 1452 working fine, but that demonstrably doesn't work from behind my router.
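If you'd rather measure than guess, one approach is to binary-search for the largest ping that survives with Don't Fragment set. This is a rough sketch, not what I actually ran: ping_fits and largest_payload are hypothetical helper names, the ping flags are the Linux iputils ones, it is IPv4 only, and it assumes the target answers ICMP echo at all.

```python
import subprocess
import sys
from typing import Callable

ICMP_OVERHEAD = 28  # 20-byte IPv4 header + 8-byte ICMP header
TCP_IP_OVERHEAD = 40  # 20-byte IPv4 header + 20-byte TCP header


def ping_fits(host: str, payload: int) -> bool:
    """Send one ping with Don't Fragment set (Linux iputils flags)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "1", "-M", "do", "-s", str(payload), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def largest_payload(fits: Callable[[int], bool], low: int = 0, high: int = 1472) -> int:
    """Binary search for the largest payload size that still gets through.

    Assumes that if a size fits, every smaller size fits too.
    Returns -1 if even the smallest size fails.
    """
    if not fits(low):
        return -1
    while low < high:
        mid = (low + high + 1) // 2
        if fits(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    host = sys.argv[1]  # the target to probe, supplied by you
    payload = largest_payload(lambda size: ping_fits(host, size))
    if payload < 0:
        print("no ping got through at all")
    else:
        path_mtu = payload + ICMP_OVERHEAD
        print(f"path MTU {path_mtu}, MSS {path_mtu - TCP_IP_OVERHEAD}")
```

Running it once from the router and once from a machine behind it would show whether the two really see different path MTUs.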

Whatever. It works now, which is what matters, and now I'm contributing this nugget back to the internet.