→ RFC 5534: Failure Detection and Locator Pair Exploration Protocol for IPv6 Multihoming

This document specifies how the level 3 multihoming Shim6 protocol (Shim6) detects failures between two communicating nodes. It also specifies an exploration protocol for switching to another pair of interfaces and/or addresses between the same nodes if a failure occurs and an operational pair can be found.

Read the article - posted 2009-06-18

IPv6 and path MTU discovery black holes

A while ago, Geoff Huston wrote a column titled A Tale of Two Protocols: IPv4, IPv6, MTUs and Fragmentation.

Why didn't I read it earlier? Apart from normal busy-ness, I guess I suspected I wouldn't like this column. I was right. In case you don't have half an hour to read all the details, let me summerize the problem: at some point, Geoff couldn't load a page from the RFC Editor website with Safari but it worked with Firefox. Safari uses IPv6 by default (if the system has IPv6 connectivity). Firefox does this too, on any system other than a Mac. Under MacOS, however, Firefox won't use IPv6 by default. So the problem was with IPv6. Specifically, there was a path MTU discovery (PMTUD) black hole between Geoff and the RFC Editor. This is when small packets make it through, but large ones don't, and the requisite "packet too big" messages generated by routers that allow the sender to learn that it needs to send smaller don't make it back, so the sender keeps sending packets that are too large.

Apparently Geoff isn't aware of the fact that PMTUD black holes are extremely common on the IPv4 internet. For instance, try requesting a page from www.microsoft.com when going through a network link that as a maximum packet size (maximum transmission unit / MTU) of less than 1500 bytes somewhere in the middle of the path. It won't work. And Microsoft doesn't care: they keep filtering out "too big" messages and they keep setting the "don't fragment" bit to 1 on their packets. I can't understand why, because the combination of those two actions makes no sense. If they don't want to see the ICMP messages, they should turn off PMTUD so the DF bit is set to 0 and routers can fragment the packets that are too large. The reason that most users never see these problems is that those evil NAT boxes that we IPv6 proponents hate so much rewrite the maximum segment size (MSS) option in the TCP three-way handshake to indicate that the remote TCP should limit its packet sizes.

Anyway, back to IPv6.

Geoff suggests that the solution is to set your interface MTU to 1400. This reduced MTU and therefore MSS will be conveyed to remote servers in the TCP MSS option so it does help. However, other local systems still have the normal ethernet MTU of 1500 so they may try sending your system (non-TCP) packets larger than 1400 bytes, which won't work. The local router also won't send back a "too big" message because it thinks your system can handle 1500 bytes. This is going to be especially problematic once DNSSEC becomes common, because DNSSEC needs to transmit large pieces of data in DNS responses, which are usually UDP.

So what is the solution?

First of all, the good thing about IPv6 is that it is still being rolled out. So if you complain, the complaint is likely to arrive somewhere where IPv6 was recently enabled, and the problem may still be fixable, unlike in the case of www.microsoft.com for IPv4.

Originally, I wanted to suggest only changing the MTU for IPv6 for those of you who want to follow Geoff's advice. However, there doesn't appear to be a way to do that. But you may be able to have your local router send out an MTU option in its router advertisements that makes all the hosts use a reduced IPv6 MTU while leaving the IPv4 MTU alone. On a Cisco, simply do this:

!
interface ethernet0
 ipv6 mtu 1480
!

You need to do this on all the routers, though. And if you want to avoid all possible MTU issues, use an MTU of 1280, which is the minimum MTU that all IPv6 systems are required to support. Type ndp -i to see the effect on a Mac or FreeBSD system.

If the black hole happens when you send large packets, you can add MTU information for a destination to the routing table. For instance, this changes the MTU associated with the IPv6 default route on a MacOS system (FreeBSD without the "sudo") and then displays the default route:

sudo route change -inet6 default -mtu 1280
sudo route get -inet6 default

As for the real solution: RFC 4821 describes a solution (for TCP, at least) to the PMTUD black hole issue, so push your OS vendor to implement it.

Permalink - posted 2009-08-24