I've just recently deployed IPv6. Everything went well except for one painful issue that still isn't really resolved to my satisfaction. Recounting the story requires covering quite a bit of ground and digging through a pile of acronyms. Hold on tight!
My ISP provides a native /48 network per customer. That means I get a mere 1208925819614629174706176 public IP addresses spread over 65536 /64 networks to use as I please.
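As a sanity check on those numbers (my own arithmetic, not anything from the ISP), the subnet count falls out of the prefix lengths directly:

```shell
# Subnet arithmetic for a /48 delegation (a sketch; not ISP-specific).
# Subnetting on the /64 boundary leaves 64 - 48 = 16 bits for subnet
# numbering:
subnets=$((1 << (64 - 48)))
echo "number of /64 subnets in a /48: $subnets"
# The /48 as a whole covers 2^(128 - 48) = 2^80 addresses, i.e.
# 1208925819614629174706176 (too large for 64-bit shell arithmetic,
# so it isn't computed here).
```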
I want to use my existing FreeBSD router to do the routing for the individual networks. I want to do this for several reasons:
The ISP-provided box is a standard consumer router and is fairly limited in what it can do. It's not actively harmful; it's a respectable brand and fairly powerful hardware, but it's still only a consumer-grade box with a web interface.
I'd rather have the intricate configuration details of my network be stored in text configuration files on commodity hardware and on an operating system that I mostly trust. The ISP-provided box runs Linux on proprietary hardware and only provides shell access via an undocumented (authenticated) backdoor (side-door?).
I trust myself to write safe pf rules.
Exposing the FreeBSD machine directly to the WAN eliminates one routing hop.
However, in order to allow my FreeBSD machine to do the routing of the individual networks (as opposed to letting the entirely optional ISP-provided box do it), I had to get it to handle the PPP connection. The machine doesn't have a modem, so instead I have to run the ISP-provided modem/router in bridging mode and get the FreeBSD machine to send PPP commands using the PPPoE protocol. Encouragingly, my ISP suggested that yes, I should be using FreeBSD for this. It's a testament to the quality of IDNet: They are a serious technical ISP, they don't treat their customers like idiots, and they respect the freedom of choice of their customers to use whatever hardware and software they want.
For those that don't know, limitations in PPPoE (the 8-byte PPPoE header consumes part of the standard 1500-byte Ethernet payload) mean that the MTU of the link is limited to at most 1492. For reference, most networks on the internet use an MTU of 1500. In IPv4, if you send a packet that's larger than your router's MTU, the packet will be fragmented into separate pieces and then reassembled at the destination. This has, historically, turned out to be a rather nasty way to deal with oversized packets, and therefore, in IPv6, packets that are larger than the MTU are rejected by routers, and a Packet Too Big ICMPv6 message is returned to the sender. In effect, this means that IPv6 networks are somewhat less tolerant of misconfigured MTU values than IPv4 networks. Various companies have written extensively about fragmentation issues.
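The arithmetic behind that 1492 figure uses only standard header sizes (nothing specific to any one ISP), and it also determines the largest TCP segment that can cross the link:

```shell
# MTU arithmetic for a PPPoE link (standard header sizes assumed).
ethernet_mtu=1500
pppoe_overhead=8     # 6-byte PPPoE header + 2-byte PPP protocol ID
link_mtu=$((ethernet_mtu - pppoe_overhead))
echo "PPPoE link MTU: $link_mtu"

# A TCP segment over IPv6 additionally loses the IPv6 and TCP headers:
ipv6_header=40
tcp_header=20        # minimum, without TCP options
mss=$((link_mtu - ipv6_header - tcp_header))
echo "max TCP-over-IPv6 segment size: $mss"
```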
So why am I mentioning this? Well, shortly after I'd enabled IPv6 for the network and all services, I suddenly ran into a problem where I couldn't send mail. The symptom was that my mail client would connect to the SMTP server, authenticate successfully, send an initial DATA command, and then sit there doing nothing. Eventually, the server would kick the client due to lack of activity. After I asked on the mailing list for my mail client, Andrej Kacian pointed me at a thread documenting someone dealing with MTU issues. After some examination with Wireshark, I realized that my workstation was sending packets that were larger than the PPPoE link's MTU of 1492. My FreeBSD machine was diligently responding with Packet Too Big errors, but for whatever reason, my Linux workstation was essentially ignoring them. Some conversations on the #ipv6 Freenode IRC channel have suggested that Linux handles this very badly. Worse, the MTU-related issues seem to be sporadic: Sometimes it works without issue, other times not.
The "solution" seems to be this: Set the MTUs of all interfaces on all machines in my network to 1492. If I, for example, set the MTU of my workstation's network interface to 1500 and set the FreeBSD router's interfaces to 1492, I can no longer SSH reliably into remote sites, and the majority of TLS handshakes fail. No Packet Too Big errors are generated, which seems counter to my understanding of how this stuff is supposed to work. I very much dislike having to use a non-default MTU on my network: It seems inevitable that I'll forget to set it on one or more machines and then run into bizarre and intermittent network issues on that machine.
Some further conversation on the #ipv6 IRC channel suggests that I should not have to do this at all. However, I've so far spent roughly ten hours trying to debug the problem and am exhausted. Using a non-standard MTU in my LAN(s) works around the issue for now, and I'll re-examine the problem after my capacity for suffering has been replenished.
2018-02-23: Update: IPv6 And Linux
I wanted to start moving all my projects to Java 9, but quickly discovered that a lot of Maven plugins I depend on aren't ready for Java 9 yet.
japicmp doesn't support Java 9 because javassist doesn't support Java 9 yet.
maven-bundle-plugin doesn't support Java 9 because BND doesn't support Java 9 yet.
Update (2017-10-03): John Poth has offered a workaround
maven-dependency-plugin doesn't support Java 9. See this ticket.
Considering moving to producing 100% reproducible builds for all of my packages.
It seems fairly easy. The following changes are required for the primogenitor:
Stop using maven.build.timestamp. The commit ID is enough!
Use the reproducible-build-maven-plugin to strip manifest headers such as Built-By, Build-JDK, etc., and repack jar files such that the timestamps of entries are set to known constant values and the entries are placed into the jar in a deterministic order.
Strip the Bnd-LastModified and Tool headers from bundle manifests using the <_removeheaders> instruction in the maven-bundle-plugin configuration.
Stop using version ranges. This may be too painful.
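For illustration, the header-stripping step might look something like this in the plugin configuration (a sketch only; the plugin coordinates are the standard Apache Felix ones, and the exact header list is my own choice):

```xml
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <configuration>
    <instructions>
      <!-- Strip headers that embed timestamps or tool versions -->
      <_removeheaders>Bnd-LastModified,Tool</_removeheaders>
    </instructions>
  </configuration>
</plugin>
```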
Some early experiments show that this yields byte-for-byte identical jar files on each compile. This is pretty impressive.
The one open issue: Oracle's (or OpenJDK's) javac appears to produce completely deterministic output; there aren't any embedded timestamps or other nonsense. However, someone building the packages from source isn't guaranteed to be using an Oracle JDK. I could use the Enforcer plugin to check that the user is using a known-deterministic JDK, but it would be pretty obnoxious to break builds if they aren't. Perhaps a warning message ("JDK is not known to produce deterministic output: Build may not be reproducible!") is enough.
I'm currently working on some code that implements a simple reliable delivery protocol on top of UDP. UDP is used because latency must be minimized as much as possible.
In order to test that the protocol works properly in bad network conditions, I need a way to simulate bad network conditions. For example, I'd like to see how the protocol implementation copes when 50% of packets are lost, or when packets arrive out of order.
The Linux kernel contains various subsystems related to networking, and I found that a combination of network namespaces and network emulation was sufficient to achieve this.
The netem page states that you can use the tc command to set queueing disciplines on a network interface. For example, if your computer's primary network interface is called eth0, the following command would add a queueing discipline that causes the eth0 interface to start dropping 50% of all traffic sent:
# tc qdisc add dev eth0 root netem loss 50%
This is fine, but it does create a bit of a problem: I want to use my network interface for other things during development, and imposing an unconditional 50% packet loss on my main development machine would be painful. Additionally, if I'm running a client and server on the same machine, the kernel will route traffic between them over the loopback interface rather than sending packets to the network interface. A lot of software communicates with itself by sending messages over the loopback interface, so forcing severe packet loss and/or corruption there would almost certainly break much of the software I'd be using during development.
Instead, it'd be nice if I could create some sort of virtual network interface, assign IP addresses to it, set various netem options on that interface, and then have my client and server programs use that interface. This would leave my primary network interface (and loopback interface) free of interference.
This turns out to be surprisingly easy to achieve using the Linux kernel's network namespaces feature.
First, it's necessary to create a new namespace. You can think of a namespace as being a named container for network interfaces. Any network interface placed into a namespace n can only see interfaces that are also in n. Interfaces outside of n cannot see the interfaces inside n. Additionally, each namespace is given its own private loopback interface. For the sake of example, I'll call the new namespace virtual_net0. The namespace can be created with the following command:
# ip netns add virtual_net0
The current network namespaces can be listed:
# ip netns show
virtual_net0
Then, in order to configure interfaces inside the created namespace, it's necessary to use the ip netns exec command. The exec command takes a namespace n and a command c (with optional arguments) as arguments, and executes c inside the namespace n. To see how this works, let's examine the output of the ip link show command when executed outside of a namespace:
# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether f0:de:f1:7d:2a:02 brd ff:ff:ff:ff:ff:ff
You can see that it shows the lo loopback interface, and my desktop machine's primary network interface enp3s0. If the same command is executed inside the virtual_net0 namespace:
# ip netns exec virtual_net0 ip link show
1: lo: <LOOPBACK> mtu 65536 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
The only interface inside virtual_net0 is lo, and that lo is not the same lo from the previous list - remember that namespaces get their own private lo interface. One obvious indicator that this is not the same lo interface is that the main system's lo is in the UP state (in other words, active and ready to send/receive traffic), while this namespace-private lo is DOWN. In order to do useful work, it has to be brought up:
# ip netns exec virtual_net0 ip link set dev lo up
# ip netns exec virtual_net0 ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
We can then create virtual "dummy" interfaces inside the namespace. These look and behave (mostly) like real network interfaces. The following commands create a dummy interface virtual0 inside the virtual_net0 namespace, and assign it an IPv6 address fd38:73b9:8748:8f82::1/64:
# ip netns exec virtual_net0 ip link add name virtual0 type dummy
# ip netns exec virtual_net0 ip addr add fd38:73b9:8748:8f82::1/64 dev virtual0
# ip netns exec virtual_net0 ip link set dev virtual0 up
# ip netns exec virtual_net0 ip addr show virtual0
2: virtual0: <BROADCAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether aa:5f:05:93:5c:1b brd ff:ff:ff:ff:ff:ff
    inet6 fd38:73b9:8748:8f82::1/64 scope global
       valid_lft forever preferred_lft forever
In my case, I also created a second virtual1 interface and assigned it a different IPv6 address. It's then possible to, for example, run a client and server program inside that network namespace:
# ip netns exec virtual_net0 ./server
server: bound to [fd38:73b9:8748:8f82::1]:9999
# ip netns exec virtual_net0 ./client
client: connected to [fd38:73b9:8748:8f82::1]:9999
The server and client programs will do all of their networking in the virtual_net0 namespace and, because the Linux kernel knows that the addresses of the network interfaces are both on the same machine, the actual traffic sent between them will travel over the virtual_net0 namespace's lo interface.
A program like Wireshark can be executed in the virtual_net0 namespace and used to observe the traffic between the client and server by capturing packets on the lo interface.
Now, we want to simulate packet loss, corruption, and reordering. Well, unsurprisingly, the tc command from netem can be executed in the virtual_net0 namespace, meaning that its effects are confined to interfaces within that namespace. For example, to lose half of the packets that are sent between the client and server:
# ip netns exec virtual_net0 tc qdisc add dev lo root netem loss 50%
Finally, all of the above can be cleaned up by simply deleting the namespace:
# ip netns del virtual_net0
This destroys all of the interfaces within the namespace.
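Putting the whole recipe together, here's a sketch of a setup script that consolidates the commands above (the namespace name, interface name, and address are the same example values; the delay and corruption options are standard netem parameters that weren't shown above). By default it only prints the commands; set APPLY=1 and run it as root to actually execute them:

```shell
#!/bin/sh
# Sketch: recreate the namespace setup from this post in one script.
# Dry-run by default; set APPLY=1 (as root) to execute for real.
set -eu

NS=virtual_net0
ADDR=fd38:73b9:8748:8f82::1/64

if [ "${APPLY:-0}" = "1" ]; then
  run() { "$@"; }              # execute the command
else
  run() { echo "+ $*"; }       # just print what would be run
fi

run ip netns add "$NS"
run ip netns exec "$NS" ip link set dev lo up
run ip netns exec "$NS" ip link add name virtual0 type dummy
run ip netns exec "$NS" ip addr add "$ADDR" dev virtual0
run ip netns exec "$NS" ip link set dev virtual0 up

# Impose bad network conditions on the namespace's private lo:
# 50% loss, 100ms +/- 20ms delay, and 1% corruption.
run ip netns exec "$NS" tc qdisc add dev lo root netem loss 50% delay 100ms 20ms corrupt 1%

# Clean up afterwards with: ip netns del virtual_net0
```

Because everything lives inside the namespace, re-running the script after a cleanup always starts from a known-clean state.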