# IPv6 / Dual-Stack Support in kube-router
This document describes the current status, the plan ahead and general thoughts about IPv6 / Dual-Stack support in kube-router.
Dual-Stack (e.g. IPv4 and IPv6) has been supported in Kubernetes since version v1.21; see the upstream [IPv4/IPv6 dual-stack documentation](https://kubernetes.io/docs/concepts/services-networking/dual-stack/) for details.
kube-router's current approach is to implement dual-stack functionality function-by-function:

- CNI (`--enable-cni`)
- Proxy (`--run-service-proxy`)
- Router (`--run-router`)
- Network policies (`--run-firewall`)
## Current status (Oct 7, 2023)
Support for dual-stack in kube-router is feature complete. Releases v2.0.0 and above of kube-router have all controllers updated for dual-stack compatibility.
## Important Notes / Known Limitations / Etc.
This is a major release for kube-router, and as such users should approach deploying it into an established kube-router environment carefully. While the maintainers are not aware of any major bugs at this time, there are several small breaks in backwards compatibility. We detail these below as best we can.
## How To Enable Dual-Stack Functionality
In order to enable dual-stack functionality please ensure the following:

- The kube-router option `--enable-ipv4=true` is set (this is the default)
- The kube-router option `--enable-ipv6=true` is set
- Your Kubernetes node has both IPv4 and IPv6 addresses on its physical interfaces
- Your Kubernetes node has both IPv4 and IPv6 addresses in its node spec:

  ```sh
  $ kubectl describe node foo
  ...
  Addresses:
    InternalIP:  10.95.0.202
    InternalIP:  2001:1f18:3d5:ed00:d61a:454f:b886:7000
    Hostname:    foo
  ...
  ```

- Add additional `--service-cluster-ip-range` and `--service-external-ip-range` kube-router parameters for your IPv6 addresses.
- If you use `--enable-cni=true`, ensure `kube-controller-manager` has been started with both IPv4 and IPv6 cluster CIDRs (e.g. `--cluster-cidr=10.242.0.0/16,2001:db8:42:1000::/56`)
- Ensure `kube-controller-manager` & `kube-apiserver` have been started with both IPv4 and IPv6 service cluster IP ranges (e.g. `--service-cluster-ip-range=10.96.0.0/16,2001:db8:42:1::/112`)
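To make the list above concrete, here is a rough sketch of how the relevant flags might be combined in a dual-stack setup. The CIDR values reuse the examples above and are illustrative only; only the dual-stack relevant flags are shown, and the exact invocation style (DaemonSet args, systemd units, etc.) will depend on your cluster.

```sh
# kube-router: enable both families and pass one service range per family
# (values are examples only)
kube-router \
  --enable-ipv4=true \
  --enable-ipv6=true \
  --enable-cni=true \
  --run-router=true --run-firewall=true --run-service-proxy=true \
  --service-cluster-ip-range=10.96.0.0/16 \
  --service-cluster-ip-range=2001:db8:42:1::/112

# kube-controller-manager / kube-apiserver: dual-stack ranges are comma-separated
kube-controller-manager \
  --allocate-node-cidrs=true \
  --cluster-cidr=10.242.0.0/16,2001:db8:42:1000::/56 \
  --service-cluster-ip-range=10.96.0.0/16,2001:db8:42:1::/112

kube-apiserver \
  --service-cluster-ip-range=10.96.0.0/16,2001:db8:42:1::/112
```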
## Tunnel Name Changes (Potentially Breaking Change)
In order to facilitate both IPv4 and IPv6 tunnels, we had to change the hashing format used for our tunnel names. As such, if you do a kube-router upgrade in place (i.e. without a reboot), it is very likely that kube-router will not clean up the old tunnels.

This only impacts users that use kube-router's overlay functionality, i.e. running kube-router with `--enable-overlay`, `--overlay-type=full`, or `--overlay-type=subnet` (it should be noted that these options currently default to on).
If you are upgrading kube-router from a pre-v2.0.0 release to v2.0.0 or above, we recommend that you coordinate your kube-router upgrade with a rolling reboot of your Kubernetes fleet to clean up any tunnels left behind by previous versions of kube-router.
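If a rolling reboot is not practical, stale tunnel interfaces can also be inspected and removed by hand. The sketch below assumes kube-router's overlay tunnels carry a `tun-` name prefix (an assumption; check the actual interface names on your nodes before deleting anything):

```sh
# List IPv4 (ipip) and IPv6 (ip6tnl) tunnel interfaces on the node.
ip -d link show type ipip
ip -d link show type ip6tnl

# Remove a stale tunnel left behind by a pre-v2.0.0 kube-router
# ("tun-0123456789" is a hypothetical interface name).
ip link del tun-0123456789
```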
## Differences in `--override-nexthop`
While v2.X and above versions of kube-router are IPv6 compatible and advertise both IPv4 and IPv6 addresses, they still do this over a single BGP peering. This peering is made from what kube-router considers the node's primary IP address, which is typically the first internal IP address listed in the node's Kubernetes metadata (e.g. `kubectl get node`), unless it is overridden by a local address annotation. This address will be either an IPv4 or an IPv6 address, and kube-router uses it to establish the peering.

Without `--override-nexthop`, kube-router does the work to ensure that an IP or subnet is advertised with a next-hop of the matching IP family. However, with `--override-nexthop` enabled, kube-router has no control over the next-hop of the advertised route; instead, the next-hop is overridden with the IP that is being used to peer with kube-router.

This can cause trouble for many configurations, so using `--override-nexthop` in dual-stack kube-router configurations is not recommended.
One place where this was particularly problematic was when advertising the Pod IP subnets between different kube-router enabled Kubernetes worker nodes. Workers that use overlay networking in a kube-router cluster are made aware of their neighbors via BGP advertisements, and `--override-nexthop` would mean that one family of addresses would never work correctly. As such, we no longer apply the `--override-nexthop` setting to pod subnet advertisements between kube-router nodes. This is a behavioral difference between kube-router v1.X and v2.X.
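To verify which next-hops are actually being advertised, you can query the BGP RIB from inside a kube-router pod. This sketch assumes the `gobgp` CLI is available in your kube-router image (verify this for your build); the pod name is a placeholder:

```sh
# IPv4 routes and their next-hops
kubectl -n kube-system exec -it kube-router-xxxxx -- gobgp global rib

# IPv6 routes and their next-hops
kubectl -n kube-system exec -it kube-router-xxxxx -- gobgp global rib -a ipv6
```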
## `kube-router.io/node.bgp.customimportreject` Can Only Contain IPs of a Single Family
Due to implementation restrictions with GoBGP, the annotation `kube-router.io/node.bgp.customimportreject`, which allows users to add rules for rejecting specific routes sent to GoBGP, can only accept a single IP family (e.g. IPv4 or IPv6). Attempting to add IPs of two different families will result in a GoBGP error when it attempts to import BGP policy from kube-router.
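For example, keeping the annotation to a single family is accepted, while mixing families is not. This assumes the annotation value is a comma-separated list of CIDRs, as with kube-router's other list-valued annotations; the node name and CIDR values are placeholders:

```sh
# OK: IPv4-only (an IPv6-only list would also work)
kubectl annotate node foo \
  "kube-router.io/node.bgp.customimportreject=10.0.0.0/8,172.16.0.0/12"

# Not OK: mixing families in one annotation triggers a GoBGP policy import error
# kubectl annotate node foo \
#   "kube-router.io/node.bgp.customimportreject=10.0.0.0/8,2001:db8::/32"
```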
## IPv6 & IPv4 Network Policy Ranges Will Only Work If That Family Has Been Enabled
Network Policy in Kubernetes allows users to specify IPBlock ranges for ingress and egress policies. These blocks are string-based network CIDRs and allow the user to specify any ranges that they wish in order to allow ingress or egress from network ranges that are not selectable using Kubernetes pod selectors.
Currently, kube-router is only able to work with CIDRs for the IP families that it has been enabled for via the `--enable-ipv4=true` & `--enable-ipv6=true` CLI flags. If you add a network policy for an IP family that kube-router is not enabled for, you will see a warning in the kube-router logs and no firewall rule will be added.
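As an illustration, in a policy like the sketch below (applied via a heredoc purely for convenience; the policy name and CIDR values are placeholders) the IPv6 `ipBlock` is only enforced if kube-router was started with `--enable-ipv6=true`; otherwise it is skipped with a warning in the logs:

```sh
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-ranges
  namespace: default
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - ipBlock:
            cidr: 10.242.0.0/16          # needs --enable-ipv4=true
        - ipBlock:
            cidr: 2001:db8:42:1000::/56  # needs --enable-ipv6=true
EOF
```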
## `kube-router.io/pod-cidr` Deprecation
Now that kube-router has dual-stack capability, it no longer makes sense to have an annotation that can only represent a single pod CIDR. As such, with this release we are announcing the deprecation of the `kube-router.io/pod-cidr` annotation in favor of the new `kube-router.io/pod-cidrs` annotation.

The new `kube-router.io/pod-cidrs` annotation is a comma-separated list of CIDRs and can hold either IPv4 or IPv6 CIDRs in string form.
It should be noted that, until `kube-router.io/pod-cidr` is fully removed at some point in the future, it will still be preferred over the `kube-router.io/pod-cidrs` annotation in order to preserve as much backwards compatibility as possible. Until `kube-router.io/pod-cidr` has been fully retired, users of the old annotation will get a warning in their kube-router logs saying that they should switch to the new annotation.
The recommended action here is that, upon upgrade, you convert nodes from the `kube-router.io/pod-cidr` annotation to the new `kube-router.io/pod-cidrs` annotation. Since kube-router currently only processes node annotations at startup and not as they change, this is a safe change to make before upgrading kube-router.
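A conversion might look like the following (the node name and CIDR values are placeholders):

```sh
# Add the new multi-CIDR annotation (comma-separated, IPv4 and/or IPv6)
kubectl annotate node foo \
  "kube-router.io/pod-cidrs=10.242.0.0/24,2001:db8:42:1000::/64"

# When ready, drop the deprecated single-CIDR annotation so that it no longer
# takes precedence over the new one
kubectl annotate node foo "kube-router.io/pod-cidr-"
```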
If neither annotation is specified, kube-router will use the `PodCIDRs` field of the Kubernetes node spec, which is populated by `kube-controller-manager` as part of its `--allocate-node-cidrs` functionality. This should be a sane default for most users of kube-router.
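You can check what the node spec already contains before deciding whether you need either annotation at all (the node name is a placeholder):

```sh
kubectl get node foo -o jsonpath='{.spec.podCIDRs[*]}{"\n"}'
# Example output on a dual-stack node:
# 10.242.0.0/24 2001:db8:42:1000::/64
```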
## CNI Now Accepts Multiple Pod Ranges
Now that kube-router supports dual-stack, it also supports multiple ranges in the CNI file. While kube-router will still add your pod CIDRs to your CNI configuration via node configuration like `kube-router.io/pod-cidr`, `kube-router.io/pod-cidrs`, or `.node.Spec.PodCIDRs`, you can also customize your own CNI configuration to add additional ranges or plugins.
A CNI configuration with multiple ranges will typically look something like the following:
```json
{
  "cniVersion": "0.3.0",
  "name": "mynet",
  "plugins": [
    {
      "bridge": "kube-bridge",
      "ipam": {
        "ranges": [
          [
            {
              "subnet": "10.242.0.0/24"
            }
          ],
          [
            {
              "subnet": "2001:db8:42:1000::/64"
            }
          ]
        ],
        "type": "host-local"
      },
      "isDefaultGateway": true,
      "mtu": 9001,
      "name": "kubernetes",
      "type": "bridge"
    }
  ]
}
```
All of kube-router's handling of the CNI file attempts to minimize disruption to any user-made edits to the file.