Introduction

For a few weeks now, I’ve been deploying a Kubernetes cluster at home.
The main goals are:

  • starting to host some stuff at home again
  • getting more proficient with Kubernetes
  • a certain idea of fun

Moreover, since I have IPv6 at home, I wanted my cluster to fully embrace it: communicate using IPv6, and also expose services over that protocol.

This kind of setup is not difficult per se, but it is not as straightforward as a classic IPv4-only cluster.
There are still some rough edges, and documentation is sparse on some aspects of the subject (stable dual-stack support is still relatively new in Kubernetes).

The cluster

I’m deploying on 3 nodes:

  • k8s-controller-0: will host the k8s control-plane
  • k8s-node-0
  • k8s-node-1

These nodes run on real hardware (some low-profile HP Elitebooks Mini).

Having only one node acting as a control-plane, this setup is obviously not highly available, but that’s fine for home and lab purposes.

I used Debian 11 Bullseye (amd64) as the base OS.

The network

My existing setup

My home network is set up like this (these are not the real IPs):

  • My ISP (Orange, in France) provides:

    • a dynamic /32 IPv4 address (it rarely changes, but it does happen)
    • a static /56 IPv6 prefix (that I can fully address, since I run my own router): 2001:db8:7653:200::/56
  • My LAN uses:

    • a /24 private IPv4 subnet, mixing statically and DHCP-addressed hosts: 192.168.16.0/24
    • a /64 IPv6 subnet, mixing statically and SLAAC-addressed hosts: 2001:db8:7653:216::/64

Goals

I had these requirements for my K8s cluster:

  • pods should be able to communicate with the internet:

    • using IPv4 and SNAT/masquerading.
    • using IPv6 with their own unique public IPs, on a dedicated but routed /64 (inside my main /56). I don’t want IPv6 NAT.
  • services should be able to be publicly exposed:

    • via IPv4: using a static IP inside my 192.168.16.0/24 subnet
    • via IPv6: using a static IP inside my 2001:db8:7653:216::/64 subnet
    • both at the same time

(when I say “publicly”, understand it as “using an IP on my main LAN”: I can then configure forwarding/DNAT on my router if I need to open them up to the internet.)

Let’s go.

K8s bootstrapping

I used kubeadm, and won’t go into detail about the process itself, since it’s already widely documented elsewhere.

This is the config I used:

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
  podSubnet: 10.244.0.0/16,2001:db8:7653:299:cafe:0::/96
  serviceSubnet: 10.96.0.0/16,2001:db8:7653:299:cafe:1::/112
controllerManager:
  extraArgs:
    node-cidr-mask-size-ipv4: "24"
    node-cidr-mask-size-ipv6: "112"

Some explanations:

  • podSubnet: the global subnets used by K8s to assign IPs to pods.

    • 10.244.0.0/16: not routed, only reachable from inside the cluster. (K8s will do SNAT/DNAT for pods that need to reach the internet.)
    • 2001:db8:7653:299:cafe:0::/96: routed, my home router allows it to reach the internet (but still blocks incoming requests).
  • serviceSubnet: the subnets used by K8s for services ClusterIPs.

    • 10.96.0.0/16: not routed, only reachable from inside the cluster.
    • 2001:db8:7653:299:cafe:1::/112: routed.
  • node-cidr-mask-size-*: the size of the per-node subnets from which each node will assign IPs to its pods.

    • for IPv4, each node will get a unique /24 inside the 10.244.0.0/16 subnet
    • for IPv6, each node will get a unique /112 inside the 2001:db8:7653:299:cafe:0::/96 subnet
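To make the subnet math concrete, here is a quick sketch with Python’s ipaddress module, using the example values above, showing how many nodes and how many pods per node these mask sizes allow:

```python
import ipaddress

# Global pod pools from the kubeadm config above
pod_v4 = ipaddress.ip_network("10.244.0.0/16")
pod_v6 = ipaddress.ip_network("2001:db8:7653:299:cafe:0::/96")

# Each node gets one per-node subnet carved out of the global pool:
# a /24 out of the IPv4 /16, a /112 out of the IPv6 /96.
v4_nodes = 2 ** (24 - pod_v4.prefixlen)   # 2^8  = 256 possible nodes
v6_nodes = 2 ** (112 - pod_v6.prefixlen)  # 2^16 = 65536 possible nodes

# The host bits of the per-node subnet bound the pods per node
v4_pods_per_node = 2 ** (32 - 24)    # 256
v6_pods_per_node = 2 ** (128 - 112)  # 65536

print(v4_nodes, v6_nodes, v4_pods_per_node, v6_pods_per_node)
# 256 65536 256 65536
```

So even with K8s’ constraints on IPv6 mask-size differences, this layout leaves plenty of room for a 3-node home cluster.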

It took some time, and I rebuilt the cluster many times before finding the right setup.

The IPv4 part is pretty standard, nothing special here.

But on the IPv6 side, K8s doesn’t like wide subnets, and wants a small difference between the podSubnet mask size and node-cidr-mask-size.
So the best compromise I found was to assign pods a global /96, which K8s then cuts into /112 subnets, one for each node.

I also assigned a /112 for the serviceSubnet, in another /96 (K8s did not allow me to use the /96 directly for services either; it was too wide for its taste). These two /96s are in the same /64, which is routed by my main router.

For clarity, here is the complete IPv6 hierarchy:

  • 2001:db8:7653:200::/56: main /56 prefix assigned by my ISP
    • 2001:db8:7653:216::/64: my main LAN subnet
    • 2001:db8:7653:299::/64: another routed subnet dedicated to K8s internal use
      • 2001:db8:7653:299:cafe:0::/96: subnet for pods
      • 2001:db8:7653:299:cafe:1::/96: another /96, reserved for services
        • 2001:db8:7653:299:cafe:1::/112: subnet for services
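This hierarchy can be double-checked with a short Python ipaddress sketch: each subnet must sit inside its parent, and the pod and service ranges must never overlap.

```python
import ipaddress

isp  = ipaddress.ip_network("2001:db8:7653:200::/56")  # ISP /56 prefix
lan  = ipaddress.ip_network("2001:db8:7653:216::/64")  # main LAN
k8s  = ipaddress.ip_network("2001:db8:7653:299::/64")  # K8s-dedicated /64
pods = ipaddress.ip_network("2001:db8:7653:299:cafe:0::/96")
svcs = ipaddress.ip_network("2001:db8:7653:299:cafe:1::/112")

checks = [
    lan.subnet_of(isp),       # LAN is inside the ISP prefix
    k8s.subnet_of(isp),       # K8s /64 is inside the ISP prefix
    pods.subnet_of(k8s),      # pod /96 is inside the K8s /64
    svcs.subnet_of(k8s),      # service /112 is inside the K8s /64
    not pods.overlaps(svcs),  # pods and services never collide
]
print(all(checks))  # True
```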

CNI configuration

There are many CNIs for Kubernetes, and most of them support DualStack networking.

I chose Cilium mainly for its wide feature set and technical promises, and because I had not played with it before.

Deployment

I used the official Helm chart with the following flags:

$ helm install cilium cilium/cilium \
    --namespace kube-system \
    --set ipv4.enabled=true \
    --set ipv6.enabled=true \
    --set ipam.mode=cluster-pool \
    --set ipam.operator.clusterPoolIPv4PodCIDRList="10.244.0.0/16" \
    --set ipam.operator.clusterPoolIPv6PodCIDRList="2001:db8:7653:299:cafe:0::/96" \
    --set ipam.operator.clusterPoolIPv4MaskSize=24 \
    --set ipam.operator.clusterPoolIPv6MaskSize=112 \
    --set bpf.masquerade=true \
    --set enableIPv6Masquerade=false

Most flags simply duplicate the options I used at cluster init; as for the others:

  • bpf.masquerade=true: for performance, since my kernel supports eBPF.
  • enableIPv6Masquerade=false: to disable IPv6 NATing since we use a routed subnet.
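For reference, the same flags can also be kept in a values file instead of a long --set list. This is a sketch mirroring the command above; the key names are simply the nested form of the flags:

```yaml
# values.yaml -- nested form of the --set flags above (sketch)
ipv4:
  enabled: true
ipv6:
  enabled: true
ipam:
  mode: cluster-pool
  operator:
    clusterPoolIPv4PodCIDRList: "10.244.0.0/16"
    clusterPoolIPv6PodCIDRList: "2001:db8:7653:299:cafe:0::/96"
    clusterPoolIPv4MaskSize: 24
    clusterPoolIPv6MaskSize: 112
bpf:
  masquerade: true
enableIPv6Masquerade: false
```

It would then be deployed with helm install cilium cilium/cilium --namespace kube-system -f values.yaml.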

Post-configuration

Unfortunately, the Helm chart doesn’t support all the options I need, so I had to set some options afterwards, in the cilium-config ConfigMap:

  • enable-ipv6-ndp: "true": this enables the NDP proxy feature at node level, to expose IPv6 pod IPs on the main LAN as if they were directly plugged into it.
  • ipv6-mcast-device: "eno1": this is the network interface on nodes that will be used to proxy NDP packets. (note: of course, this setting requires that all of your nodes use the same name for their main interface. The next release of Cilium will provide autodetection for this.)
  • ipv6-service-range: "2001:db8:7653:299:cafe:1::/112": we specify the subnet we dedicate for ClusterIPs.
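Put together, the relevant part of the cilium-config ConfigMap then looks like this (a sketch showing only the keys discussed here; the agent pods typically need a restart to pick up ConfigMap changes):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-config
  namespace: kube-system
data:
  enable-ipv6-ndp: "true"
  ipv6-mcast-device: "eno1"
  ipv6-service-range: "2001:db8:7653:299:cafe:1::/112"
```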

At this point, pods can be deployed with both IPv4 and IPv6. They can communicate with each other, and also expose services using ClusterIPs or NodePorts. They are also able to reach the internet using either protocol.

I then applied the additional steps to remove kube-proxy and set up Cilium to handle this task using eBPF.
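I won’t repeat those steps here; for reference, the kube-proxy replacement boils down to a few extra Helm values, roughly like this (a sketch: the exact values depend on the Cilium version, and the API server address below is a hypothetical placeholder for k8s-controller-0):

```yaml
kubeProxyReplacement: "strict"
k8sServiceHost: "192.168.16.20"  # hypothetical address of k8s-controller-0
k8sServicePort: "6443"
```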

LoadBalancer

This is the last brick we need to finish the network setup of the cluster.

I really like MetalLB for its Layer-2 mode: in that mode, it implements the LoadBalancer object by allocating an IP from an address pool and answering ARP or NDP requests as if the IP were really bound, then directing traffic where it needs to go.
It’s neither widely advertised nor well documented, but it fully supports IPv6.

So, I installed it and provided the following configuration:

configInline:
  address-pools:
  - name: ipv4
    protocol: layer2
    addresses:
    - 192.168.16.100-192.168.16.119
  - name: ipv6
    protocol: layer2
    addresses:
    - 2001:db8:7653:216:ffff:fac:ade::/112

  • 192.168.16.100-192.168.16.119: the address range MetalLB will pick IPv4 addresses from; it’s a range from my main LAN that I dedicate to this usage.
  • 2001:db8:7653:216:ffff:fac:ade::/112: an unused range in my main LAN’s IPv6 /64 subnet. MetalLB will pick IPv6 addresses from it for IPv6 or dual-stack LoadBalancer services.
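As a sanity check on the pool sizes, a quick Python sketch using the ranges above:

```python
import ipaddress

# MetalLB address pools from the config above
v4_first = ipaddress.ip_address("192.168.16.100")
v4_last = ipaddress.ip_address("192.168.16.119")
v6_pool = ipaddress.ip_network("2001:db8:7653:216:ffff:fac:ade::/112")

v4_count = int(v4_last) - int(v4_first) + 1  # inclusive range
v6_count = v6_pool.num_addresses

print(v4_count, v6_count)  # 20 65536
```

Twenty IPv4 addresses is more than enough for a handful of exposed services, and the IPv6 pool is effectively unlimited.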

Note: MetalLB in L2 mode does not provide a real load-balancer, but rather uses the LoadBalancer abstraction to expose IPs outside of the cluster. I think this is very clever, and it fits a home cluster perfectly.

Test all the things

Let’s check that everything works as intended.

Deploy a dual-stack pod

pod.yaml:

apiVersion: v1
kind: Pod
metadata:
  labels:
    app: nginx-test
  name: nginx-test
spec:
  containers:
  - image: nginx:latest
    name: nginx-test
$ kubectl apply -f pod.yaml
pod/nginx-test created

$ kubectl get pods nginx-test
NAME         READY   STATUS    RESTARTS   AGE
nginx-test   1/1     Running   0          3m6s

$ kubectl describe pod nginx-test
Name:         nginx-test
Namespace:    lab
Priority:     0
Node:         k8s-node-1/192.168.16.22
Start Time:   Wed, 01 Jun 2022 22:05:15 +0200
Labels:       app=nginx-test
Annotations:  <none>
Status:       Running
IP:           10.244.1.97
IPs:
  IP:  10.244.1.97
  IP:  2001:db8:7653:299:cafe:0:1:1095
[...]

We can see that the pod automatically gets both an IPv4 and an IPv6 address.
Now we can test that we’re able to reach the internet from inside the pod, in IPv4 and in IPv6:

$ kubectl exec -it nginx-test -- bash
root@nginx-test:/# curl -4 ifconfig.co
198.51.100.92
root@nginx-test:/# curl -6 ifconfig.co
2001:db8:7653:299:cafe:0:1:1095

Yep.

Expose a pod using a dual-stack ClusterIP

clusterip.yaml

apiVersion: v1
kind: Service
metadata:
  name: nginx-test
spec:
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: RequireDualStack
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx-test
  type: ClusterIP
$ kubectl apply -f clusterip.yaml
service/nginx-test created

$ kubectl describe service nginx-test
Name:              nginx-test
Namespace:         lab
Labels:            <none>
Annotations:       <none>
Selector:          app=nginx-test
Type:              ClusterIP
IP Family Policy:  RequireDualStack
IP Families:       IPv4,IPv6
IP:                10.96.56.97
IPs:               10.96.56.97,2001:db8:7653:299:cafe:1:0:b09d
Port:              <unset>  80/TCP
TargetPort:        80/TCP
Endpoints:         10.244.1.97:80
Session Affinity:  None
Events:            <none>

Our service got both an IPv4 and an IPv6 address. These IPs are only meant to be reachable from inside the cluster; let’s verify that.

From inside another container:

$ kubectl exec -it toolbox-d4fbc949f-7qtpd -- bash
dek@toolbox-d4fbc949f-7qtpd:~$ curl -4 -I nginx-test.lab.svc.cluster.local
HTTP/1.1 200 OK
Server: nginx/1.21.6
Date: Wed, 01 Jun 2022 20:41:32 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 25 Jan 2022 15:03:52 GMT
Connection: keep-alive
ETag: "61f01158-267"
Accept-Ranges: bytes

dek@toolbox-d4fbc949f-7qtpd:~$ curl -6 -I nginx-test.lab.svc.cluster.local
HTTP/1.1 200 OK
Server: nginx/1.21.6
Date: Wed, 01 Jun 2022 20:41:43 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 25 Jan 2022 15:03:52 GMT
Connection: keep-alive
ETag: "61f01158-267"
Accept-Ranges: bytes

Our NGINX service is reachable by both protocols from inside the cluster.

Expose a pod using a dual-stack LoadBalancer

loadbalancer.yaml:

apiVersion: v1
kind: Service
metadata:
  name: nginx-test-lb
spec:
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: RequireDualStack
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: nginx-test
  type: LoadBalancer
$ kubectl apply -f loadbalancer.yaml
service/nginx-test-lb created

$ kubectl get service nginx-test-lb
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP                                       PORT(S)        AGE
nginx-test-lb   LoadBalancer   10.96.146.219   192.168.16.104,2001:db8:7653:216:ffff:fac:ade:0   80:32029/TCP   43s

We can see that, for our LoadBalancer service, MetalLB allocated an IPv4 and an IPv6 address directly on my private network.

So I should be able to reach my NGINX service from my laptop on that network:

$ curl -I 192.168.16.104:80
HTTP/1.1 200 OK
Server: nginx/1.21.6
Date: Wed, 01 Jun 2022 20:49:23 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 25 Jan 2022 15:03:52 GMT
Connection: keep-alive
ETag: "61f01158-267"
Accept-Ranges: bytes

$ curl -I [2001:db8:7653:216:ffff:fac:ade:0]:80
HTTP/1.1 200 OK
Server: nginx/1.21.6
Date: Wed, 01 Jun 2022 20:49:37 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Tue, 25 Jan 2022 15:03:52 GMT
Connection: keep-alive
ETag: "61f01158-267"
Accept-Ranges: bytes

It works!

Conclusion

I’m really happy with this setup: it checks all the boxes and works really well.

There are many more things to add before enjoying a full-featured cluster, so there may be more articles to come :)