Introduction

Kubernetes, and specifically the kubelet, lets you load pod specifications from a directory on disk. Improper use of this functionality can lead to some strange things happening. This post will explore some of those edge cases.

In a typical Kubernetes cluster deployment, the kubelet on each node reaches out to the API Server to find out which pods should be running on its node. This is core to maintaining the correct cluster state in the event of a node outage. However, some deployments of Kubernetes run the API Server and other control plane components as pods themselves. To allow this bootstrapping when the kubelet has no API Server to communicate with, the kubelet can be configured with a location from which to read static pod configurations.

This location can be configured as either a web location or a local directory, with the latter being the more common. In a kubeadm setup, the default directory is /etc/kubernetes/manifests. Inspecting this directory on a clean KinD cluster shows a number of manifests, generated to allow the essential control plane components to start.

root@kind-control-plane:/etc/kubernetes/manifests# ls -la
total 28
drwxr-xr-x 1 root root 4096 Oct 11 14:05 .
drwxr-xr-x 1 root root 4096 Oct 11 14:05 ..
-rw------- 1 root root 2406 Oct 11 14:05 etcd.yaml
-rw------- 1 root root 3896 Oct 11 14:05 kube-apiserver.yaml
-rw------- 1 root root 3428 Oct 11 14:05 kube-controller-manager.yaml
-rw------- 1 root root 1463 Oct 11 14:05 kube-scheduler.yaml
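
For those curious, this directory is set on the kubelet itself rather than anywhere cluster-side. A minimal sketch of the relevant part of a KubeletConfiguration, assuming the usual kubeadm file location and defaults:

# /var/lib/kubelet/config.yaml (typical kubeadm location; assumed here)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# directory the kubelet watches for static pod manifests
staticPodPath: /etc/kubernetes/manifests
# staticPodURL can be set instead to pull manifests from a web location

The older --pod-manifest-path and --manifest-url kubelet flags serve the same purpose on setups that still configure the kubelet through command-line flags.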

These static pods correspond to objects we can observe through the Kubernetes API Server. These are mirror pods: a special kind of pod object created so that statically created pods can be tracked in the API Server. Each pod is named with the value specified in the YAML file, suffixed with the name of the node the pod was created on. Note the four pods in the output below whose names end with kind-control-plane and correspond to the manifests above.

❯ kubectl get pods -n kube-system
NAME                                         READY   STATUS    RESTARTS   AGE
coredns-6f6b679f8f-9r7j9                     1/1     Running   0          27s
coredns-6f6b679f8f-qf4vz                     1/1     Running   0          27s
etcd-kind-control-plane                      1/1     Running   0          33s
kindnet-v5gcj                                1/1     Running   0          27s
kube-apiserver-kind-control-plane            1/1     Running   0          33s
kube-controller-manager-kind-control-plane   1/1     Running   0          32s
kube-proxy-ltn69                             1/1     Running   0          27s
kube-scheduler-kind-control-plane            1/1     Running   0          32s
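
We can also confirm that these are mirror pods, rather than pods created directly through the API, by checking their annotations; the kubelet records where each pod's definition came from. One way to pull that annotation out (this should print file for a static pod and nothing for a regular one):

❯ kubectl get pod etcd-kind-control-plane -n kube-system \
    -o jsonpath='{.metadata.annotations.kubernetes\.io/config\.source}'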

The Kubernetes documentation on static pods covers much of their behaviour and the constraints placed on them. Some of these constraints have significant security benefits, such as not allowing static pods to use ConfigMaps, Secrets, or service accounts. It is, however, still possible to mount host volumes into the container, potentially allowing a user with write access to the manifest directory to escalate their permissions on the node¹.
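
As a purely hypothetical illustration of that last point, a static pod manifest along these lines would give anyone able to write it into the manifest directory a container with the node's filesystem mounted at /host:

# hypothetical manifest, e.g. /etc/kubernetes/manifests/host-mount.yaml
apiVersion: v1
kind: Pod
metadata:
  name: host-mount
spec:
  containers:
  - name: host-mount
    image: nginx
    volumeMounts:
    - name: host-root
      mountPath: /host   # the node's / visible inside the container
  volumes:
  - name: host-root
    hostPath:
      path: /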

This seems like some fairly complex functionality, and as with anything complicated, there are edge cases which lead to odd behaviours. We’ll dive into one of these below.

Admission Control

In a modern cluster, the kubelet is always permitted to create pods for its own node through Node Authorization. Those pods aren't restricted to specific namespaces and, until recently, I'd never tried creating a static pod in a namespace restricted by Pod Security Admission.

If you’re reading this, take a second to think about what will happen. I posed the question “Do the kubelet’s static pods get affected by admission control?” to some colleagues, and the responses were heavily weighted towards “No”. This makes sense, and was my initial suspicion too. The whole point of a static manifest is that it allows the kubelet to start workloads without a control plane to communicate with, so why should admission control get involved when it is a component that runs in the control plane itself?

So let’s try it out by creating two static pods. One is a standard pod with an empty security context, and one sets the privileged flag.

❯ cat pod-standard.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: standard-pod
  name: standard-pod
spec:
  containers:
  - image: nginx
    name: standard-pod
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

❯ cat pod-privileged.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: privileged-pod
  name: privileged-pod
spec:
  containers:
  - image: nginx
    name: privileged-pod
    resources: {}
    securityContext:
      privileged: true
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
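
For completeness, this is roughly how the test was set up (the namespace name and Pod Security Admission label are assumptions based on the output and kubelet logs below; presumably the manifests also set metadata.namespace: baseline, which isn't shown above): the baseline namespace is labelled to enforce the baseline Pod Security Standard, and the manifests are dropped into the static pod directory on the node.

❯ kubectl create namespace baseline
❯ kubectl label namespace baseline pod-security.kubernetes.io/enforce=baseline
root@kind-control-plane:/# cp pod-standard.yaml pod-privileged.yaml /etc/kubernetes/manifests/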

We can then check if the pods are running:

❯ kubectl get pods -n baseline -w
NAME                              READY   STATUS    RESTARTS      AGE
standard-pod-kind-control-plane   1/1     Running   0             14s

Only one pod is running, so that answers that question, right? The privileged pod wasn’t created. Let’s check the kubelet’s logs to be sure.

root@kind-control-plane:/# journalctl -u kubelet | grep standard-pod
Oct 12 13:38:16 kind-control-plane kubelet[693]: I1012 13:38:16.210457     693 pod_startup_latency_tracker.go:104] "Observed pod startup duration" pod="baseline/standard-pod-kind-control-plane" podStartSLOduration=18.210431419 podStartE2EDuration="18.210431419s" podCreationTimestamp="2024-10-12 13:37:58 +0000 UTC" firstStartedPulling="0001-01-01 00:00:00 +0000 UTC" lastFinishedPulling="0001-01-01 00:00:00 +0000 UTC" observedRunningTime="2024-10-12 13:38:16.210392461 +0000 UTC m=+55.347144485" watchObservedRunningTime="2024-10-12 13:38:16.210431419 +0000 UTC m=+55.347183443"

root@kind-control-plane:/# journalctl -u kubelet | grep privileged-pod
Oct 12 13:38:00 kind-control-plane kubelet[693]: E1012 13:38:00.645679     693 kubelet.go:1915] "Failed creating a mirror pod for" err="pods \"privileged-pod-kind-control-plane\" is forbidden: violates PodSecurity \"baseline:latest\": privileged (container \"privileged-pod\" must not set securityContext.privileged=true)" pod="baseline/privileged-pod-kind-control-plane"
Oct 12 13:38:17 kind-control-plane kubelet[693]: E1012 13:38:17.209797     693 kubelet.go:1915] "Failed creating a mirror pod for" err="pods \"privileged-pod-kind-control-plane\" is forbidden: violates PodSecurity \"baseline:latest\": privileged (container \"privileged-pod\" must not set securityContext.privileged=true)" pod="baseline/privileged-pod-kind-control-plane"
Oct 12 13:38:18 kind-control-plane kubelet[693]: E1012 13:38:18.217397     693 kubelet.go:1915] "Failed creating a mirror pod for" err="pods \"privileged-pod-kind-control-plane\" is forbidden: violates PodSecurity \"baseline:latest\": privileged (container \"privileged-pod\" must not set securityContext.privileged=true)" pod="baseline/privileged-pod-kind-control-plane"

This certainly looks like the container wasn’t created, as there was no pod startup, but something still doesn’t seem right. Let’s confirm there’s definitely no container running on the node:

root@kind-control-plane:/# crictl ps 
CONTAINER     IMAGE         CREATED            STATE   NAME           ATTEMPT POD ID        POD
cfb34cd8921c4 048e090385966 About a minute ago Running privileged-pod 0       0057552e19e0f privileged-pod-kind-control-plane
fc9e700421fe6 048e090385966 About a minute ago Running standard-pod   0       c145bd43367e8 standard-pod-kind-control-plane
<-- snip -->

Well, that’s interesting. The pod wasn’t able to register on the API Server, but the container itself is running. We can confirm this again by checking the pod ID rather than the container ID:

root@kind-control-plane:/etc/kubernetes/manifests# crictl pods 
POD ID        CREATED            STATE NAME                               NAMESPACE ATTEMPT RUNTIME
0057552e19e0f About a minute ago Ready privileged-pod-kind-control-plane  baseline  0       (default)
c145bd43367e8 About a minute ago Ready standard-pod-kind-control-plane    baseline  0       (default)

Yep, definitely running. We can confirm this another way by using the kubelet API (accessed, in this case, using the wonderful Kubeletctl):

root@kind-control-plane:/# ./kubeletctl_linux_arm64 --server 127.0.0.1 pods 
[*] Using KUBECONFIG environment variable
[*] You can ignore it by modifying the KUBECONFIG environment variable, file "~/.kube/config" or use the "-i" switch
┌─────────────────────────────────────────────────────────────────────┐
│                          Pods from Kubelet                          │
├────┬───────────────────────────────────┬───────────┬────────────────┤
│    │ POD                               │ NAMESPACE │ CONTAINERS     │
├────┼───────────────────────────────────┼───────────┼────────────────┤
<-- snip -->
│  6 │ privileged-pod-kind-control-plane │ baseline  │ privileged-pod │
├────┼───────────────────────────────────┼───────────┼────────────────┤
<-- snip -->
│ 11 │ standard-pod-kind-control-plane   │ baseline  │ standard-pod   │
<-- snip -->

Non-existent Namespaces

Okay, we’ve confirmed that a pod which can’t get past admission control will still run as a static pod on the node, just not as a mirror pod in the Kubernetes API Server. How about other attempts to create a local container that won’t be recognised by the API Server?

When we try to create a pod in a namespace that doesn’t exist, two errors are returned². One comes from kubelet.go and one from event.go, both detailing that the API Server rejected the request to create the pod. However, as with the admission control example, we can see that the pod was successfully started.
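
The manifest used here is essentially the standard pod from earlier, pointed at a namespace that doesn't exist; something like the following (the namespace name is made up, only the pod name fake-ns-pod is visible in the output below):

apiVersion: v1
kind: Pod
metadata:
  name: fake-ns-pod
  namespace: not-a-real-namespace   # assumed; any namespace that doesn't exist will do
spec:
  containers:
  - image: nginx
    name: fake-ns-pod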

root@kind-control-plane:/# crictl ps 
CONTAINER     IMAGE         CREATED         STATE   NAME        ATTEMPT POD ID         POD
78c94040ffd50 048e090385966 2 minutes ago   Running fake-ns-pod 0       89d585212ff8b fake-ns-pod-kind-control-plane

As with the previous example, we can execute commands inside the running container through Kubeletctl or by talking directly to the kubelet with curl, but not through kubectl exec.
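
For example, the kubelet's /run endpoint executes a single command in a running container. A rough sketch, assuming the API Server's kubelet client certificate that kubeadm leaves on the control plane node is used for authentication (any identity with the nodes/proxy permission would work):

root@kind-control-plane:/# curl -sk \
    --cert /etc/kubernetes/pki/apiserver-kubelet-client.crt \
    --key /etc/kubernetes/pki/apiserver-kubelet-client.key \
    -X POST "https://127.0.0.1:10250/run/<namespace>/<pod>/<container>" \
    -d "cmd=id"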

Uses

It’s probably fair to say that creating pods which can’t be viewed by cluster administrators through kubectl is not a useful behaviour in day-to-day running of a cluster. I can’t think of any legitimate uses of this behaviour, but I do think it would be useful to someone trying to hide a running workload on a Kubernetes cluster they’ve compromised. Not that I’d talk about that or anything.


  1. This shouldn’t ever result in a privilege escalation, as by default the directory can only be written to by the root user, so the writing user already has full control of the system. ↩︎

  2. The full error text is in this repo, for those interested. ↩︎