Daemon
The ChaosPlane daemon runs as a DaemonSet — one pod per node. It handles chaos actions that require node-level access: network manipulation, stress testing, container kills, HTTP interception, DNS injection, and node restarts.
Why a daemon?
Some chaos actions can't be performed from the operator pod:
- Network chaos requires entering the target pod's network namespace and running tc or iptables commands
- Stress chaos requires running stress-ng inside the pod's cgroup or on the host
- Container kill requires access to the container runtime socket
- Node restart requires executing a system reboot on the host
The daemon runs with elevated privileges on each node and exposes a gRPC server that the operator calls.
gRPC API
The daemon exposes these RPC methods:
ExecStressChaos
Runs stress-ng for CPU, memory, and I/O stressors, and handles container-kill actions.
rpc ExecStressChaos(StressChaosRequest) returns (ChaosResponse);
message StressChaosRequest {
  string experiment_id = 1;
  string stressor_type = 2; // "cpu", "memory", "io", "container-kill"
  map<string, string> parameters = 3;
}
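As a rough illustration of how stressor_type and parameters might translate into a stress-ng invocation, the sketch below builds an argument list. The function name buildStressArgs and the parameter keys "workers", "size", and "duration" are assumptions made for this example, not part of the documented API. Only the stress-ng flags themselves (--cpu, --vm, --vm-bytes, --io, --timeout) are real.

```go
package main

import (
	"fmt"
	"strings"
)

// buildStressArgs maps a stressor type and its parameters to stress-ng
// arguments. The keys "workers", "size", and "duration" are hypothetical;
// the real daemon may use different names.
func buildStressArgs(stressorType string, params map[string]string) ([]string, error) {
	workers := params["workers"]
	if workers == "" {
		workers = "1" // assumed default: a single worker
	}

	var args []string
	switch stressorType {
	case "cpu":
		args = []string{"--cpu", workers}
	case "memory":
		args = []string{"--vm", workers}
		if size := params["size"]; size != "" {
			args = append(args, "--vm-bytes", size)
		}
	case "io":
		args = []string{"--io", workers}
	case "container-kill":
		// In this sketch, container kill is assumed to go through the
		// container runtime socket rather than stress-ng.
		return nil, fmt.Errorf("stressor type %q is not a stress-ng stressor", stressorType)
	default:
		return nil, fmt.Errorf("unknown stressor type %q", stressorType)
	}

	if d := params["duration"]; d != "" {
		args = append(args, "--timeout", d)
	}
	return args, nil
}

func main() {
	args, err := buildStressArgs("memory", map[string]string{
		"workers": "2", "size": "256m", "duration": "60s",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("stress-ng", strings.Join(args, " "))
}
```

Rejecting unknown stressor types before spawning anything keeps a malformed request from reaching the host.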
ExecNetworkChaos
Applies network chaos using tc or iptables.
rpc ExecNetworkChaos(NetworkChaosRequest) returns (ChaosResponse);
message NetworkChaosRequest {
  string experiment_id = 1;
  string action = 2; // "delay", "loss", "corrupt", "duplicate", "partition", "bandwidth"
  map<string, string> parameters = 3;
}
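To make the mapping from action to tc or iptables concrete, here is a minimal sketch that builds the command line for each action. The parameter keys ("device", "latency", "percent", "rate", "targetIP"), the eth0 default, and the tbf burst and latency values are all assumptions chosen for illustration. The daemon's actual parameter names are not specified here.

```go
package main

import (
	"fmt"
	"strings"
)

// param returns a required parameter or an error if it is missing.
func param(params map[string]string, key string) (string, error) {
	if v := params[key]; v != "" {
		return v, nil
	}
	return "", fmt.Errorf("missing parameter %q", key)
}

// buildNetworkCommand returns the command line for a network chaos action.
// All parameter keys used here are hypothetical names for this sketch.
func buildNetworkCommand(action string, params map[string]string) ([]string, error) {
	dev := params["device"]
	if dev == "" {
		dev = "eth0" // assumed default interface inside the pod
	}
	qdisc := []string{"tc", "qdisc", "add", "dev", dev, "root"}

	switch action {
	case "delay":
		v, err := param(params, "latency")
		if err != nil {
			return nil, err
		}
		return append(qdisc, "netem", "delay", v), nil
	case "loss", "corrupt", "duplicate":
		// netem uses the action name itself as the keyword.
		v, err := param(params, "percent")
		if err != nil {
			return nil, err
		}
		return append(qdisc, "netem", action, v), nil
	case "bandwidth":
		// Token bucket filter; burst and latency are illustrative values.
		v, err := param(params, "rate")
		if err != nil {
			return nil, err
		}
		return append(qdisc, "tbf", "rate", v, "burst", "32kbit", "latency", "400ms"), nil
	case "partition":
		// Drop inbound traffic from the target address.
		v, err := param(params, "targetIP")
		if err != nil {
			return nil, err
		}
		return []string{"iptables", "-A", "INPUT", "-s", v, "-j", "DROP"}, nil
	default:
		return nil, fmt.Errorf("unknown network action %q", action)
	}
}

func main() {
	cmd, err := buildNetworkCommand("loss", map[string]string{"percent": "10%"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(strings.Join(cmd, " "))
}
```

The traffic-shaping actions go through tc qdiscs, while partition is expressed as an iptables rule, matching the two tools named above.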
ExecHTTPChaos
Sets up HTTP interception for delay or abort.
rpc ExecHTTPChaos(HTTPChaosRequest) returns (ChaosResponse);
message HTTPChaosRequest {
  string experiment_id = 1;
  string action = 2; // "delay", "abort"
  int32 port = 3;
  map<string, string> parameters = 4;
}
ExecDNSChaos
Injects DNS errors.
rpc ExecDNSChaos(DNSChaosRequest) returns (ChaosResponse);
message DNSChaosRequest {
  string experiment_id = 1;
  string action = 2; // "error"
  map<string, string> parameters = 3;
}
ExecNodeChaos
Executes node-level actions like restart.
rpc ExecNodeChaos(NodeChaosRequest) returns (ChaosResponse);
message NodeChaosRequest {
  string experiment_id = 1;
  string action = 2; // "restart"
  map<string, string> parameters = 3;
}
CancelChaos
Cancels a running chaos action by execution ID.
rpc CancelChaos(CancelRequest) returns (CancelResponse);
message CancelRequest {
  string execution_id = 1;
}
Network namespace access
For pod-scoped network chaos, the daemon needs to enter the pod's network namespace. It does this by:
- Looking up the pod's container ID from the podName and podNamespace parameters
- Finding the container's network namespace path via the container runtime (e.g. /proc/<pid>/ns/net)
- Using nsenter or Go's unix.Setns to enter the namespace
- Running tc or iptables commands within that namespace
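The last three steps can be sketched with the nsenter variant as follows. This example assumes the container's PID has already been obtained from the container runtime. That lookup is runtime-specific and is left out. The helper names netnsPath and commandInNetns and the PID 4242 are hypothetical. The sketch only constructs the command and does not execute it.

```go
package main

import (
	"fmt"
	"os/exec"
)

// netnsPath returns the procfs path of a process's network namespace.
func netnsPath(pid int) string {
	return fmt.Sprintf("/proc/%d/ns/net", pid)
}

// commandInNetns wraps a command so that it runs inside the network
// namespace of the given PID, using nsenter's --net=<file> option.
func commandInNetns(pid int, name string, args ...string) *exec.Cmd {
	nsArgs := append([]string{"--net=" + netnsPath(pid), "--", name}, args...)
	return exec.Command("nsenter", nsArgs...)
}

func main() {
	// PID 4242 stands in for the target container's main process.
	cmd := commandInNetns(4242, "tc", "qdisc", "show", "dev", "eth0")
	fmt.Println(cmd.Args)
	// A real daemon would call cmd.Run() or cmd.CombinedOutput() here,
	// which needs hostPID access and sufficient privileges.
}
```

Shelling out to nsenter keeps the namespace switch confined to the child process. Calling unix.Setns directly changes the namespace of the calling OS thread, so that approach generally needs runtime.LockOSThread around it.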
Execution tracking
The daemon assigns a unique execution_id to each chaos action and tracks what it started. When CancelChaos is called, it looks up the execution ID and terminates the associated process (for example stress-ng) or removes the rules it applied (tc qdiscs, iptables rules, etc.).
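One way to picture this bookkeeping is a mutex-protected map from execution ID to a cleanup function. The sketch below is an assumption about structure, not the daemon's actual code: the Tracker type, its methods, and the random hex ID format are all hypothetical.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
)

// Tracker maps execution IDs to cleanup functions. A cleanup function
// might kill a stress-ng process or delete a tc qdisc.
type Tracker struct {
	mu      sync.Mutex
	cancels map[string]func() error
}

func NewTracker() *Tracker {
	return &Tracker{cancels: make(map[string]func() error)}
}

// Register stores a cleanup function under a new random execution ID.
func (t *Tracker) Register(cancel func() error) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	id := hex.EncodeToString(buf)

	t.mu.Lock()
	defer t.mu.Unlock()
	t.cancels[id] = cancel
	return id, nil
}

// Cancel removes the entry and runs its cleanup function. Unknown or
// already cancelled IDs return an error.
func (t *Tracker) Cancel(id string) error {
	t.mu.Lock()
	cancel, ok := t.cancels[id]
	delete(t.cancels, id)
	t.mu.Unlock()

	if !ok {
		return fmt.Errorf("unknown execution ID %q", id)
	}
	return cancel()
}

func main() {
	tracker := NewTracker()
	id, err := tracker.Register(func() error {
		fmt.Println("cleanup: stop process or remove rules")
		return nil
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("first cancel:", tracker.Cancel(id))
	fmt.Println("second cancel:", tracker.Cancel(id))
}
```

Deleting the entry before running the cleanup means a repeated CancelChaos call cannot run the same cleanup twice.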
Security
The daemon requires elevated privileges:
securityContext:
  privileged: true
  capabilities:
    add:
      - NET_ADMIN   # tc, iptables
      - SYS_ADMIN   # nsenter, cgroup access
      - SYS_PTRACE  # process inspection
It also needs host PID namespace access (hostPID: true) to find container processes.
The daemon only accepts connections from the operator's service account. mTLS is used for gRPC communication.
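For the mTLS part, a server-side TLS configuration that requires and verifies client certificates might look like the sketch below. It uses only the Go standard library. The function name serverTLSConfig and the way PEM data is passed in are assumptions. How the daemon maps a verified client certificate to the operator's service account is not specified here and is not shown.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// serverTLSConfig builds a TLS config that presents the server key pair
// and rejects clients without a certificate signed by the given CA.
func serverTLSConfig(certPEM, keyPEM, clientCAPEM []byte) (*tls.Config, error) {
	cert, err := tls.X509KeyPair(certPEM, keyPEM)
	if err != nil {
		return nil, fmt.Errorf("load server key pair: %w", err)
	}

	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(clientCAPEM) {
		return nil, errors.New("no CA certificates found in client CA bundle")
	}

	return &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    pool,
		ClientAuth:   tls.RequireAndVerifyClientCert, // this is what makes it mutual
		MinVersion:   tls.VersionTLS12,
	}, nil
}

func main() {
	// Placeholder inputs: real PEM data would come from mounted secrets,
	// so this call is expected to fail and print the error.
	_, err := serverTLSConfig(nil, nil, nil)
	fmt.Println("config error with empty inputs:", err)
}
```

With grpc-go, a config like this would typically be wrapped with credentials.NewTLS and passed to the server through grpc.Creds.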
Daemon endpoint resolution
The operator resolves the daemon endpoint for a pod using the pod's spec.nodeName:
func ResolveDaemonEndpoint(nodeName string) string {
	return fmt.Sprintf("%s.chaosplane-daemon.chaosplane.svc.cluster.local:50051", nodeName)
}
Each daemon pod is addressable by its node name via a headless service.
Health
The daemon exposes:
- gRPC health check (grpc.health.v1.Health)
- HTTP /healthz on port 8081
The DaemonSet uses these for liveness and readiness probes.
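A minimal sketch of the HTTP side, using only the standard library, is shown below. The readiness callback and the 503 response when not ready are assumptions for illustration. The handler names and the probe helper are hypothetical, and the gRPC health service is not shown.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthzHandler reports 200 when ready() is true and 503 otherwise.
func healthzHandler(ready func() bool) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !ready() {
			http.Error(w, "not ready", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "ok")
	})
}

// newHealthMux mounts the handler at /healthz.
func newHealthMux(ready func() bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/healthz", healthzHandler(ready))
	return mux
}

// probe issues an in-memory GET against the mux and returns the status code.
func probe(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	mux := newHealthMux(func() bool { return true })
	fmt.Println("GET /healthz ->", probe(mux, "/healthz"))
	// A real daemon would serve this with http.ListenAndServe(":8081", mux).
}
```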