Skip to main content

node-restart

Triggers a node restart via the daemon. Tests cluster recovery from unexpected node failures.

Target kind: Node
Implementation: Daemon (ExecNodeChaos with action: restart)
Rollback: Waits for node to return to Ready state

Parameters

NameTypeRequiredDefaultDescription
grace_periodstringNo"0s"Grace period before reboot

Example

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
name: node-restart-example
namespace: chaosplane
spec:
target:
kind: Node
names:
- worker-node-2
action:
type: node-restart
parameters:
grace_period: "30s"
duration: 10m
rollback:
enabled: true
timeout: 10m

Rollback behavior

Polls the node's Ready condition every 10 seconds for up to 5 minutes. Once the node returns to Ready, rollback is considered complete. The experiment's rollback.timeout should be set longer than the expected reboot time.

Implementation notes

The daemon executes a system reboot on the host. All pods on the node are terminated immediately. The Kubernetes node controller will mark the node NotReady and begin evicting pods after the node unreachable timeout (default 5 minutes).

danger

This is a destructive action. Only use it in clusters where you can afford a node going offline. Ensure your workloads have enough replicas on other nodes.