Network Chaos
Network chaos actions manipulate traffic at the pod's network interface using Linux tc (traffic control) and iptables. All six actions are implemented by the ChaosPlane daemon and support full rollback.
How it works
The daemon runs on each node and receives gRPC requests from the operator. For network chaos, it uses tc netem to inject latency, loss, corruption, and duplication, tc tbf to limit bandwidth, and iptables to create partitions. All rules are scoped to the pod's network namespace and are removed on rollback.
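To make the mapping from action parameters to kernel rules concrete, here is a minimal Python sketch of how the four netem-based actions could be turned into `tc` invocations. It is not the daemon's actual code: the function names, the use of `nsenter` to reach the pod's network namespace, the process ID, and the `eth0` device name are all assumptions for illustration. Only the `tc ... netem` argument syntax is standard.

```python
# Hypothetical sketch: the real ChaosPlane daemon may build its rules differently.

def netem_args(action_type: str, params: dict[str, str]) -> list[str]:
    """Translate an action's parameters into `tc ... netem` arguments."""
    if action_type == "network-delay":
        return ["delay", params["latency"],
                params.get("jitter", "0ms"),
                params.get("correlation", "0") + "%"]
    if action_type == "network-loss":
        return ["loss", params["percent"] + "%",
                params.get("correlation", "0") + "%"]
    if action_type == "network-corrupt":
        return ["corrupt", params["percent"] + "%"]
    if action_type == "network-duplicate":
        return ["duplicate", params["percent"] + "%"]
    raise ValueError(f"not a netem action: {action_type}")


def apply_cmd(pid: int, dev: str, action_type: str,
              params: dict[str, str]) -> list[str]:
    """Command that installs a netem qdisc inside the network namespace of `pid`."""
    return ["nsenter", "-t", str(pid), "-n",
            "tc", "qdisc", "add", "dev", dev, "root", "netem",
            *netem_args(action_type, params)]


def rollback_cmd(pid: int, dev: str) -> list[str]:
    """Command that deletes the root qdisc, restoring the interface default."""
    return ["nsenter", "-t", str(pid), "-n",
            "tc", "qdisc", "del", "dev", dev, "root"]


# The pid and device below are made-up example values.
print(" ".join(apply_cmd(1234, "eth0", "network-delay",
                         {"latency": "200ms", "jitter": "50ms", "correlation": "25"})))
# prints: nsenter -t 1234 -n tc qdisc add dev eth0 root netem delay 200ms 50ms 25%
```

Building the command as an argument list rather than a shell string avoids quoting problems if a parameter value ever contains unexpected characters.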
network-delay
Adds artificial latency to all outbound packets from the targeted pods. Tests timeout handling, retry logic, and SLA compliance under degraded network conditions.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: delay-api-calls
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: frontend
  action:
    type: network-delay
    parameters:
      latency: "200ms"
      jitter: "50ms"
      correlation: "25"
  duration: 60s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| latency | string | Yes | — | Base latency to add (e.g. 100ms, 1s) |
| jitter | string | No | "0ms" | Random jitter range |
| correlation | string | No | "0" | Correlation between successive delays (0-100) |
Rollback: Removes tc netem rules from the pod's network namespace.
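When choosing `latency` and `jitter`, it helps to check the worst case against your client timeouts before running the experiment. The sketch below assumes netem's default behavior of drawing each delay uniformly from latency ± jitter, and assumes the delay is paid once per round trip because only outbound packets are affected. The helper names and the baseline round-trip figure are hypothetical.

```python
import re

# Microseconds per unit; only the units used on this page are handled.
_UNIT_US = {"us": 1, "ms": 1_000, "s": 1_000_000}


def parse_us(text: str) -> int:
    """Parse a duration such as '200ms' or '1s' into whole microseconds."""
    match = re.fullmatch(r"(\d+)(us|ms|s)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * _UNIT_US[match.group(2)]


def added_delay_range_us(latency: str, jitter: str = "0ms") -> tuple[int, int]:
    """Smallest and largest delay added to one outbound packet."""
    base, spread = parse_us(latency), parse_us(jitter)
    return max(0, base - spread), base + spread


def worst_case_us(latency: str, jitter: str,
                  baseline_rtt_us: int, round_trips: int = 1) -> int:
    """Upper bound on request time: each round trip pays the largest delay once."""
    _, upper = added_delay_range_us(latency, jitter)
    return round_trips * (baseline_rtt_us + upper)


low, high = added_delay_range_us("200ms", "50ms")
print(low, high)  # 150000 250000

# Assumed example: 20 ms baseline RTT, handshake plus request = 2 round trips.
print(worst_case_us("200ms", "50ms", baseline_rtt_us=20_000, round_trips=2))  # 540000
```

With these example numbers, a client timeout of 500 ms would be exceeded in the worst case, so the experiment should surface timeout and retry behavior.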
network-loss
Randomly drops packets at the specified percentage. Tests how your app handles packet loss, retransmissions, and degraded throughput.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: packet-loss-test
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: streaming-service
  action:
    type: network-loss
    parameters:
      percent: "10"
      correlation: "25"
  duration: 60s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| percent | string | Yes | — | Packet loss percentage (0-100) |
| correlation | string | No | "0" | Correlation between successive drops (0-100) |
Rollback: Removes tc netem rules.
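To get a feel for what a given `percent` means for your application, the following sketch estimates how often a multi-packet exchange survives without any retransmission. It assumes independent drops, that is `correlation: "0"`; with a nonzero correlation, losses arrive in bursts and these figures are only a rough guide. The function names are illustrative.

```python
def delivery_probability(loss_percent: float, packets: int) -> float:
    """Probability that all `packets` get through, assuming independent drops."""
    return (1.0 - loss_percent / 100.0) ** packets


def expected_transmissions(loss_percent: float) -> float:
    """Mean number of sends until one packet gets through (geometric distribution)."""
    p = loss_percent / 100.0
    if p >= 1.0:
        raise ValueError("with 100% loss no packet is ever delivered")
    return 1.0 / (1.0 - p)


# At 10% loss, a 10-packet response arrives intact only about a third of the time.
print(round(delivery_probability(10, 10), 4))   # 0.3487
print(round(expected_transmissions(10), 3))     # 1.111
```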
network-corrupt
Randomly corrupts packets by flipping bits. Tests checksum validation, error detection, and protocol resilience.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: packet-corruption
  namespace: staging
spec:
  target:
    kind: Pod
    namespace: staging
    labelSelector:
      matchLabels:
        app: data-pipeline
  action:
    type: network-corrupt
    parameters:
      percent: "5"
  duration: 30s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| percent | string | Yes | — | Percentage of packets to corrupt |
Rollback: Removes tc netem rules.
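The kind of check this action exercises can be illustrated with a small application-level framing sketch: a payload is sent with a CRC32 trailer, and the receiver rejects any frame in which a single bit was flipped. This is an illustrative example, not part of ChaosPlane. The `frame`, `verify`, and `flip_bit` helpers are hypothetical, and in practice the receiver's kernel often drops corrupted TCP or UDP packets at the transport checksum before the application sees them.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC32 of the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify(framed: bytes) -> bool:
    """Recompute the CRC32 and compare it with the trailer."""
    payload, checksum = framed[:-4], framed[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == checksum


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    corrupted = bytearray(data)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)


message = frame(b"order=42;qty=3")
print(verify(message))                 # True
print(verify(flip_bit(message, 13)))   # False
```

CRC32 detects every single-bit error, so any one flipped bit in the payload or the trailer fails verification.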
network-duplicate
Duplicates packets at the specified percentage. Tests deduplication logic, idempotency, and protocol handling of duplicate frames.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: packet-duplication
  namespace: staging
spec:
  target:
    kind: Pod
    namespace: staging
    labelSelector:
      matchLabels:
        app: message-consumer
  action:
    type: network-duplicate
    parameters:
      percent: "20"
  duration: 30s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| percent | string | Yes | — | Percentage of packets to duplicate |
Rollback: Removes tc netem rules.
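As an illustration of the deduplication logic this action is meant to exercise, here is a minimal idempotent consumer that ignores a message it has already processed. The class, its fields, and the message IDs are hypothetical. Note that TCP already discards duplicate segments, so application-level duplicates are most likely with UDP-based protocols or when duplication triggers retries upstream.

```python
class IdempotentConsumer:
    """Applies each message at most once, keyed by a caller-supplied ID."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self.balance = 0

    def handle(self, message_id: str, amount: int) -> bool:
        """Apply the message; return False if it was a duplicate and was skipped."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        self.balance += amount
        return True


consumer = IdempotentConsumer()
consumer.handle("msg-1", 100)
consumer.handle("msg-1", 100)   # duplicate delivery, ignored
consumer.handle("msg-2", 50)
print(consumer.balance)         # 150
```

A production consumer would typically bound or persist the set of seen IDs; the unbounded in-memory set here keeps the sketch short.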
network-partition
Blocks all traffic to/from specified CIDRs using iptables DROP rules. Simulates a network split or firewall misconfiguration.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: partition-from-db
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: api-server
  action:
    type: network-partition
    parameters:
      target_cidr: "10.0.1.0/24"
      direction: "both"
  duration: 30s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| target_cidr | string | Yes | — | CIDR range to block |
| direction | string | Yes | — | ingress, egress, or both |
Rollback: Removes iptables DROP rules.
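Before running a partition, it is worth confirming which peers fall inside `target_cidr`. The sketch below checks membership with Python's `ipaddress` module and shows one plausible shape for the DROP rules per direction. The rule layout (plain `INPUT` and `OUTPUT` chains) is an assumption for illustration; the rules the daemon actually installs are not documented here.

```python
import ipaddress


def is_blocked(peer_ip: str, target_cidr: str) -> bool:
    """True if traffic to or from `peer_ip` would match the CIDR being dropped."""
    return ipaddress.ip_address(peer_ip) in ipaddress.ip_network(target_cidr)


def drop_rules(target_cidr: str, direction: str) -> list[list[str]]:
    """Hypothetical iptables commands for a given direction value."""
    if direction not in ("ingress", "egress", "both"):
        raise ValueError(f"unknown direction: {direction!r}")
    rules = []
    if direction in ("egress", "both"):
        # Outbound packets whose destination is inside the CIDR.
        rules.append(["iptables", "-A", "OUTPUT", "-d", target_cidr, "-j", "DROP"])
    if direction in ("ingress", "both"):
        # Inbound packets whose source is inside the CIDR.
        rules.append(["iptables", "-A", "INPUT", "-s", target_cidr, "-j", "DROP"])
    return rules


print(is_blocked("10.0.1.17", "10.0.1.0/24"))  # True
print(is_blocked("10.0.2.5", "10.0.1.0/24"))   # False
for rule in drop_rules("10.0.1.0/24", "both"):
    print(" ".join(rule))
```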
network-bandwidth
Limits the available bandwidth using tc tbf (token bucket filter). Tests behavior under constrained network throughput.
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: bandwidth-limit
  namespace: staging
spec:
  target:
    kind: Pod
    namespace: staging
    labelSelector:
      matchLabels:
        app: video-encoder
  action:
    type: network-bandwidth
    parameters:
      rate: "1mbit"
      burst: "10kb"
      latency: "100ms"
  duration: 60s
  rollback:
    enabled: true
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| rate | string | Yes | — | Bandwidth limit (e.g. 1mbit, 500kbit) |
| burst | string | Yes | — | Burst size (e.g. 10kb, 1mb) |
| latency | string | Yes | — | Maximum latency before packets are dropped |
Rollback: Removes tc tbf rules.
Combining network faults
You can combine multiple network experiments in a workflow to simulate realistic degraded conditions:
apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosWorkflow
metadata:
  name: network-degradation-suite
spec:
  templates:
    - name: add-latency
      type: experiment
      experimentRef:
        name: delay-api-calls
        namespace: production
    - name: wait
      type: delay
      delay:
        duration: 30s
      dependencies: [add-latency]
    - name: add-loss
      type: experiment
      experimentRef:
        name: packet-loss-test
        namespace: production
      dependencies: [wait]
  errorHandling:
    strategy: rollback
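The `dependencies` fields in the workflow above define a directed graph, and a valid execution order is any topological ordering of it. The following sketch derives that order with Python's standard `graphlib` module. It only illustrates the dependency semantics; how the ChaosPlane operator actually schedules templates is not described here, and the dictionary layout is a simplified stand-in for the YAML.

```python
from graphlib import TopologicalSorter

# Simplified stand-in for spec.templates in the workflow above.
templates = [
    {"name": "add-latency"},
    {"name": "wait", "dependencies": ["add-latency"]},
    {"name": "add-loss", "dependencies": ["wait"]},
]


def execution_order(templates: list[dict]) -> list[str]:
    """Order template names so that each runs after all of its dependencies."""
    graph = {t["name"]: set(t.get("dependencies", [])) for t in templates}
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(graph).static_order())


print(execution_order(templates))  # ['add-latency', 'wait', 'add-loss']
```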
Steady-state recommendations
For network chaos, use HTTP or Prometheus probes to verify your service is still responding:
steadyState:
  before:
    - name: api-healthy
      type: http
      http:
        url: http://api-service.production.svc.cluster.local/health
        expectedStatus: 200
  after:
    - name: api-recovered
      type: http
      http:
        url: http://api-service.production.svc.cluster.local/health
        expectedStatus: 200
  recoveryTimeout: 2m
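The `after` probe combined with `recoveryTimeout` amounts to polling the health endpoint until it returns the expected status or the timeout elapses. Below is a minimal sketch of that loop, useful for reproducing the check by hand. The function names, the polling interval, and the injectable `sleep` and `clock` parameters are assumptions for illustration, not ChaosPlane's implementation.

```python
import time
import urllib.error
import urllib.request


def http_status(url: str, timeout_s: float = 2.0):
    """Return the HTTP status code, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code   # the server answered, but with an error status
    except OSError:
        return None       # connection refused, DNS failure, timeout, ...


def wait_for_recovery(probe, expected_status: int = 200,
                      recovery_timeout_s: float = 120.0, interval_s: float = 5.0,
                      sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll `probe()` until it returns `expected_status` or the timeout elapses."""
    deadline = clock() + recovery_timeout_s
    while True:
        if probe() == expected_status:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Example usage against the health endpoint from the probe definition:
# url = "http://api-service.production.svc.cluster.local/health"
# recovered = wait_for_recovery(lambda: http_status(url))
```

Passing the probe as a callable keeps the polling loop testable without a live service.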