Network Chaos

Network chaos actions manipulate traffic at the pod's network interface using Linux tc (traffic control) and iptables. All 6 actions are implemented via the ChaosPlane daemon and support full rollback.

How it works

The daemon runs on each node and receives gRPC requests from the operator. For network chaos, it uses tc netem to inject latency, loss, corruption, and duplication, tc tbf for bandwidth limiting, and iptables for partitions. All rules are scoped to the pod's network namespace and removed on rollback.
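
The daemon's exact command lines are not shown on this page, but the four netem-backed actions map naturally onto tc qdisc arguments. The Python sketch below builds plausible argument lists from an experiment's parameters. The function names, the eth0 device, and attaching the qdisc at the root are assumptions made for illustration, not ChaosPlane's actual implementation.

```python
from typing import Dict, List


def netem_args(action: str, params: Dict[str, str], dev: str = "eth0") -> List[str]:
    """Build a tc argv for one netem-backed action (hypothetical mapping)."""
    base = ["tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if action == "network-delay":
        # netem syntax: delay TIME [JITTER [CORRELATION]]
        extra = [
            "delay",
            params["latency"],
            params.get("jitter", "0ms"),
            params.get("correlation", "0") + "%",
        ]
    elif action == "network-loss":
        # netem syntax: loss PERCENT [CORRELATION]
        extra = ["loss", params["percent"] + "%", params.get("correlation", "0") + "%"]
    elif action == "network-corrupt":
        extra = ["corrupt", params["percent"] + "%"]
    elif action == "network-duplicate":
        extra = ["duplicate", params["percent"] + "%"]
    else:
        raise ValueError(f"not a netem-backed action: {action}")
    return base + extra


def rollback_args(dev: str = "eth0") -> List[str]:
    """Deleting the root qdisc removes whatever netem rule was attached to it."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]


if __name__ == "__main__":
    delay = {"latency": "200ms", "jitter": "50ms", "correlation": "25"}
    print(" ".join(netem_args("network-delay", delay)))
    print(" ".join(rollback_args()))
```

Returning argument lists rather than shell strings keeps the sketch safe to pass to subprocess.run without shell quoting; running the commands requires root privileges inside the target network namespace.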

network-delay

Adds artificial latency to all outbound packets from the targeted pods. Tests timeout handling, retry logic, and SLA compliance under degraded network conditions.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: delay-api-calls
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: frontend
  action:
    type: network-delay
    parameters:
      latency: "200ms"
      jitter: "50ms"
      correlation: "25"
    duration: 60s
  rollback:
    enabled: true

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| latency | string | Yes | | Base latency to add (e.g. 100ms, 1s) |
| jitter | string | No | "0ms" | Random jitter range |
| correlation | string | No | "0" | Correlation between successive delays (0-100) |

Rollback: Removes tc netem rules from the pod's network namespace.
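
When choosing client timeouts for a delay experiment, it helps to know the range of added delay. The sketch below assumes netem's default uniform jitter, where each packet is delayed by a value between latency minus jitter and latency plus jitter, and assumes the daemon does not configure a different distribution. The helper names are hypothetical.

```python
import re
from typing import Tuple

# Milliseconds per unit for the duration strings used in the parameters.
_UNIT_TO_MS = {"us": 0.001, "ms": 1.0, "s": 1000.0}


def to_ms(value: str) -> float:
    """Parse a duration such as "200ms" or "1s" into milliseconds."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(us|ms|s)", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return float(match.group(1)) * _UNIT_TO_MS[match.group(2)]


def delay_bounds_ms(latency: str, jitter: str = "0ms") -> Tuple[float, float]:
    """Return (min, max) added delay, assuming uniform jitter around latency."""
    base, spread = to_ms(latency), to_ms(jitter)
    return max(0.0, base - spread), base + spread


if __name__ == "__main__":
    low, high = delay_bounds_ms("200ms", "50ms")
    print(f"added delay per packet: {low:.0f} to {high:.0f} ms")
```

With the example experiment above (200ms latency, 50ms jitter), every outbound packet would be held for between 150 and 250 ms under these assumptions, so a request and its response crossing the same pod can see noticeably more than 200 ms of added round-trip time.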

network-loss

Randomly drops packets at the specified percentage. Tests how your app handles packet loss, retransmissions, and degraded throughput.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: packet-loss-test
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: streaming-service
  action:
    type: network-loss
    parameters:
      percent: "10"
      correlation: "25"
    duration: 60s
  rollback:
    enabled: true

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| percent | string | Yes | | Packet loss percentage (0-100) |
| correlation | string | No | "0" | Correlation between successive drops (0-100) |

Rollback: Removes tc netem rules.
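
To get a feel for what a given loss percentage means for an application, the following back-of-the-envelope sketch estimates how often a multi-packet message arrives without any retransmission. It assumes independent drops (correlation "0"); with a non-zero correlation, losses cluster into bursts and these figures are only a rough guide. The function names are hypothetical.

```python
def all_arrive_probability(percent: float, packets: int) -> float:
    """Probability that every one of `packets` packets survives, assuming
    each packet is dropped independently with the given percentage."""
    return (1.0 - percent / 100.0) ** packets


def expected_sends_per_packet(percent: float) -> float:
    """Average number of transmissions needed until one copy gets through."""
    if percent >= 100:
        raise ValueError("at 100% loss no packet ever arrives")
    return 1.0 / (1.0 - percent / 100.0)


if __name__ == "__main__":
    for n in (1, 10, 50):
        p = all_arrive_probability(10, n)
        print(f"{n:>3} packets at 10% loss: {p:.1%} arrive with no retransmit")
    print(f"sends per delivered packet: {expected_sends_per_packet(10):.2f}")
```

At 10% loss, a 10-packet message gets through untouched only about 35% of the time, which is why even modest loss percentages tend to show up as retransmissions and reduced throughput.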

network-corrupt

Randomly corrupts packets by flipping bits. Tests checksum validation, error detection, and protocol resilience.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: packet-corruption
  namespace: staging
spec:
  target:
    kind: Pod
    namespace: staging
    labelSelector:
      matchLabels:
        app: data-pipeline
  action:
    type: network-corrupt
    parameters:
      percent: "5"
    duration: 30s
  rollback:
    enabled: true

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| percent | string | Yes | | Percentage of packets to corrupt |

Rollback: Removes tc netem rules.

network-duplicate

Duplicates packets at the specified percentage. Tests deduplication logic, idempotency, and protocol handling of duplicate frames.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: packet-duplication
  namespace: staging
spec:
  target:
    kind: Pod
    namespace: staging
    labelSelector:
      matchLabels:
        app: message-consumer
  action:
    type: network-duplicate
    parameters:
      percent: "20"
    duration: 30s
  rollback:
    enabled: true

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| percent | string | Yes | | Percentage of packets to duplicate |

Rollback: Removes tc netem rules.

network-partition

Blocks all traffic to the specified CIDR range, from it, or both (depending on direction) using iptables DROP rules. Simulates a network split or firewall misconfiguration.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: partition-from-db
  namespace: production
spec:
  target:
    kind: Pod
    namespace: production
    labelSelector:
      matchLabels:
        app: api-server
  action:
    type: network-partition
    parameters:
      target_cidr: "10.0.1.0/24"
      direction: "both"
    duration: 30s
  rollback:
    enabled: true

Parameters:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| target_cidr | string | Yes | | CIDR range to block |
| direction | string | Yes | | ingress, egress, or both |

Rollback: Removes iptables DROP rules.
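
Before running a partition, it is worth checking which peers the CIDR actually covers. The sketch below uses Python's ipaddress module for that check and shows one plausible way the direction parameter could translate into iptables rules. The chain choices (OUTPUT for egress, INPUT for ingress) and the function names are assumptions, not a description of the daemon's real rules.

```python
import ipaddress
from typing import List


def is_blocked(peer_ip: str, target_cidr: str) -> bool:
    """True if traffic with this peer falls inside the partitioned range."""
    return ipaddress.ip_address(peer_ip) in ipaddress.ip_network(target_cidr)


def partition_rules(target_cidr: str, direction: str, op: str = "-A") -> List[List[str]]:
    """Build iptables argv lists; pass op="-D" to build the rollback commands."""
    if direction not in ("ingress", "egress", "both"):
        raise ValueError(f"direction must be ingress, egress, or both: {direction}")
    cidr = str(ipaddress.ip_network(target_cidr))  # raises on a malformed CIDR
    rules = []
    if direction in ("egress", "both"):
        # Drop packets leaving the pod toward the range.
        rules.append(["iptables", op, "OUTPUT", "-d", cidr, "-j", "DROP"])
    if direction in ("ingress", "both"):
        # Drop packets arriving at the pod from the range.
        rules.append(["iptables", op, "INPUT", "-s", cidr, "-j", "DROP"])
    return rules


if __name__ == "__main__":
    print(is_blocked("10.0.1.17", "10.0.1.0/24"))
    for rule in partition_rules("10.0.1.0/24", "both"):
        print(" ".join(rule))
```

Building the rollback with the same function and op="-D" keeps the added and removed rules identical, which is what iptables requires to delete a rule by specification.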

network-bandwidth

Limits the available bandwidth using tc tbf (token bucket filter). Tests behavior under constrained network throughput.

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosExperiment
metadata:
  name: bandwidth-limit
  namespace: staging
spec:
  target:
    kind: Pod
    namespace: staging
    labelSelector:
      matchLabels:
        app: video-encoder
  action:
    type: network-bandwidth
    parameters:
      rate: "1mbit"
      burst: "10kb"
      latency: "100ms"
    duration: 60s
  rollback:
    enabled: true

Parameters:

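| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| rate | string | Yes | | Bandwidth limit (e.g. 1mbit, 500kbit) |
| burst | string | Yes | | Burst size (e.g. 10kb, 1mb) |
| latency | string | Yes | | Maximum time a packet may wait in the queue before it is dropped |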

Rollback: Removes tc tbf rules.
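
The interaction between rate and burst is easier to see in a small simulation. The class below is a simplified token bucket, not the kernel's tbf implementation: it reports whether a packet could be sent immediately, whereas tbf queues the packet and drops it only once the wait would exceed latency. The numbers use tc's unit conventions (1mbit is 1,000,000 bits per second, 10kb is 10 × 1024 bytes); the class and method names are hypothetical.

```python
class TokenBucket:
    """Simplified token bucket: tokens are bytes, refilled at a fixed rate."""

    def __init__(self, rate_bytes_per_s: float, burst_bytes: float) -> None:
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes  # the bucket starts full
        self.last = 0.0            # timestamp of the previous refill, in seconds

    def try_send(self, size: int, now: float) -> bool:
        """Refill for the elapsed time, then spend tokens if enough remain."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


if __name__ == "__main__":
    # rate "1mbit" -> 125,000 bytes/s; burst "10kb" -> 10,240 bytes
    bucket = TokenBucket(rate_bytes_per_s=125_000, burst_bytes=10_240)
    sent = sum(bucket.try_send(1500, now=0.0) for _ in range(10))
    print(f"1500-byte packets sent back to back at t=0: {sent}")
    print("next packet 10 ms later:", bucket.try_send(1500, now=0.010))
```

The burst allows a short run of packets to leave at line speed; after that, throughput settles at the configured rate. A burst that is smaller than a single packet would stall traffic entirely in this model, which is a useful sanity check when picking values.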

Combining network faults

You can combine multiple network experiments in a workflow to simulate realistic degraded conditions:

apiVersion: chaos.chaosplane.io/v1alpha1
kind: ChaosWorkflow
metadata:
  name: network-degradation-suite
spec:
  templates:
    - name: add-latency
      type: experiment
      experimentRef:
        name: delay-api-calls
        namespace: production
    - name: wait
      type: delay
      delay:
        duration: 30s
      dependencies: [add-latency]
    - name: add-loss
      type: experiment
      experimentRef:
        name: packet-loss-test
        namespace: production
      dependencies: [wait]
  errorHandling:
    strategy: rollback
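
The dependencies fields define a directed graph over the templates. As a way to reason about the resulting execution order (and to catch cycles before applying a workflow), the graph can be sorted topologically. This sketch uses Python's standard graphlib module on a dictionary that mirrors the workflow above; how the operator actually schedules templates is not described here, so treat this as a mental model only.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List

# Template name -> names it depends on, copied from the workflow above.
DEPENDENCIES: Dict[str, List[str]] = {
    "add-latency": [],
    "wait": ["add-latency"],
    "add-loss": ["wait"],
}


def execution_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return one valid run order, with every dependency before its dependents.

    Raises graphlib.CycleError if the templates depend on each other in a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    try:
        print(" -> ".join(execution_order(DEPENDENCIES)))
    except CycleError as err:
        print("workflow has a dependency cycle:", err.args[1])
```

For this workflow the chain is linear, so the only valid order is add-latency, then wait, then add-loss.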

Steady-state recommendations

For network chaos, use HTTP or Prometheus probes to verify your service is still responding:

steadyState:
  before:
    - name: api-healthy
      type: http
      http:
        url: http://api-service.production.svc.cluster.local/health
        expectedStatus: 200
  after:
    - name: api-recovered
      type: http
      http:
        url: http://api-service.production.svc.cluster.local/health
        expectedStatus: 200
  recoveryTimeout: 2m
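
To reproduce the "after" check by hand, for example while debugging a failed experiment, a probe can be approximated as a polling loop that succeeds as soon as the expected status is seen and gives up once the recovery timeout has passed. The sketch below is an approximation under assumptions: the polling interval, the function names, and the treatment of connection errors as "not yet recovered" are illustrative choices, not documented probe behavior.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # non-2xx responses still carry a status code
    except OSError:
        return None      # connection refused, DNS failure, timeout, ...


def wait_for_recovery(
    check: Callable[[], Optional[int]],
    expected_status: int = 200,
    recovery_timeout: float = 120.0,  # mirrors recoveryTimeout: 2m
    interval: float = 5.0,            # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the expected status appears or the timeout elapses."""
    deadline = time.monotonic() + recovery_timeout
    while True:
        if check() == expected_status:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling wait_for_recovery(lambda: http_status(url)) with the health URL from the example approximates the api-recovered probe. Passing the check as a callable keeps the loop easy to exercise without a live service.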