Chaos Engineering Concepts in the Chaos Toolkit

If you haven’t already, we strongly recommend reading the Chaos Engineering book from O’Reilly Media. It gives excellent background on the whole Chaos Engineering discipline, and it’s free!

Chaos Engineering is a discipline that allows you to surface weaknesses, and eventually build confidence, in complex and often distributed systems.

The Chaos Toolkit aims to give you the simplest experience for writing and running your own Chaos Engineering experiments. The main concepts are all expressed in an experiment definition. The following example comes from the Chaos Toolkit Samples project:

{
    "version": "1.0.0",
    "title": "System is resilient to provider's failures",
    "description": "Can our consumer survive gracefully a provider's failure?",
    "tags": [
        "service",
        "kubernetes",
        "spring"
    ],
    "steady-state-hypothesis": {
        "title": "Services are all available and healthy",
        "probes": [
            {
                "type": "probe",
                "name": "all-services-are-healthy",
                "tolerance": true,
                "provider": {
                    "type": "python",
                    "module": "chaosk8s.probes",
                    "func": "all_microservices_healthy"
                }
            }
        ]
    },
    "method": [
        {
            "type": "action",
            "name": "stop-provider-service",
            "provider": {
                "type": "python",
                "module": "chaosk8s.actions",
                "func": "kill_microservice",
                "arguments": {
                    "name": "my-provider-service"
                }
            },
            "pauses": {
                "after": 10
            }
        },
        {
            "ref": "all-services-are-healthy"
        },
        {
            "type": "probe",
            "name": "consumer-service-must-still-respond",
            "provider": {
                "type": "http",
                "url": "http://192.168.42.58:31018/invokeConsumedService"
            }
        }
    ],
    "rollbacks": []
}

The key concepts of the Chaos Toolkit are the Experiment, its Steady State Hypothesis and its Method. The Method contains a combination of Probes and Actions, and an experiment may also declare Rollbacks.

Experiments

A Chaos Toolkit experiment is provided in a single file and is currently expressed in JSON.
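
Because an experiment is plain JSON, it can be inspected with any JSON parser. The following is a minimal sketch, not part of the Chaos Toolkit itself: the `load_experiment` helper and the list of expected keys are assumptions made for illustration, based only on the sample shown above.

```python
import json

# Top-level keys seen in the sample experiment above. Treating them as
# required is an assumption of this sketch, not a Chaos Toolkit rule.
EXPECTED_KEYS = ("title", "description", "method")


def load_experiment(text: str) -> dict:
    """Parse an experiment from JSON text and report any missing keys."""
    experiment = json.loads(text)
    missing = [key for key in EXPECTED_KEYS if key not in experiment]
    if missing:
        raise ValueError(f"experiment is missing keys: {missing}")
    return experiment


# A trimmed-down version of the sample experiment, for illustration only.
SAMPLE = """
{
    "title": "System is resilient to provider's failures",
    "description": "Can our consumer survive gracefully a provider's failure?",
    "method": [],
    "rollbacks": []
}
"""

experiment = load_experiment(SAMPLE)
print(experiment["title"])
```

In practice you would read the text from your experiment file rather than from an inline string.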

Steady State Hypothesis

A Steady State Hypothesis describes “what normal looks like” for your system in order for the experiment to surface information you can make sense of.

Indeed, if your system is already in a broken state, the results of your experiment would be difficult to interpret.

Each experiment has a single Steady State Hypothesis, which is made up of probes. If any of those probes fails, the experiment is halted.
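
The halting rule can be sketched in a few lines of Python. This is only an illustration of the idea, not the Chaos Toolkit's implementation: `steady_state_met` and `fake_runner` are hypothetical names, and comparing a probe's result to its `tolerance` by simple equality is an assumption drawn from the sample above, where the tolerance is `true`.

```python
from typing import Any, Callable


def steady_state_met(hypothesis: dict, run_probe: Callable[[dict], Any]) -> bool:
    """Return True only if every probe's result matches its tolerance."""
    for probe in hypothesis.get("probes", []):
        observed = run_probe(probe)
        # Assumption: a probe passes when its result equals "tolerance".
        if observed != probe.get("tolerance"):
            print(f"probe {probe['name']!r} failed, halting the experiment")
            return False
    return True


hypothesis = {
    "title": "Services are all available and healthy",
    "probes": [
        {"type": "probe", "name": "all-services-are-healthy", "tolerance": True},
    ],
}


def fake_runner(probe: dict) -> bool:
    # Hypothetical stand-in for a real probe: pretend everything is healthy.
    return True


healthy = steady_state_met(hypothesis, fake_runner)
print(healthy)
```

Passing the probe runner in as a function keeps the sketch independent of how a probe is actually executed (a Python function, an HTTP call, and so on).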

Method

An experiment’s activities are contained within its Method block.

Probes

A probe is a way of observing a particular set of conditions in the system that is undergoing experimentation.

Actions

An action is a particular activity that needs to be enacted on the system under experimentation.
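
To make the relationship between the Method, Probes and Actions more concrete, here is a small sketch that walks a Method and lists its activities in order. It is based on our reading of the sample above, where an entry with a `ref` key points at an activity declared elsewhere by name; the `resolve_method` helper is hypothetical and is not a Chaos Toolkit function.

```python
def resolve_method(experiment: dict) -> list:
    """Return the Method's activities in order, with "ref" entries resolved."""
    hypothesis = experiment.get("steady-state-hypothesis", {})
    method = experiment.get("method", [])

    # Index every named activity so that "ref" entries can be looked up.
    named = {}
    for activity in hypothesis.get("probes", []) + method:
        if "name" in activity:
            named[activity["name"]] = activity

    resolved = []
    for activity in method:
        if "ref" in activity:
            # Assumption: a ref reuses a previously declared activity.
            activity = named[activity["ref"]]
        resolved.append(activity)
    return resolved


experiment = {
    "steady-state-hypothesis": {
        "probes": [{"type": "probe", "name": "all-services-are-healthy"}],
    },
    "method": [
        {"type": "action", "name": "stop-provider-service"},
        {"ref": "all-services-are-healthy"},
        {"type": "probe", "name": "consumer-service-must-still-respond"},
    ],
}

for activity in resolve_method(experiment):
    print(activity["type"], activity["name"])
```

Read this way, the sample experiment performs one action (stopping the provider) and then observes the system with two probes.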

Rollbacks

An experiment may define a sequence of actions that revert what was done during the experiment.
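
As a rough sketch of the idea, rollbacks can be thought of as clean-up steps that run once the Method has finished. The code below is not how the Chaos Toolkit is implemented: `run_with_rollbacks` is a hypothetical helper, the `restart-provider-service` rollback is invented for illustration, and running the rollbacks even when an activity raises is an assumption of this sketch.

```python
from typing import Callable, List


def run_with_rollbacks(method: List[Callable[[], None]],
                       rollbacks: List[Callable[[], None]]) -> None:
    """Run the method's activities, then the rollbacks, in declared order."""
    try:
        for activity in method:
            activity()
    finally:
        # Assumption: rollbacks run even if an activity raised an error.
        for rollback in rollbacks:
            rollback()


log: List[str] = []
run_with_rollbacks(
    method=[lambda: log.append("stop-provider-service")],
    # Hypothetical rollback that would bring the provider back up.
    rollbacks=[lambda: log.append("restart-provider-service")],
)
print(log)
```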