Learn all about Steady-State Hypothesis Tolerances

A Chaos Engineering experiment starts and ends with a steady-state hypothesis.

Initially, it acts as a validation gate: if the steady state is not met before the method is executed, the experiment bails out. After all, what could you learn from a system that is already in an unknown state?

Then, once the method has been applied, the goal is to understand whether the system coped with the turbulence or deviated, implying a weakness may have been uncovered.

To achieve this, the Chaos Toolkit experiment expects you to use probes to query your system’s state during the steady-state hypothesis. The validation of the probes’ output is performed by what we call tolerances.

Let’s get started with a basic example

Let’s take the simple experiment below:

{
"title": "Our default language is English",
"description": "We find the expected English language in the file",
"steady-state-hypothesis": {
    "title": "Our hypothesis is that lang file is in English",
    "probes": [
    {
        "type": "probe",
        "name": "lookup-lang-file",
        "tolerance": true,
        "provider": {
        "type": "python",
        "module": "os.path",
        "func": "exists",
        "arguments": {
            "path": "default.locale.txt"
        }
        }
    },
    {
        "type": "probe",
        "name": "lookup-text-in-lang-file",
        "tolerance": 0,
        "provider": {
        "type": "process",
        "path": "grep",
        "arguments": "welcome=hello default.locale.txt"
        }
    }
    ]
},
"method": [
    {
    "type": "action",
    "name": "switch-language-to-french",
    "provider": {
        "type": "process",
        "path": "sed",
        "arguments": "-i s/hello/bonjour/ default.locale.txt"
    }
    }
],
"rollbacks": [
    {
        "type": "action",
        "name": "switch-language-back-to-english",
        "provider": {
        "type": "process",
        "path": "sed",
        "arguments": "-i s/bonjour/hello/ default.locale.txt"
        }
    }
]
}
title: Our default language is English
description: We find the expected English language in the file
steady-state-hypothesis:
  title: Our hypothesis is that lang file is in English
  probes:
  - type: probe
    name: lookup-lang-file
    tolerance: true
    provider:
      type: python
      module: os.path
      func: exists
      arguments:
        path: default.locale.txt
  - type: probe
    name: lookup-text-in-lang-file
    tolerance: 0
    provider:
      type: process
      path: grep
      arguments: welcome=hello default.locale.txt
method:
  - type: action
    name: switch-language-to-french
    provider:
      type: process
      path: sed
      arguments: '-i s/hello/bonjour/ default.locale.txt'
rollbacks:
  - type: action
    name: switch-language-back-to-english
    provider:
      type: process
      path: sed
      arguments: '-i s/bonjour/hello/ default.locale.txt'

This experiment looks for the welcome message in the default locale file and expects it to be "hello".

Here is an example of it running:

chaos run experiment.json
[2019-06-25 21:37:59 INFO] Validating the experiment's syntax
[2019-06-25 21:37:59 INFO] Experiment looks valid
[2019-06-25 21:37:59 INFO] Running experiment: Our default language is English
[2019-06-25 21:37:59 INFO] Steady state hypothesis: Our hypothesis is that lang file is in English
[2019-06-25 21:37:59 INFO] Probe: lookup-lang-file
[2019-06-25 21:37:59 INFO] Probe: lookup-text-in-lang-file
[2019-06-25 21:37:59 INFO] Steady state hypothesis is met!
[2019-06-25 21:37:59 INFO] Action: switch-language-to-french
[2019-06-25 21:37:59 INFO] Steady state hypothesis: Our hypothesis is that lang file is in English
[2019-06-25 21:37:59 INFO] Probe: lookup-lang-file
[2019-06-25 21:37:59 INFO] Probe: lookup-text-in-lang-file
[2019-06-25 21:37:59 INFO] Steady state hypothesis is met!
[2019-06-25 21:37:59 INFO] Let's rollback...
[2019-06-25 21:37:59 INFO] Rollback: switch-language-back-to-english
[2019-06-25 21:37:59 INFO] Action: switch-language-back-to-english
[2019-06-25 21:37:59 INFO] Experiment ended with status: completed

In this experiment, we have two probes checking two basic facets of our system. First, we ensure the locale file exists and then we validate the file contains our expected value.

Let’s now analyse the two tolerances we used.

First, we use a Python provider which calls the os.path.exists(path) standard library function. This function returns a boolean and that is what the tolerance checks for.

The second probe calls a process which sets its exit code to 0 when the command succeeds. Again, this is the value that the tolerance validates.

Now that we know the basics, let’s see which tolerances are supported.

Built-in supported tolerances

The experiment specification describes the supported tolerances but let’s review them more pragmatically here.

Chaos Toolkit aims at being easy for simple tasks whenever it can. In this case, for the general use-cases, we support the following tolerances:

  • boolean: set the tolerance property to true | false
  • integer: set the tolerance property to any integer (negative or positive)
  • string: set the tolerance property to a string
  • list: set the tolerance property to a sequence of values that can be compared to the output by value

In these cases, the probe’s output must equal the given tolerance.
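As a rough mental model, and not the Chaos Toolkit’s actual implementation, a native tolerance can be thought of as an equality check between the tolerance and the probe’s output. The `matches_tolerance` helper below is a hypothetical name used only for illustration, and its strict type check is an assumption of this sketch.

```python
from typing import Any


def matches_tolerance(tolerance: Any, output: Any) -> bool:
    """Hypothetical helper: compare a probe's output to a native tolerance."""
    # bool is a subclass of int in Python (True == 1), so this sketch also
    # compares the types; otherwise a tolerance of 1 would accept True.
    return type(tolerance) is type(output) and tolerance == output


print(matches_tolerance(True, True))  # boolean tolerance -> True
print(matches_tolerance(0, 0))        # integer tolerance -> True
print(matches_tolerance("ok", "ko"))  # string tolerance -> False
```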

On top of these native types, we also support more advanced cases such as:

  • range: set the tolerance property to:
{
    "type": "range",
    "range": [6.4, 7.5]
}
type: range
range:
  - 6.4
  - 7.5

The range is an inclusive min-max range made of numerical values. This is handy when validating a gauge for instance.
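The inclusive semantics described above can be sketched in a few lines. This is only an illustration under the stated description; `within_range` is a hypothetical helper, not a Chaos Toolkit function.

```python
from typing import Sequence, Union

Number = Union[int, float]


def within_range(bounds: Sequence[Number], value: Number) -> bool:
    """Hypothetical helper: inclusive min-max check on a numerical value."""
    low, high = bounds
    return low <= value <= high


tolerance = {"type": "range", "range": [6.4, 7.5]}
print(within_range(tolerance["range"], 7.0))  # inside the range -> True
print(within_range(tolerance["range"], 7.5))  # upper bound included -> True
print(within_range(tolerance["range"], 8.1))  # outside the range -> False
```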

  • regex: set the tolerance property to:
{
    "type": "regex",
    "target": "stdout",
    "pattern": "^welcome=hello$"
}
type: regex
target: stdout
pattern: "^welcome=hello$"
  • pattern must be a valid regular expression; for now, only the syntax supported by Python’s regular expression engine is accepted. This is useful when looking for a value in a raw string.
  • target is optional, and allows changing the default target for a given provider.
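To make the roles of pattern and target concrete, here is a minimal sketch that assumes the pattern is searched anywhere in the targeted text. `matches_regex` is a hypothetical helper and the output mappings are made-up samples.

```python
import re
from typing import Any, Dict


def matches_regex(tolerance: Dict[str, Any], output: Dict[str, Any]) -> bool:
    """Hypothetical helper: apply a regex tolerance to a probe's output."""
    # Pick the property named by "target" from the probe's output
    text = output[tolerance["target"]]
    # Assumption of this sketch: the pattern may match anywhere in the text
    return re.search(tolerance["pattern"], text) is not None


tolerance = {"type": "regex", "target": "stdout", "pattern": "^welcome=hello$"}
english = {"status": 0, "stdout": "welcome=hello", "stderr": ""}
french = {"status": 0, "stdout": "welcome=bonjour", "stderr": ""}

print(matches_regex(tolerance, english))  # True
print(matches_regex(tolerance, french))   # False
```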

Currently supported targets per provider are as follows:

Provider   Default     Values
process    "status"    "stdout", "stderr"
http       "status"    "headers", "body"
python     Undefined   Undefined
  • jsonpath: set the tolerance property to:
{
    "type": "jsonpath",
    "path": "..."
}
type: jsonpath
path: ...

The path must be a valid JSONPath supported by the jsonpath2 library. This is handy when looking for a value in a mapping output.

  • probe: set the tolerance property to:
{
    "type": "probe",
    "provider": {
        "type": "python",
        ...
    }
}
type: probe
provider:
  type: python

In that case, the tolerance is run as yet another probe, which must return a boolean. That probe must accept an argument called value, which is set to the output of the steady-state probe. In essence, it is a probe validating the output of another probe. This is advanced stuff, only used when the built-in tolerances won’t cut it.

Common scenarios

Let’s now review how to apply these tolerances to the most common scenarios.

Validate the return value of a boolean Python probe

In this case, the simple boolean tolerance will do.

For instance:

{
    "type": "probe",
    "name": "lookup-lang-file",
    "tolerance": true,
    "provider": {
        "type": "python",
        "module": "os.path",
        "func": "exists",
        "arguments": {
            "path": "default.locale.txt"
        }
    }
}
type: probe
name: lookup-lang-file
tolerance: true
provider:
  type: python
  module: os.path
  func: exists
  arguments:
    path: default.locale.txt

Validate the exit code of a process

In this case, the simple integer tolerance will do. Indeed, the Chaos Toolkit looks at the exit code of the process by default for validation.

In the above example:

{
    "type": "probe",
    "name": "lookup-text-in-lang-file",
    "tolerance": 0,
    "provider": {
        "type": "process",
        "path": "grep",
        "arguments": "welcome=hello default.locale.txt"
    }
}
type: probe
name: lookup-text-in-lang-file
tolerance: 0
provider:
  type: process
  path: grep
  arguments: welcome=hello default.locale.txt

If we were instead expecting an error, which commonly translates to an exit code of 1, we would switch to "tolerance": 1.

Validate the status code of an HTTP probe

In this case, the simple integer tolerance will do. Indeed, the Chaos Toolkit looks at the status code of the HTTP response by default for validation.

For instance:

{
    "type": "probe",
    "name": "resource-must-exist",
    "tolerance": 200,
    "provider": {
        "type": "http",
        "url": "https://example.com/api/v1/entity"
    }
}
type: probe
name: resource-must-exist
tolerance: 200
provider:
  type: http
  url: 'https://example.com/api/v1/entity'
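The three common scenarios follow one rule: a plain tolerance is compared to the "status" of a process or HTTP probe, and to the raw return value of a Python probe. The sketch below only illustrates that rule; `value_to_validate` is a hypothetical helper and the outputs are made-up samples.

```python
from typing import Any


def value_to_validate(provider_type: str, output: Any) -> Any:
    """Hypothetical helper: pick what a plain tolerance is compared to."""
    if provider_type in ("process", "http"):
        # Exit code of the process, or status code of the HTTP response
        return output["status"]
    # Python probes: the function's return value is used as-is
    return output


process_output = {"status": 0, "stdout": "welcome=hello\n", "stderr": ""}
http_output = {"status": 200, "headers": {}, "body": ""}

print(value_to_validate("process", process_output) == 0)  # True
print(value_to_validate("http", http_output) == 200)      # True
print(value_to_validate("python", True) is True)          # True
```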

Specific scenarios

Validate the stdout/stderr of a process

Assuming you want to validate the actual standard output of a process, you need to go with a regular expression approach, as follows:

{
    "type": "probe",
    "name": "lookup-text-in-lang-file",
    "tolerance": {
        "type": "regex",
        "pattern": "welcome=hello",
        "target": "stdout"
    },
    "provider": {
        "type": "process",
        "path": "cat",
        "arguments": "default.locale.txt"
    }
}
type: probe
name: lookup-text-in-lang-file
tolerance:
  type: regex
  pattern: welcome=hello
  target: stdout
provider:
  type: process
  path: cat
  arguments: default.locale.txt

The important extra property to set here is target, which tells the Chaos Toolkit where to locate the value to apply the pattern against. We set stdout here because a process probe returns an object made of three properties: "status", "stdout" and "stderr".
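To see what such a three-property object might look like, the sketch below runs a command and shapes its result the same way. `run_process` is a hypothetical helper, not the Chaos Toolkit’s code, and a small Python command stands in for `cat default.locale.txt` so that the example runs without the file.

```python
import subprocess
import sys
from typing import Any, Dict, List


def run_process(path: str, arguments: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: run a command, return a process-like output."""
    proc = subprocess.run([path, *arguments], capture_output=True, text=True)
    return {
        "status": proc.returncode,  # what an integer tolerance checks
        "stdout": proc.stdout,      # what a regex with target stdout checks
        "stderr": proc.stderr,
    }


# Stand-in for `cat default.locale.txt`
output = run_process(sys.executable, ["-c", "print('welcome=hello')"])
print(output["status"])
print(output["stdout"])
```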

Validate the JSON body of an HTTP probe

In this case, use a jsonpath tolerance.

For instance, let’s assume you receive the following JSON payload, shown first, followed by a probe that validates it:

{
    "foo": [{"baz": "hello"}, {"baz": "bonjour"}]
}
{
    "type": "probe",
    "name": "resource-must-exist",
    "tolerance": {
        "type": "jsonpath",
        "path": "$.foo.*[?(@.baz)].baz",
        "expect": ["hello", "bonjour"],
        "target": "body"
    },
    "provider": {
        "type": "http",
        "url": "https://example.com/api/v1/entities"
    }
}
type: probe
name: resource-must-exist
tolerance:
  type: jsonpath
  path: '$.foo.*[?(@.baz)].baz'
  expect:
    - hello
    - bonjour
  target: body
provider:
  type: http
  url: 'https://example.com/api/v1/entities'

The expect property tells the Chaos Toolkit which values to match once the JSONPath has been applied to the body of the HTTP response.

You may also validate the number of extracted values instead:

{
    "type": "probe",
    "name": "resource-must-exist",
    "tolerance": {
        "type": "jsonpath",
        "path": "$.foo.*[?(@.baz)].baz",
        "count": 2,
        "target": "body"
    },
    "provider": {
        "type": "http",
        "url": "https://example.com/api/v1/entities"
    }
}
type: probe
name: resource-must-exist
tolerance:
  type: jsonpath
  path: '$.foo.*[?(@.baz)].baz'
  count: 2
  target: body
provider:
  type: http
  url: 'https://example.com/api/v1/entities'
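The difference between expect and count can be sketched as follows. This is an illustration under assumptions: a plain list comprehension stands in for evaluating the JSONPath, `check_extracted` is a hypothetical helper, and the exact comparison rules of the Chaos Toolkit may differ.

```python
from typing import Any, Dict, List

payload = {"foo": [{"baz": "hello"}, {"baz": "bonjour"}]}

# Stand-in for the JSONPath evaluation: collect every "baz" under "foo"
values = [item["baz"] for item in payload["foo"] if "baz" in item]


def check_extracted(tolerance: Dict[str, Any], values: List[Any]) -> bool:
    """Hypothetical helper: validate the values extracted by a JSONPath."""
    if "expect" in tolerance:
        # Assumption: extracted values must equal the expected ones
        return values == tolerance["expect"]
    if "count" in tolerance:
        return len(values) == tolerance["count"]
    # Assumption: with neither property, any extracted value is enough
    return len(values) > 0


print(check_extracted({"expect": ["hello", "bonjour"]}, values))  # True
print(check_extracted({"count": 2}, values))                      # True
print(check_extracted({"count": 3}, values))                      # False
```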

Validate the output of a Python probe returning a mapping

In this case, use a jsonpath tolerance.

For instance, let’s assume you receive the following payload, shown first, followed by a probe that validates it:

{
    "foo": [{"baz": "hello"}, {"baz": "bonjour"}]
}
{
    "type": "probe",
    "name": "resource-must-exist",
    "tolerance": {
        "type": "jsonpath",
        "path": "$.foo.*[?(@.baz)].baz",
        "expect": ["hello", "bonjour"]
    },
    "provider": {
        "type": "python",
        "module": "...",
        "func": "..."
    }
}
type: probe
name: resource-must-exist
tolerance:
  type: jsonpath
  path: '$.foo.*[?(@.baz)].baz'
  expect:
    - hello
    - bonjour
provider:
  type: python
  module: ...
  func: ...

The expect property tells the Chaos Toolkit which values to match once the JSONPath has been applied to the probe’s output.

You may also validate the number of extracted values instead:

{
    "type": "probe",
    "name": "resource-must-exist",
    "tolerance": {
        "type": "jsonpath",
        "path": "$.foo.*[?(@.baz)].baz",
        "count": 2
    },
    "provider": {
        "type": "python",
        "module": "...",
        "func": "..."
    }
}
type: probe
name: resource-must-exist
tolerance:
  type: jsonpath
  path: '$.foo.*[?(@.baz)].baz'
  count: 2
provider:
  type: python
  module: ...
  func: ...

Advanced Scenarios

The last case to review is when the built-in tolerances cannot support your use case. You can then create your own tolerance by writing a new probe that takes the output of the probe under validation as an argument. Usually, this tolerance probe is implemented in Python for more power, but this isn’t compulsory.

For instance, the following:

{
    "type": "probe",
    "name": "lookup-text-in-lang-file",
    "tolerance": 0,
    "provider": {
        "type": "process",
        "path": "grep",
        "arguments": "welcome=hello default.locale.txt"
    }
}
type: probe
name: lookup-text-in-lang-file
tolerance: 0
provider:
  type: process
  path: grep
  arguments: welcome=hello default.locale.txt

could be written as follows:

{
    "type": "probe",
    "name": "lookup-text-in-lang-file",
    "tolerance": {
        "name": "search-text",
        "type": "probe",
        "provider": {
            "type": "python",
            "module": "my.package",
            "func": "search_text",
            "arguments": {
                "path": "default.locale.txt",
                "search_for": "welcome=hello"
            }
        }
    },
    "provider": {
        "type": "process",
        "path": "cat",
        "arguments": "default.locale.txt"
    }
}
type: probe
name: lookup-text-in-lang-file
tolerance:
  name: search-text
  type: probe
  provider:
    type: python
    module: my.package
    func: search_text
    arguments:
      path: default.locale.txt
      search_for: welcome=hello
provider:
  type: process
  path: cat
  arguments: default.locale.txt

In that case, implement the search_text(path: str, search_for: str, value: dict) -> bool function in the my.package Python module.

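import re


def search_text(path: str, search_for: str, value: dict = None) -> bool:
    # value is the output of the probe under validation, injected by the
    # Chaos Toolkit; fall back to reading the file when it is missing
    if value is not None:
        content = value["stdout"]
    else:
        with open(path) as f:
            content = f.read()
    # search anywhere in the text (re.match only matches at its start)
    return re.search(search_for, content) is not None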

As you can see, the value argument is not declared in the experiment’s arguments but must exist in the signature of the function. The Chaos Toolkit injects it and sets it to the output of the probe under validation.
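Conceptually, the injection amounts to calling the tolerance function with its declared arguments plus value. The sketch below only illustrates that idea and is not the Chaos Toolkit’s code; `run_tolerance_probe` and `contains_text` are hypothetical names, and the output mapping is a made-up sample.

```python
import re
from typing import Any, Callable, Dict


def run_tolerance_probe(
    func: Callable[..., bool], arguments: Dict[str, Any], output: Any
) -> bool:
    """Hypothetical sketch: pass the declared arguments plus `value`."""
    return func(**arguments, value=output)


def contains_text(search_for: str, value: dict = None) -> bool:
    """Hypothetical tolerance function; `value` is the probe's output."""
    return re.search(search_for, value["stdout"]) is not None


output = {"status": 0, "stdout": "welcome=hello\n", "stderr": ""}
arguments = {"search_for": "welcome=hello"}
print(run_tolerance_probe(contains_text, arguments, output))  # True
```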