Skip to content

An Open API for Chaos Engineering Experiments

Introduction

The purpose of this specification is to formalize the elements of a Chaos Engineering experiment and offer a way to federate the community around a common syntax and semantic.

As a fairly recent field, Chaos Engineering is a dynamic and its foundations are still emerging. However, it appears certain concepts are settling down enough to start agreeing on a shared understanding.

This specification is not prescriptive and does not aim at forcing the community into one direction, rather it strives at providing a common vocabulary that new practitioners can easily make sense of.

It is necessary to appreciate that this document does not specify what tools, such as the Chaos Monkey or similar, should look like. Instead, this document specifies how Chaos Engineering Experiment could be described, shared and conducted collaboratively.

Conventions Used in This Document

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119.

The terms “JSON”, “JSON text”, “JSON value”, “member”, “element”, “object”, “array”, “number”, “string”, “boolean”, “true”, “false”, and “null” in this document are to be interpreted as defined in RFC 7159.

Other formats

While this specification uses JSON to define its elements, implementations may allow loading from other formats, such as YAML. As long as the output of such format respects the specification herein.

Chaos Engineering Elements

Overview

An Experiment is one possible description of the principles of the Chaos Engineering. The intention of such a description is to provide shared understanding around a hypothesis on how to discover system’s behavior under certain conditions.

An Experiment declares a steady state hypothesis, alongside probes to validate this steady state is met, and a method as a sequence actions and probes, to interact and query the system respectively.

By using a variety of probes, experiments should gather information to sense behaviors in the system, potentially leading to systemic patterns that can be stabilized.

Experiment

A Chaos Engineering experiment, or simply an experiment, describes both the elements and the order in which they should be applied.

An experiment is a JSON object.

An experiment MUST declare:

  • a title property
  • a description property
  • a method property

The experiment’s title and description are meant for humans and therefore should be as descriptive as possible to clarify the experiment’s rationale.

Title and description are JSON strings with no maximum length.

An experiment SHOULD also declare:

  • a steady-state-hypothesis property
  • a rollbacks property

An experiment MAY finally declare:

  • a tags property
  • a secrets property
  • an extension property
  • a contributions property
  • a controls property
  • a runtime property

Tags provide a way of categorizing experiments. It is a sequence of JSON strings.

Extensions define opaque payloads for vendors to carry valuable information.

Contribution describes valuable properties of the target system, such as “reliability” or “durability”, that an experiment contribute to. This information can be aggregated together with other experiments’ contributions to better appreciate where the focus is put and where it is not.

Controls describe out-of-band capabilities applied during the experiment’s execution.

Steady State Hypothesis

The Steady State Hypothesis element describes what normal looks like in your system before the Method element is applied. If the steady state is not met, the Method element is not applied and the experiment MUST bail out.

The Steady State Hypothesis element is a JSON object.

Steady State Hypothesis element MUST declare:

  • a title property
  • a probes property

The title is meant for humans and therefore should clarify the rationale for this hypothesis.

Each Probe MUST define a tolerance property that acting as a gate mechanism for the experiment to carry on or bail. Any Probe that does not fall into the tolerance zone MUST fail the experiment.

Steady State Hypothesis element MAY declare:

  • a controls property

Controls describe out-of-band capabilities applied during the experiment’s execution.

Steady State Probe Tolerance

Probes of the Steady State Hypothesis MUST declare an additional property named tolerance.

The tolerance property’s value MUST be one of:

  • a scalar: JSON string, number (an integer), boolean
  • a sequence of scalars: JSON string, number, boolean
  • an object

In the case of a scalar or the sequence, the tolerance validation MUST be strict. The value returned by the Probe MUST be checked against the scalar value. The experiment MUST bail when both fail to match.

When the tolerance is a sequence. If it has only two values, those two values represent a lower and upper bound within which the Probe returned value must fall (inclusive).

When the sequence has more than two elements, the Probe returned value must be contained in that sequence.

When the tolerance is an object, it MUST have a type property which MUST be one of the followings: "probe", "regex", "jsonpath" or "range".

When the type property is "probe", the object MUST be a Probe that is applied. The probe should take two arguments, value and secrets where the value is the Probe returned value and secrets a Secret object or null. Its returned status MUST be successful for the tolerance to be considered valid.

When the type property is "regex", the object MUST have a pattern property which MUST be a valid regular expression. The tolerance succeeds if the Probe returned value is matched against the pattern. Object MAY have a target property which MUST be a valid value returned for a given provider.

When the type property is "jsonpath", the object MUST have a path property which MUST be a valid JSON Path. In addition, the object MAY have a expect property which is used to compare each value matched by the JSON Path to that value. The expect property value MUST be a scalar. When the expect property is not present, the tolerance succeeds if the JSON Path matched at least one item.

When the type property is "range", the object MUST have a range property which MUST be a sequence of length two. The first entry of the sequence MUST be the lower bound and the second entry MUST be the upper bound. Both entries MUST be JSON numbers.

In addition, when the Probe returned value is an object with a status property, the tested value is the value of that property.

Some examples of tolerance properties.

A boolean tolerance:

"tolerance": true
tolerance: true

A integer tolerance:

"tolerance": 8
tolerance: 8

A string tolerance:

"tolerance": "OK"
tolerance: "OK"

A sequence tolerance with lower and upper bounds:

"tolerance": [4, 9]
tolerance:
  - 4
  - 9

A sequence tolerance, the value must be contained in that sequence:

"tolerance": [4, 9, 78]
tolerance:
  - 4
  - 9
  - 78

A Probe tolerance:

"tolerance": {
    "type": "probe",
    "name": "should-exist",
    "provider": {
        "type": "python",
        "module": "os.path",
        "func": "exists",
        "arguments": {
            "path": "some/file"
        }
    }
}
tolerance:
  type: probe
  name: should-exist
  provider:
    type: python
    module: os.path
    func: exists
    arguments:
      path: some/file

A regex tolerance:

"tolerance": {
    "type": "regex",
    "pattern": "[0-9]{3}"
}
tolerance:
  type: regex
  pattern: '[0-9]{3}'

A regex tolerance with a non default target:

"tolerance": {
    "type": "regex",
    "target": "stdout",
    "pattern": "[0-9]{2}"
}
tolerance:
  type: regex
  target: stdout
  pattern: '[0-9]{2}'

A jsonpath tolerance:

"tolerance": {
    "type": "jsonpath",
    "path": "foo[*].baz"
}
tolerance:
  type: jsonpath
  path: 'foo[*].baz'

A jsonpath tolerance with an expected value to match:

"tolerance": {
    "type": "jsonpath",
    "path": "foo[*].baz",
    "expect": 4
}
tolerance:
  type: jsonpath
  path: 'foo[*].baz'
  expect: 4

Two range tolerances:

"tolerance": {
    "type": "range",
    "range": [4, 8]
}
tolerance:
  type: range
  range:
    - 4
    - 8
"tolerance": {
    "type": "range",
    "range": [4.6, 8.9]
}
tolerance:
  type: range
  range:
    - 4.6
    - 8.9

Contributions

Contributions describe the valuable system properties an experiment targets as well as how much they contributes to it. Those properties usually refer to aspects stakeholders care about. Aggregated they offer a powerful metric about the effort and focus on building confidence across the system.

Contributions are declared under the top-level contributions property as an object. Properties of that object MUST be JSON strings representing the name of a contribution. The values MUST be the weight of a given contribution and MUST be one of "high", "medium", "low" or "none". The "none" value is not the same as a missing contribution from the contributions object. That value marks explicitly that a given contribution is not addressed by an experiment. A missing contribution means impact via this experiment is unknown for this contribution.

Here is a contribution example:

"contributions": {  
    "reliability": "high",
    "security": "none",
    "scalability": "medium"
}
contributions:
  reliability: high
  security: none
  scalability: medium

This sample tells us that the experiment contributes mainly to exploring reliability of the system and moderately to its scalability. However, it is explicit here this experiment does not address security.

On the other hand:

"contributions": {  
    "reliability": "high",
    "scalability": "medium"
}
contributions:
  reliability: high
  scalability: medium

This tells us the same about reliability and scalability but we can’t presume anything about security.

Method

The Method describes the sequence of Probe and Action elements to apply. The Method is declared under method property at the top-level of the experiment.

The method MAY have at least one element which can be either a Probe or an Action.

The elements MUST be applied in the order they are declared.

An empty method is allowed for running experiments with Steady States Hypothesis only.

Probe

A Probe collects information from the system during the experiment.

A Probe is a JSON object. A Probe is declared fully or reference another Probe through the ref property.

When declared fully, a Probe MUST declare:

  • a type property
  • a name property
  • a provider property

The type property MUST be the JSON string "probe".

The name property is a free-form JSON string that MAY be considered as an identifier within the experiment.

It MAY also declare:

  • a configuration property
  • a background property
  • a controls property

The configuration property MUST be a JSON string referencing an identifier declared in the top-level configuration property. It is assumed that when not declared, the Probe requires no configuration.

The background property MUST be a JSON boolean value either true or false. It is assumed that, when that property is not declared, it is set to false. When that property is set to true it indicates the Probe MUST not block and the next Action or Probe should immediately be applied.

When a Probe references another Probe in the Experiment, the Probe MUST declare a single property called ref.

The ref property MUST be a JSON string which MUST be the name of a declared Probe.

Controls describe out-of-band capabilities applied during the experiment’s execution.

Action

An Action performs an operation against the system.

An Action collects information from the system during the experiment.

An Action is a JSON object. An Action is declared fully or reference another Action through the ref property.

When declared fully, a Action MUST declare:

  • a type property
  • a name property
  • a provider property

The type property MUST be the JSON string "action".

The name property is a free-form JSON string that MAY be considered as an identifier within the experiment.

It MAY also declare:

  • a controls property
  • a configuration property
  • a background property
  • a pauses property

The configuration property MUST be a JSON string referencing an identifier declared in the top-level configuration property. It is assumed that when not declared, the Action requires no configuration.

The background property MUST be a JSON boolean value either true or false. It is assumed that, when that property is not declared, it is set to false. When that property is set to true it indicates the Action MUST not block and the next Action or Probe should immediately be applied.

The pauses property MUST be a JSON object which MAY have one or the two following properties:

  • before
  • after

In both cases, the value MUST be JSON number indicating the number of seconds to wait before continuing. The before pause MUST be performed before the Action while the after MUST be performed afterwards.

When a Action references another Action in the Experiment, the Action MUST declare a single property called ref.

The ref property MUST be a JSON string which MUST be the name of a declared Action.

Controls describe out-of-band capabilities applied during the experiment’s execution.

Action or Probe Provider

A provider MUST be a JSON object which MUST declare a type property that decides the other expected properties.

The type property MUST be one of "python", "http" or "process".

Info

This specification only mentions those three providers but it could grow to support more, such as "go", "rust" or "grpc"

Python Provider

A Python Provider declares a Python function to be applied.

A Python Provider MUST declare the following:

  • a module property
  • a func property

It SHOULD also declare an arguments property when the function expects them.

The module property is the fully qualified module exposing the function. It MUST be a JSON string.

The func property is the name of the function to apply. It MUST be a JSON string.

When provided, the arguments property MUST be a JSON object which properties are the names of the function’s arguments. When a function’s signature has default values for some of its arguments, those MAY be omitted from the arguments object. In that case, those default values will be used.

Argument values MUST be valid JSON entities.

In addition, the provider object MAY declare a secrets property. This secrets property MUST be a JSON array of JSON strings referencing identifiers declared in the top-level secrets property. It is assumed that when not declared, the Action requires no secrets.

HTTP Provider

A HTTP Provider declares a URL to be called.

A HTTP Provider MUST declare the following:

  • a url property

The url property MUST be a JSON string representing a URL as per RFC 3986.

In addition, the provider object MAY declare any of the followings:

  • a method property
  • a headers property
  • a expected_status property
  • a arguments property
  • a timeout property
  • a secrets property

The method property MUST be a JSON string, such as "POST", as per RFC 7231. It defaults to "GET".

The headers property MUST be a JSON object which properties are header names and values are header values, as per RFC 7231.

When provided, the arguments property MUST be a JSON object which properties are parameters of the HTTP request.

When method is "GET", the arguments are mapped as a query-string of the URL. Otherwise, the arguments are passed as the request body’s data and the encoding depends on the "Content-Type" provided in the headers object.

The timeout property MUST be either a JSON number specifying how long the request should take to complete. Or a JSON array that MUST made of two JSON numbers, the first one indicating the connection timeout, the second the request timeout to respond.

The secrets property MUST be a JSON array of JSON strings referencing identifiers declared in the top-level secrets property. It is assumed that when not declared, the Action requires no secrets.

The HTTP provider MUST return an object with the following properties:

  • status which MUST be a valid HTTP returned code as defined in RFC 7231
  • headers which MUST be an object
  • body which MUST be a string

Process Provider

A Process Provider declares a process to be called.

A Process Provider MUST declare the following:

  • a path property

The path property MUST be a JSON string of a path to an executable.

In addition, the provider object MAY declare any of the followings:

  • a arguments property
  • a timeout property
  • a secrets property

The arguments property MUST be a JSON array or a JSON string which defines the process arguments. Those arguments are passed in order to the process arguments.

The timeout property MUST be a JSON number specifying how long the process should take to complete.

The secrets property MUST be a JSON array of JSON strings referencing identifiers declared in the top-level secrets property. It is assumed that when not declared, the Action requires no secrets.

The Process provider MUST return an object with the following properties:

  • status which MUST be a scalar of the process return code
  • stdout which MUST be bytes sequence encoded with the UTF-8 encoding representing the stdout payload of the process
  • stderr which MUST be bytes sequence encoded with the UTF-8 encoding representing the stderr payload of the process

Rollbacks

Rollbacks declare the sequence of actions that attempt to put the system back to its initial state.

The experiment MAY declare a single rollbacks property which is a JSON array consisting of Actions.

A failed rollback MUST not bail the sequence of rollbacks.

Secrets

Secrets declare values that need to be passed on to Actions or Probes in a secure manner.

The secrets property MUST be a JSON object. Its properties are identifiers referenced by Actions and Probes.

The value of each identifier is a JSON object which properties are the secrets keys and the properties values are the secrets values.

Referenced secrets MUST be injected into probes and actions when they are applied. Probes and actions MUST NOT modify the secrets.

Secrets MUST be passed a mapping of keys and values to probes and actions.

An example of a secrets element at the top-level:

{
    "secrets": {
        "kubernetes": {
            "token": "XYZ"
        }
    }
}
secrets:
  kubernetes:
    token: XYZ

This can then referenced from probes or actions:

{
    "type": "probe",
    "secrets": "kubernetes"
}
type: probe
secrets: kubernetes

Inline Secrets

Secrets MAY be inlined in the Experiment directly.

{
    "secrets": {
        "kubernetes": {
            "token": "ABCDEF-1234-XYZ"
        }
    }
}
secrets:
  kubernetes:
    token: ABCDEF-1234-XYZ

Environment Secrets

Secrets MAY be retrieved from the environment. In that case, they must be declared as a JSON object with a type property set to "env". The environment variable MUST be declared in the key property as a JSON string.

{
    "secrets": {
        "kubernetes": {
            "token": {
                "type": "env",
                "key": "KUBERNETES_TOKEN"
            }
        }
    }
}
secrets:
  kubernetes:
    token:
      type: env
      key: KUBERNETES_TOKEN

Vault Secrets

Secrets MAY be retrieved from a HashiCorp vault instance. In that case, they must be declared as a JSON object with a type property set to "vault". The path to the key MUST be declared in the path property as a JSON string.

{
    "secrets": {
        "myapp": {
            "token": {
                "type": "vault",
                "path": "secrets/something"
            }
        }
    }
}
secrets:
  myapp:
    token:
      type: vault
      path: secrets/something

When only the path property is set, the whole secrets payload at the given path MUST be set to the Chaos Toolkit secret key.

A key property MAY be set to select a specific value from the Vault secret payload.

The Vault url MUST be provided in the Configuration section via the "vault_addr" property.

Vault authentication MUST at least support:

  • token based authentication The token MUST be provided in the Configuration section via the "vault_token" property
  • AppRole authentication The role-id and secret-id MUST be provided in the Configuration section via the "vault_role_id" and "vault_role_secret" properties

The Vault KV secrets version MAY be provided via the "vault_kv_version" Configuration key. If not provided, it MUST default to "2".

Examples:

Vault secret at path secret/something:

{
    "foo": "bar",
    "baz": "hello"
}
foo: bar
baz: hello

Then in your Chaos Toolkit experiment:

{
    "secrets": {
        "myapp": {
            "token": {
                "type": "vault",
                "path": "secrets/something"
            }
        }
    }
}
secrets:
  myapp:
    token:
      type: vault
      path: secrets/something

means the secrets will become:

"token": {
    "foo": "bar",
    "baz": "hello"
}
token:
  foo: bar
  baz: hello

However:

{
    "secrets": {
        "myapp": {
            "token": {
                "type": "vault",
                "path": "secrets/something",
                "key": "foo"
            }
        }
    }
}
secrets:
  myapp:
    token:
      type: vault
      path: secrets/something
      key: foo

means the secrets will become:

"token": "bar"
token: bar

Configuration

Configuration is meant to provide runtime values to actions and probes.

The configuration element MUST be a JSON object. The value of each property MUST be a JSON string, number, or object whose properties are considered the configuration lookup. Configuration must be passed to all Probes and actions requiring it. Probes and actions MUST NOT modify the configuration.

Configurations MUST be passed a mapping of keys and values to probes and actions.

An example of a configuration element at the top level:

{
    "configuration": {
        "some_service": "http://127.0.0.1:8080",
        "vault_addr": {
            "type": "env",
            "key": "VAULT_ADDR"
        }
    }
}
configuration:
  some_service: 'http://127.0.0.1:8080'
  vault_addr:
    type: env
    key: VAULT_ADDR

Inline Configurations

Configurations MAY be inlined in the Experiment directly.

{
    "configuration": {
        "some-service": "http://127.0.0.1:8080"
    }
}
configuration:
  some-service: 'http://127.0.0.1:8080'

Environment Configurations

Configurations MAY be retrieved from the environment. In that case, they must be declared as a JSON object with a type property set to "env". The environment variable MUST be declared in the key property as a JSON string.

The default key is OPTIONAL and MAY be used when the environment variable can be undefined and fallback to a default value for the experiment.

{
    "configuration": {
        "vault_addr": {
            "type": "env",
            "key": "VAULT_ADDR",
            "default": "https://127.0.0.1:8200"
        }
    }
}
configuration:
  vault_addr:
    type: env
    key: VAULT_ADDR
    default: 'https://127.0.0.1:8200'

Variable Substitution

Probes and Actions argument values MAY be dynamically resolved at runtime.

Dynamic values MUST follow the syntax ${name} where name is an identifier declared in either the Configuration or Secrets sections. When name is declared in both sections, the Configuration section MUST take precedence.

Dynamic values MUST be substituted before being passed to Probes or Actions.

Other values, such as the HTTP Probe url, MAY be substituted as well.

Controls

Controls describe out-of-band capabilities applied when the experiment is executed. Controls are used to declare operations that should be carried by external tools.

Controls MAY be declared at each of the following levels:

  • experiment
  • steady-state-hypothesis
  • activity

Controls MUST be applied before and after each of those levels. Schematically, this looks like this:

apply experiment control before experiment starts
start experiment
    apply steady state control before steady-state probes are started
        start steady-state processing
            apply activity control before each probe is applied
            run each probe
            apply activity control after each probe is applied
    apply steady state control after steady-state probes have completed
    apply steady state control before method activities are started
        start method processing
            apply activity control before each activity is applied
            run each activity
            apply activity control after each activity is applied
    apply steady state control after method activities have completed
    apply steady state control before rollback activities are started
        start rollback processing
            apply activity control before each activity is applied
            run each activity
            apply activity control after each activity is applied
    apply steady state control after rollback activities have completed
apply experiment control after experiment completes

Controls MAY be omitted anywhere and MUST NOT be applied at a level they are not declared.

Controls MUST NOT fail the experiment’s execution due to unforeseen conditions.

Controls are declared with the controls property which is set to a JSON array.

Controls MAY modify Configuration and Secrets. In that case changes MUST be made visible to the experiment.

An item of that array MUST be a control, which is a JSON object which MUST have the following properties:

  • a name property which MUST be a JSON string
  • a provider property MUST be a JSON object

The provider object indicates which implementation of the control to use. It MUST declare the following properties:

  • a type JSON string which MUST be "python"
  • a module JSON string when the type property is "python". It MUST be a a Python module dotted path implementing the control interface

A control object MAY also declare the following property:

  • a scope property MUST be a JSON string
  • automatic, a JSON boolean which MUST be true by default (when omitted)

The scope value MUST be one of "before" or "after". When the scope property is omitted, the control MUST be applied before and after. When the scope property is set, the control MUST be applied only on that scope.

When the automatic property is set to false, it MUST be understood that the control cannot be applied anywhere but where it is declared.

Examples of Controls:

Just a generic declaration of a control at the top-level of the experiment:

"controls": [
    {
        "name": "tracing",
        "provider": {
            "type": "python",
            "module": "chaostracing.control"
        }
    }
]
controls:
  - name: tracing
    provider:
    type: python
    module: chaostracing.control

Another control by applied only as post-control:

"controls": [
    {
        "name": "tracing",
        "scope": "post",
        "provider": {
            "type": "python",
            "module": "chaostracing.control"
        }
    }
]
controls:
  - name: tracing
    scope: post
    provider:
    type: python
    module: chaostracing.control

Finally, a top-level level control not applied anywhere else down the tree:

"controls": [
    {
        "name": "tracing",
        "automatic": false,
        "provider": {
            "type": "python",
            "module": "chaostracing.control"
        }
    }
]
controls:
  - name: tracing
    automatic: false
    provider:
    type: python
    module: chaostracing.control

Extensions

An Experiment MAY declare an extensions property which MUST be an array of objects. Each object MUST declare a non-empty name property.

Extensions are used in two scenarios:

  • future core features that need to be ironed out by the community first
  • vendor specific payload

In both cases, their actual usage is runtime dependent, this specification does not declare any meaning to an extension.

Below is an example of an Extension:

{
    "extensions": [{
        "name": "vendorX",
        "data": "..."
    }]
}
extensions:
  - name: vendorX
    data: ...

Runtime

An Experiment MAY declare an runtime property which MUST be an object of objects. The runtime block is used to define the runtime strategy of the experiment.

The runtime property MAY declare:

  • a rollbacks property which is an object with a single property strategy. The strategy property MUST be one of "default", "always", "never" or "deviated"
  • a hypothesis property which is an object with a single property strategy. The strategy property MUST be one of "default", "before-method-only", "after-method-only", "during-method-only" or "continuously"

Examples

The following examples MUST NOT be considered normative.

Minimal Experiment

Here is an example of the most minimal experiment:

{
    "title": "Moving a file from under our feet is forgivable",
    "description": "Our application should re-create a file that was removed",
    "contributions": {
        "reliability": "high",
        "availability": "high"
    },
    "steady-state-hypothesis": {
        "title": "The file must be around first",
        "probes": [
            {
                "type": "probe",
                "name": "file-must-exist",
                "tolerance": true,
                "provider": {
                    "type": "python",
                    "module": "os.path",
                    "func": "exists",
                    "arguments": {
                        "path": "some/file"
                    }
                }
            }
        ]
    },
    "method": [
        {
            "type": "action",
            "name": "file-be-gone",
            "provider": {
                "type": "python",
                "module": "os",
                "func": "remove",
                "arguments": {
                    "path": "some/file"
                }
            },
            "pauses": {
                "after": 5
            }
        },
        {
            "ref": "file-must-exist"
        }
    ]
}
title: Moving a file from under our feet is forgivable
description: Our application should re-create a file that was removed
contributions:
  reliability: high
  availability: high
steady-state-hypothesis:
  title: The file must be around first
  probes:
    - type: probe
      name: file-must-exist
      tolerance: true
      provider:
        type: python
        module: os.path
        func: exists
        arguments:
        path: some/file
method:
  - type: action
    name: file-be-gone
    provider:
      type: python
      module: os
      func: remove
      arguments:
        path: some/file
    pauses:
      after: 5
  - ref: file-must-exist

More Complex Experiment

Below is an example of a fully featured experiment that uses various extensions to perform actions, probing and steady-state hypothesis validation.

{
    "title": "Are our users impacted by the loss of a function?",
    "description": "While users query the Astre function, they should not be impacted if one instance goes down.",
    "contributions": {
        "reliability": "high",
        "availability": "high",
        "performance": "medium",
        "security": "none"
    },
    "tags": [
        "kubernetes",
        "openfaas",
        "cloudnative"
    ],
    "configuration": {
        "prometheus_base_url": "http://demo.foo.bar"
    },
    "secrets": {
        "global": {
            "auth": "Basic XYZ"
        }
    },
    "controls": [
        {
            "name": "tracing",
            "provider": {
                "type": "python",
                "module": "chaostracing.control"
            }
        }
    ],
    "steady-state-hypothesis": {
        "title": "Function is available",
        "probes": [
            {
                "type": "probe",
                "name": "function-must-exist",
                "tolerance": 200,
                "provider": {
                    "type": "http",
                    "secrets": ["global"],
                    "url": "http://demo.foo.bar/system/function/astre",
                    "headers": {
                        "Authorization": "${auth}"
                    }
                }
            },
            {
                "type": "probe",
                "name": "function-must-respond",
                "tolerance": 200,
                "provider": {
                    "type": "http",
                    "timeout": [3, 5],
                    "secrets": ["global"],
                    "url": "http://demo.foo.bar/function/astre",
                    "method": "POST",
                    "headers": {
                        "Content-Type": "application/json",
                        "Authorization": "${auth}"
                    },
                    "arguments": {
                        "city": "Paris"
                    }
                }
            }
        ]
    },
    "method": [
        {
            "type": "action",
            "name": "simulate-user-traffic",
            "background": true,
            "provider": {
                "type": "process",
                "path": "vegeta",
                "arguments": "-cpus 2 attack -targets=data/scenario.txt -workers=2 -connections=1 -rate=3 -timeout=3s -duration=30s -output=result.bin"
            }
        },
        {
            "type": "action",
            "name": "terminate-one-function",
            "provider": {
                "type": "python",
                "module": "chaosk8s.pod.actions",
                "func": "terminate_pods",
                "arguments": {
                    "ns": "openfaas-fn",
                    "label_selector": "faas_function=astre",
                    "rand": true
                }
            },
            "pauses": {
                "before": 5
            }
        },
        {
            "type": "probe",
            "name": "fetch-openfaas-gateway-logs",
            "provider": {
                "type": "python",
                "module": "chaosk8s.pod.probes",
                "func": "read_pod_logs",
                "arguments": {
                    "label_selector": "app=gateway",
                    "last": "35s",
                    "ns": "openfaas"
                }
            }
        },
        {
            "type": "probe",
            "name": "query-total-function-invocation",
            "provider": {
                "type": "python",
                "module": "chaosprometheus.probes",
                "func": "query_interval",
                "secrets": ["global"],
                "arguments": {
                    "query": "gateway_function_invocation_total{function_name='astre'}",
                    "start": "1 minute ago",
                    "end": "now",
                    "step": 1
                }
            }
        }
    ],
    "rollbacks": []
}
---
title: Are our users impacted by the loss of a function?
description: While users query the Astre function, they should not be impacted if one instance goes down.
contributions:
  reliability: high
  availability: high
  performance: medium
  security: none
tags:
  - kubernetes
  - openfaas
  - cloudnative
configuration:
  prometheus_base_url: http://demo.foo.bar
secrets:
  global:
    auth: Basic XYZ
controls:
  - name: tracing
    provider:
      type: python
      module: chaostracing.control
steady-state-hypothesis:
  title: Function is available
  probes:
    - type: probe
      name: function-must-exist
      tolerance: 200
      provider:
        type: http
        secrets:
          - global
        url: http://demo.foo.bar/system/function/astre
        headers:
          Authorization: "${auth}"
    - type: probe
      name: function-must-respond
      tolerance: 200
      provider:
        type: http
        timeout:
          - 3
          - 5
        secrets:
          - global
        url: http://demo.foo.bar/function/astre
        method: POST
        headers:
          Content-Type: application/json
          Authorization: "${auth}"
        arguments:
          city: Paris
method:
  - type: action
    name: simulate-user-traffic
    background: true
    provider:
      type: process
      path: vegeta
      arguments: "-cpus 2 attack -targets=data/scenario.txt -workers=2 -connections=1 -rate=3 -timeout=3s -duration=30s -output=result.bin"
  - type: action
    name: terminate-one-function
    provider:
      type: python
      module: chaosk8s.pod.actions
      func: terminate_pods
      arguments:
        ns: openfaas-fn
        label_selector: faas_function=astre
        rand: true
    pauses:
      before: 5
  - type: probe
    name: fetch-openfaas-gateway-logs
    provider:
      type: python
      module: chaosk8s.pod.probes
      func: read_pod_logs
      arguments:
        label_selector: app=gateway
        last: 35s
        ns: openfaas
  - type: probe
    name: query-total-function-invocation
    provider:
    type: python
    module: chaosprometheus.probes
    func: query_interval
    secrets:
      - global
    arguments:
      query: gateway_function_invocation_total{function_name='astre'}
      start: 1 minute ago
      end: now
      step: 1
rollbacks: []