Extension chaosdatadog

Version 0.3.1
Repository https://github.com/chaostoolkit-incubator/chaostoolkit-datadog

This project contains Chaos Toolkit activities and tolerances to work with DataDog.

Install

This package requires Python 3.8+

To be used from your experiment, this package must be installed in the Python environment where chaostoolkit already lives.

$ pip install chaostoolkit-datadog

Usage

A typical experiment using this extension would look like this:

{
    "version": "1.0.0",
    "title": "Run an experiment using a DataDog SLO to verify our system",
    "description": "n/a",
    "configuration": {
        "datadog_host": "https://datadoghq.eu"
    },
    "steady-state-hypothesis": {
        "title": "n/a",
        "probes": [
            {
                "type": "probe",
                "name": "read-slo",
                "tolerance": {
                    "type": "probe",
                    "name": "check-slo",
                    "provider": {
                        "type": "python",
                        "module": "chaosdatadog.slo.tolerances",
                        "func": "slo_must_be_met",
                        "arguments": {
                            "threshold": "7d"
                        }
                    }
                },
                "provider": {
                    "type": "python",
                    "module": "chaosdatadog.slo.probes",
                    "func": "get_slo",
                    "arguments": {
                        "slo_id": "..."
                    }
                }
            }
        ]
    },
    "method": []
}

That’s it!

Please explore the code to see existing probes and actions.

Configuration

In the configuration block, you may want to specify the DataDog host you are targeting:

    "configuration": {
        "datadog_host": "https://datadoghq.eu"
    },

Authentication is configured through the standard DataDog environment variables, notably:

  • DD_API_KEY: the API key
  • DD_APP_KEY: the application key
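
Before running an experiment, it can help to confirm that both variables are visible to the Python process. The following is a minimal sketch only; the helper name missing_datadog_credentials is hypothetical and is not part of chaosdatadog.

```python
import os

# The two variables named in this section.
REQUIRED_VARS = ("DD_API_KEY", "DD_APP_KEY")


def missing_datadog_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the process environment; pass a dict to check
    another mapping (useful in tests).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example usage:
#   missing = missing_datadog_credentials()
#   if missing:
#       raise SystemExit("Missing DataDog credentials: " + ", ".join(missing))
```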

Test

To run the tests for the project execute the following:

$ pdm run test

Formatting and Linting

We use ruff to both lint and format this repository's code.

Before raising a Pull Request, we recommend you run formatting against your code with:

$ pdm run format

This will automatically format any code that doesn’t adhere to the formatting standards.

Because the formatter does not catch everything, we also recommend you run:

$ pdm run lint

This ensures that issues such as unused import statements and overly long strings are also picked up.

Contribute

If you wish to contribute more functions to this package, you are more than welcome to do so. Please fork this project, make your changes following the usual PEP 8 code style, add tests, and submit a PR for review.

Exported Activities

metrics


get_metrics_state

Type probe
Module chaosdatadog.metrics.probes
Name get_metrics_state
Return boolean

This probe can:

  • Query metrics over any time period (timeseries and scalar)
  • Compare the metrics to a threshold over a time window, e.g. CPU, memory, network
  • Check whether the sum of data points is over some value, e.g. requests, errors, custom metrics

You can use a comparison to check whether all data points returned by the query satisfy the steady-state condition.

Example: cumsum(sum:istio.mesh.request.count.total{kube_service:test, response_code:500}.as_count())

The query above is a cumulative sum of all requests with a response code of 500 over the queried time window. If your hypothesis deviates once there are more than 30 HTTP 500 errors, set the threshold to 30 and the comparison to <, so that any value below 30 is a steady state.

The allowed comparison values are [">", "<", ">=", "<=", "=="].
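
To make the comparison semantics concrete, here is a minimal sketch of how "all data points satisfy the comparison" could be evaluated. It is an illustration under stated assumptions, not the extension's actual implementation: the function name all_points_satisfy and the flat list of numeric points are hypothetical.

```python
import operator

# Map each allowed comparison string to the matching Python operator.
COMPARISONS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def all_points_satisfy(points, comparison, threshold):
    """Return True when every data point compares favourably to the threshold.

    Note: in this sketch an empty list of points returns True.
    """
    if comparison not in COMPARISONS:
        raise ValueError(f"unsupported comparison: {comparison!r}")
    compare = COMPARISONS[comparison]
    return all(compare(point, threshold) for point in points)


# With the HTTP 500 example above: steady state while every point is below 30.
print(all_points_satisfy([3, 12, 29], "<", 30))
print(all_points_satisfy([3, 12, 31], "<", 30))
```

The first call reports a steady state; the second does not, because one point exceeds the threshold.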

Signature:

def get_metrics_state(query: str,
                      comparison: str,
                      threshold: float,
                      minutes_before: int,
                      configuration: Dict[str, Dict[str, str]] = None,
                      secrets: Dict[str, Dict[str, str]] = None) -> bool:
    pass

Arguments:

Name            Type     Default  Required
query           string            Yes
comparison      string            Yes
threshold       number            Yes
minutes_before  integer           Yes

Usage:

{
  "name": "get-metrics-state",
  "type": "probe",
  "provider": {
    "type": "python",
    "module": "chaosdatadog.metrics.probes",
    "func": "get_metrics_state",
    "arguments": {
      "query": "",
      "comparison": "",
      "threshold": null,
      "minutes_before": 0
    }
  }
}

name: get-metrics-state
provider:
  arguments:
    comparison: ''
    minutes_before: 0
    query: ''
    threshold: null
  func: get_metrics_state
  module: chaosdatadog.metrics.probes
  type: python
type: probe

slo


get_slo

Type probe
Module chaosdatadog.slo.probes
Name get_slo
Return mapping

Get an SLO's history for the given period.

Periods should be given relative to each other. If end_period isn’t provided it will resolve to now (UTC). start_period is always relative to end_period. You can use a format such as: "X minutes ago" for both.

Please visit https://docs.datadoghq.com/api/latest/service-level-objectives/#get-an-slos-history for more information on the response payload, which is returned as a dictionary.
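
The way the two periods relate can be sketched as follows. This is only an illustration of the semantics described above, using a deliberately small parser that accepts "N minutes ago", "N hours ago", or "N days ago". The extension's own parsing may accept other formats, and the names resolve_window and _offset are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Accepts strings such as "2 minutes ago" or "1 hour ago" (assumed subset).
_PERIOD = re.compile(r"^\s*(\d+)\s+(minute|hour|day)s?\s+ago\s*$")


def _offset(period):
    """Convert a '<N> <unit> ago' string into a timedelta."""
    match = _PERIOD.match(period)
    if not match:
        raise ValueError(f"unsupported period: {period!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(**{unit + "s": amount})


def resolve_window(start_period="2 minutes ago", end_period=None, now=None):
    """Return (start, end) datetimes for the requested window.

    end_period is relative to now (UTC); start_period is relative to end.
    """
    now = now or datetime.now(timezone.utc)
    end = now if end_period is None else now - _offset(end_period)
    start = end - _offset(start_period)
    return start, end


# Example: a 5-minute window that ended 10 minutes ago.
start, end = resolve_window("5 minutes ago", "10 minutes ago")
print(end - start)
```

Here the window spans from 15 minutes ago to 10 minutes ago, because the start is measured back from the end rather than from now.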

Signature:

def get_slo(slo_id: str,
            start_period: str = '2 minutes ago',
            end_period: str = None,
            configuration: Dict[str, Dict[str, str]] = None,
            secrets: Dict[str, Dict[str, str]] = None) -> Dict[str, Any]:
    pass

Arguments:

Name          Type    Default          Required
slo_id        string                   Yes
start_period  string  "2 minutes ago"  No
end_period    string  null             No

Usage:

{
  "name": "get-slo",
  "type": "probe",
  "provider": {
    "type": "python",
    "module": "chaosdatadog.slo.probes",
    "func": "get_slo",
    "arguments": {
      "slo_id": ""
    }
  }
}

name: get-slo
provider:
  arguments:
    slo_id: ''
  func: get_slo
  module: chaosdatadog.slo.probes
  type: python
type: probe

get_slo_details

Type probe
Module chaosdatadog.slo.probes
Name get_slo_details
Return mapping

Get an SLO's details.

Please visit https://docs.datadoghq.com/api/latest/service-level-objectives/#get-an-slos-details for more information on the response payload, which is returned as a dictionary.

Signature:

def get_slo_details(
        slo_id: str,
        configuration: Dict[str, Dict[str, str]] = None,
        secrets: Dict[str, Dict[str, str]] = None) -> Dict[str, Any]:
    pass

Arguments:

Name    Type    Default  Required
slo_id  string           Yes

Usage:

{
  "name": "get-slo-details",
  "type": "probe",
  "provider": {
    "type": "python",
    "module": "chaosdatadog.slo.probes",
    "func": "get_slo_details",
    "arguments": {
      "slo_id": ""
    }
  }
}

name: get-slo-details
provider:
  arguments:
    slo_id: ''
  func: get_slo_details
  module: chaosdatadog.slo.probes
  type: python
type: probe

slo_must_be_met

Type tolerance
Module chaosdatadog.slo.tolerances
Name slo_must_be_met
Return boolean

Checks that the current SLI value of an SLO is higher than its target for a given threshold period ("7d", "30d", "90d", "custom").

Signature:

def slo_must_be_met(threshold: str = '7d',
                    value: Dict[str, Any] = None) -> bool:
    pass

Arguments:

Name       Type     Default  Required
threshold  string   "7d"     No
value      mapping  null     No

Tolerances declare the value argument which is automatically injected by Chaos Toolkit as the output of the probe they are evaluating.
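
A tolerance of this kind could be sketched as below. This is not the extension's source code. It assumes the probe output follows the SLO history payload with the SLI under data.overall.sli_value and the target under data.thresholds[<period>].target; verify that shape against the DataDog API documentation. The function name sli_meets_target is hypothetical.

```python
from typing import Any, Dict, Optional


def sli_meets_target(threshold: str = "7d",
                     value: Optional[Dict[str, Any]] = None) -> bool:
    """Return True when the SLI is higher than the target for the period.

    `value` is the probe output injected by Chaos Toolkit. The payload
    layout used here is an assumption, not a documented contract.
    """
    if not value:
        return False
    data = value.get("data", {})
    sli = data.get("overall", {}).get("sli_value")
    target = data.get("thresholds", {}).get(threshold, {}).get("target")
    if sli is None or target is None:
        # Missing data is treated as a failed tolerance in this sketch.
        return False
    return sli > target


# Hypothetical probe output for illustration only.
sample = {
    "data": {
        "overall": {"sli_value": 99.95},
        "thresholds": {"7d": {"target": 99.9}},
    }
}
print(sli_meets_target("7d", sample))
```

Returning False when the payload is missing or incomplete is a design choice for the sketch: it makes the steady-state check fail rather than pass silently when no SLO data is available.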

Usage:

{
  "steady-state-hypothesis": {
    "title": "...",
    "probes": [
      {
        "type": "probe",
        "tolerance": {
          "name": "slo-must-be-met",
          "type": "tolerance",
          "provider": {
            "type": "python",
            "module": "chaosdatadog.slo.tolerances",
            "func": "slo_must_be_met"
          }
        },
        "...": "..."
      }
    ]
  }
}

steady-state-hypothesis:
  probes:
  - '...': '...'
    tolerance:
      name: slo-must-be-met
      provider:
        func: slo_must_be_met
        module: chaosdatadog.slo.tolerances
        type: python
      type: tolerance
    type: probe
  title: '...'