Extension chaosdatadog¶
Version | 0.3.1 |
Repository | https://github.com/chaostoolkit-incubator/chaostoolkit-datadog |
This project contains Chaos Toolkit activities and tolerances to work with DataDog.
Install¶
This package requires Python 3.8+
To be used from your experiment, this package must be installed in the Python environment where chaostoolkit already lives.
$ pip install chaostoolkit-datadog
Usage¶
A typical experiment using this extension would look like this:
{
    "version": "1.0.0",
    "title": "Run an experiment using a DataDog SLO to verify our system",
    "description": "n/a",
    "configuration": {
        "datadog_host": "https://datadoghq.eu"
    },
    "steady-state-hypothesis": {
        "title": "n/a",
        "probes": [
            {
                "type": "probe",
                "name": "read-slo",
                "tolerance": {
                    "type": "probe",
                    "name": "check-slo",
                    "provider": {
                        "type": "python",
                        "module": "chaosdatadog.slo.tolerances",
                        "func": "slo_must_be_met",
                        "arguments": {
                            "threshold": "7d"
                        }
                    }
                },
                "provider": {
                    "type": "python",
                    "module": "chaosdatadog.slo.probes",
                    "func": "get_slo",
                    "arguments": {
                        "slo_id": "..."
                    }
                }
            }
        ]
    },
    "method": []
}
That’s it!
Please explore the code to see existing probes and actions.
Configuration¶
In the configuration block you may want to specify the DataDog host you are targeting:
"configuration": {
    "datadog_host": "https://datadoghq.eu"
},
Authentication can be set using the typical DataDog environment variables, notably:
- DD_API_KEY: the API key
- DD_APP_KEY: the application key
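Before running an experiment, it can help to confirm that both variables are present in the environment. The following is a minimal sketch, not part of the extension; the helper name `missing_datadog_credentials` is hypothetical and only checks that the variables are set, not that the keys are valid.

```python
import os
from typing import List, Mapping, Optional

# Environment variables named in the Configuration section.
REQUIRED_VARS = ("DD_API_KEY", "DD_APP_KEY")


def missing_datadog_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it is not provided by chaosdatadog.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_datadog_credentials()
    if missing:
        print("Missing DataDog credentials: " + ", ".join(missing))
    else:
        print("DataDog credentials found in the environment.")
```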
Test¶
To run the tests for the project execute the following:
$ pdm run test
Formatting and Linting¶
We use ruff to both lint and format this repository's code.
Before raising a Pull Request, we recommend you run formatting against your code with:
$ pdm run format
This will automatically format any code that doesn’t adhere to the formatting standards.
As some things are not picked up by the formatting, we also recommend you run:
$ pdm run lint
This ensures that unused import statements, strings that are too long, etc. are also picked up.
Contribute¶
If you wish to contribute more functions to this package, you are more than welcome to do so. Please fork this project, make your changes following the usual PEP 8 code style, add tests, and submit a PR for review.
Exported Activities¶
metrics¶
get_metrics_state¶
Type | probe |
Module | chaosdatadog.metrics.probes |
Name | get_metrics_state |
Return | boolean |
This function can:
- Query metrics from any time period (timeseries and scalar)
- Compare the metrics to some threshold over some time window, e.g. CPU, memory, network
- Check whether the sum of data points is over some value, e.g. requests, errors, custom metrics
You can use a comparison to check whether all data points in the query satisfy the steady-state condition. For example:
cumsum(sum:istio.mesh.request.count.total{kube_service:test, response_code:500}.as_count())
The above query is a cumulative sum of all requests with a response code of 500. If your hypothesis deviates when there are more than 30 HTTP 500 errors within the time window, the comparison should be "<" with a threshold of 30, so any value below 30 is a steady state.
The allowed comparison values are: ">", "<", ">=", "<=", "==".
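The "all data points must satisfy the comparison" rule can be sketched as follows. This is an illustration under stated assumptions, not the extension's implementation: the helper name `all_points_satisfy` and the sample data points are hypothetical, and the real probe fetches its points from DataDog.

```python
import operator
from typing import Callable, Dict, Iterable

# Maps each allowed comparison string to the matching Python operator.
COMPARISONS: Dict[str, Callable[[float, float], bool]] = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


def all_points_satisfy(points: Iterable[float], comparison: str, threshold: float) -> bool:
    """Return True when every data point satisfies `point <comparison> threshold`.

    Hypothetical helper; the extension's own logic may differ.
    """
    if comparison not in COMPARISONS:
        raise ValueError(f"unsupported comparison: {comparison!r}")
    compare = COMPARISONS[comparison]
    return all(compare(point, threshold) for point in points)


# Made-up cumulative counts of HTTP 500 errors over the window.
print(all_points_satisfy([0, 4, 11, 27], "<", 30))  # True: steady state holds
print(all_points_satisfy([0, 4, 11, 31], "<", 30))  # False: one point reaches 31
```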
Signature:
def get_metrics_state(query: str,
                      comparison: str,
                      threshold: float,
                      minutes_before: int,
                      configuration: Dict[str, Dict[str, str]] = None,
                      secrets: Dict[str, Dict[str, str]] = None) -> bool:
    pass
Arguments:
Name | Type | Default | Required |
---|---|---|---|
query | string | | Yes |
comparison | string | | Yes |
threshold | number | | Yes |
minutes_before | integer | | Yes |
Usage:
{
    "name": "get-metrics-state",
    "type": "probe",
    "provider": {
        "type": "python",
        "module": "chaosdatadog.metrics.probes",
        "func": "get_metrics_state",
        "arguments": {
            "query": "",
            "comparison": "",
            "threshold": null,
            "minutes_before": 0
        }
    }
}
name: get-metrics-state
provider:
  arguments:
    comparison: ''
    minutes_before: 0
    query: ''
    threshold: null
  func: get_metrics_state
  module: chaosdatadog.metrics.probes
  type: python
type: probe
slo¶
get_slo¶
Type | probe |
Module | chaosdatadog.slo.probes |
Name | get_slo |
Return | mapping |
Get a SLO’s history for the given period.
Periods should be given relative to each other. If end_period isn’t provided, it resolves to now (UTC). start_period is always relative to end_period. You can use a format such as "X minutes ago" for both.
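One reading of the period rule above, sketched in Python: the end of the window is resolved first (defaulting to now in UTC), and the start is then counted back from that end. This is only an illustration. The helpers `parse_minutes_ago` and `resolve_periods` are hypothetical, they accept only the "X minutes ago" form, and the extension's own parsing may accept other formats or behave differently.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Accepts strings such as "2 minutes ago" or "1 minute ago".
_PATTERN = re.compile(r"^\s*(\d+)\s+minutes?\s+ago\s*$", re.IGNORECASE)


def parse_minutes_ago(text: str) -> timedelta:
    """Turn "X minutes ago" into a timedelta (hypothetical helper)."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unsupported period format: {text!r}")
    return timedelta(minutes=int(match.group(1)))


def resolve_periods(
    start_period: str = "2 minutes ago",
    end_period: Optional[str] = None,
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Return (start, end), with start counted back from end."""
    now = now or datetime.now(timezone.utc)
    # No end_period means the window ends now (UTC).
    end = now if end_period is None else now - parse_minutes_ago(end_period)
    # start_period is applied relative to the resolved end, not to now.
    start = end - parse_minutes_ago(start_period)
    return start, end


fixed_now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
start, end = resolve_periods("10 minutes ago", "5 minutes ago", now=fixed_now)
print(start.isoformat(), end.isoformat())
```

With the fixed clock above, the window ends five minutes before noon and starts ten minutes before that end.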
Please visit https://docs.datadoghq.com/api/latest/service-level-objectives/#get-an-slos-history for more information on the response payload, which is returned as a dictionary.
Signature:
def get_slo(slo_id: str,
            start_period: str = '2 minutes ago',
            end_period: str = None,
            configuration: Dict[str, Dict[str, str]] = None,
            secrets: Dict[str, Dict[str, str]] = None) -> Dict[str, Any]:
    pass
Arguments:
Name | Type | Default | Required |
---|---|---|---|
slo_id | string | | Yes |
start_period | string | “2 minutes ago” | No |
end_period | string | null | No |
Usage:
{
    "name": "get-slo",
    "type": "probe",
    "provider": {
        "type": "python",
        "module": "chaosdatadog.slo.probes",
        "func": "get_slo",
        "arguments": {
            "slo_id": ""
        }
    }
}
name: get-slo
provider:
  arguments:
    slo_id: ''
  func: get_slo
  module: chaosdatadog.slo.probes
  type: python
type: probe
get_slo_details¶
Type | probe |
Module | chaosdatadog.slo.probes |
Name | get_slo_details |
Return | mapping |
Get a SLO’s details.
Please visit https://docs.datadoghq.com/api/latest/service-level-objectives/#get-an-slos-details for more information on the response payload, which is returned as a dictionary.
Signature:
def get_slo_details(
        slo_id: str,
        configuration: Dict[str, Dict[str, str]] = None,
        secrets: Dict[str, Dict[str, str]] = None) -> Dict[str, Any]:
    pass
Arguments:
Name | Type | Default | Required |
---|---|---|---|
slo_id | string | | Yes |
Usage:
{
    "name": "get-slo-details",
    "type": "probe",
    "provider": {
        "type": "python",
        "module": "chaosdatadog.slo.probes",
        "func": "get_slo_details",
        "arguments": {
            "slo_id": ""
        }
    }
}
name: get-slo-details
provider:
  arguments:
    slo_id: ''
  func: get_slo_details
  module: chaosdatadog.slo.probes
  type: python
type: probe
slo_must_be_met¶
Type | tolerance |
Module | chaosdatadog.slo.tolerances |
Name | slo_must_be_met |
Return | boolean |
Checks that the current SLI value of a SLO is higher than its target for a given threshold period ("7d", "30d", "90d", "custom").
Signature:
def slo_must_be_met(threshold: str = '7d',
                    value: Dict[str, Any] = None) -> bool:
    pass
Arguments:
Name | Type | Default | Required |
---|---|---|---|
threshold | string | “7d” | No |
value | mapping | null | No |
Tolerances declare the value argument, which is automatically injected by Chaos Toolkit as the output of the probe they are evaluating.
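To make the injection of `value` concrete, here is a minimal sketch of a tolerance-shaped function. It is not the code of `slo_must_be_met`: the function name `sli_above_target` is hypothetical, and the payload keys it reads (`data.overall.sli_value` and `data.thresholds.<period>.target`) are an assumption about the shape of the SLO history response, with made-up sample values.

```python
from typing import Any, Dict, Optional


def sli_above_target(threshold: str = "7d",
                     value: Optional[Dict[str, Any]] = None) -> bool:
    """Tolerance-shaped function: `value` is the output of the probe.

    Hypothetical illustration; the payload keys used below are assumptions.
    """
    if not value:
        return False
    data = value.get("data", {})
    sli = data.get("overall", {}).get("sli_value")
    target = data.get("thresholds", {}).get(threshold, {}).get("target")
    if sli is None or target is None:
        # Missing data cannot prove the SLO is met.
        return False
    return sli > target


# Made-up probe output, shaped like the assumption described above.
sample = {
    "data": {
        "overall": {"sli_value": 99.95},
        "thresholds": {"7d": {"target": 99.9}},
    }
}
print(sli_above_target("7d", sample))   # True: 99.95 is above the 99.9 target
print(sli_above_target("30d", sample))  # False: no "30d" target in the sample
```

In an experiment, only `threshold` would be passed under "arguments"; the probe output arrives as `value` without being declared.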
Usage:
{
    "steady-state-hypothesis": {
        "title": "...",
        "probes": [
            {
                "type": "probe",
                "tolerance": {
                    "name": "slo-must-be-met",
                    "type": "tolerance",
                    "provider": {
                        "type": "python",
                        "module": "chaosdatadog.slo.tolerances",
                        "func": "slo_must_be_met"
                    }
                },
                "...": "..."
            }
        ]
    }
}
steady-state-hypothesis:
  probes:
  - '...': '...'
    tolerance:
      name: slo-must-be-met
      provider:
        func: slo_must_be_met
        module: chaosdatadog.slo.tolerances
        type: python
      type: tolerance
    type: probe
  title: '...'