# ADR 013: Observability
# Changelog
- 20-01-2020: Initial Draft
# Status
Proposed
# Context
Telemetry is essential for debugging and for understanding what the application is doing and how it is performing. We aim to expose metrics from modules and other core parts of the Cosmos SDK.
In addition, we should aim to support multiple configurable sinks that an operator may choose from. By default, when telemetry is enabled, the application should track and expose metrics that are stored in memory. The operator may choose to enable additional sinks; for now we support only Prometheus, as it is battle-tested, simple to set up, open source, and rich in ecosystem tooling.
We must also aim to integrate metrics into the Cosmos SDK in the most seamless way possible such that metrics may be added or removed at will and without much friction. To do this, we will use the go-metrics library.
Finally, operators may enable telemetry along with specific configuration options. If enabled, metrics will be exposed via `/metrics?format={text|prometheus}` on the API server.
# Decision
We will add a configuration block to `app.toml` that defines the telemetry settings.
This configuration allows for two sinks: in-memory and Prometheus. We create a `Metrics` type that performs all the bootstrapping for the operator, so capturing metrics becomes seamless.
In addition, `Metrics` allows us to gather the current set of metrics at any given point in time. An operator may also send a signal, `SIGUSR1`, to dump and print formatted metrics to `STDERR`.
During an application's bootstrapping and construction phase, if `Telemetry.Enabled` is `true`, the API server will create a reference to a `Metrics` object and register a metrics handler accordingly.
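The conditional registration might look roughly like the following sketch. The `Config` and `Metrics` types, the `newAPIMux` and `status` functions, and the use of `http.ServeMux` as the API server's router are all assumptions for illustration; the actual API server wiring is not shown in this document.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Config is a hypothetical application config carrying the telemetry switch.
type Config struct {
	Telemetry struct{ Enabled bool }
}

// Metrics is a placeholder; a real type would render the gathered metrics.
type Metrics struct{}

func (m *Metrics) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	fmt.Fprintln(w, "metrics placeholder")
}

// newAPIMux builds the router and registers the metrics handler only when
// telemetry is enabled. It returns a nil *Metrics when telemetry is off.
func newAPIMux(cfg Config) (*http.ServeMux, *Metrics) {
	mux := http.NewServeMux()
	if !cfg.Telemetry.Enabled {
		return mux, nil
	}
	m := &Metrics{}
	mux.Handle("/metrics", m)
	return mux, m
}

// status issues GET /metrics against the mux and returns the response code.
func status(mux *http.ServeMux) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Code
}

func main() {
	var cfg Config
	mux, _ := newAPIMux(cfg)
	fmt.Println(status(mux)) // telemetry disabled: route is not registered

	cfg.Telemetry.Enabled = true
	mux, _ = newAPIMux(cfg)
	fmt.Println(status(mux)) // telemetry enabled: handler responds
}
```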
Application developers may track counters, gauges, summaries, and key/value metrics. Modules need no additional plumbing to leverage profiling metrics; instrumenting a function takes very little code.
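A minimal sketch of what such instrumentation could look like follows. The `recorder` type is a hypothetical in-memory stand-in whose method names are loosely modeled on go-metrics, with simplified signatures; `mintCoins` and the metric keys are likewise invented for the example. A real integration would call the chosen metrics library instead.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// recorder is a hypothetical in-memory sink used only for this sketch.
type recorder struct {
	mu       sync.Mutex
	counters map[string]float64
	gauges   map[string]float64
	samples  map[string][]time.Duration
}

func newRecorder() *recorder {
	return &recorder{
		counters: make(map[string]float64),
		gauges:   make(map[string]float64),
		samples:  make(map[string][]time.Duration),
	}
}

// IncrCounter adds v to a monotonically growing counter.
func (r *recorder) IncrCounter(key string, v float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.counters[key] += v
}

// SetGauge overwrites the gauge with its latest value.
func (r *recorder) SetGauge(key string, v float64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.gauges[key] = v
}

// MeasureSince records the time elapsed since start as a timing sample.
func (r *recorder) MeasureSince(key string, start time.Time) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.samples[key] = append(r.samples[key], time.Since(start))
}

// telemetry is a package-level recorder, so modules need no extra wiring.
var telemetry = newRecorder()

// mintCoins is a hypothetical module function. The deferred call evaluates
// time.Now() immediately and records the elapsed time when the function returns.
func mintCoins(amount int64) error {
	defer telemetry.MeasureSince("mint_coins", time.Now())
	telemetry.IncrCounter("minted", float64(amount))
	return nil
}

func main() {
	_ = mintCoins(5)
	telemetry.SetGauge("supply", 5)
	fmt.Println(telemetry.counters["minted"], len(telemetry.samples["mint_coins"]))
}
```

Because the arguments of a deferred call are evaluated when the `defer` statement executes, a single line at the top of the function is enough to time the whole call.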
# Consequences
## Positive
- Visibility into the performance and behavior of an application