Monitoring Jobs with DataDog
This guide will give you everything you need to start observing metrics for Immerok Jobs with DataDog.
If you haven't created a Job already, check out the tutorial to get going.
Interested in other ways to consume metrics from your Jobs? Hitting some limitations for your use case? Reach out to us on Slack, by email, or wherever you find your local Immerokers.
Configuring DataDog
The OpenMetrics check allows the DataDog agent to scrape metrics from Immerok Cloud's Prometheus scrape endpoints using the Prometheus text format.
We'll assume our Org is named `immerok` and we have a Job named `window-aggregation-0.1` in the `default` Project.
Each Job exposes all its metrics for scraping via the Immerok API Server on the endpoint:
https://api.immerok.cloud/apis/core/v1alpha1/orgs/$ORG/projects/$PROJECT/jobs/$JOB/metrics
We'll just need the Org, Project, and Job names as well as a token to authenticate.
Let's use the `rok` CLI to generate one and write it to the file `./auth-token.txt`.
```shell
$ rok auth generate > auth-token.txt
$ cat auth-token.txt
eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhdXN0aW5jZSIsImV4cCI6MTY2ODQ3NjIwOCwibmJmIjoxNjY3ODcxNDA4LCJpYXQiOjE2Njc4NzE0MDgsInJvbGUiOiJvcmc6YWRtaW4iLCJvcmdzIjpbImltbWVyb2siXX0.U8j_V1K-hYIy0WnRjpmpVbBiooWLWqM_V7TXrYdLBRGd8Od8YUpeQr94QeAmkgQfCSQ_c6FpIt1G3FAzsBPOgQ

# Optionally, specify an expiration time
$ rok auth generate --expires-at="3mo" > auth-token.txt
```
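Before configuring the agent, it can be worth checking the token and endpoint by hand. The sketch below is POSIX shell and assumes `curl` and a `base64` that accepts `-d` (GNU coreutils; other platforms may spell the flag differently). It also assumes the token is a JWT, which is what the sample output above decodes as, and that the API Server accepts it as a `Bearer` value in the `Authorization` header, mirroring the DataDog configuration further down. The function names are hypothetical helpers, not part of `rok`.

```shell
# Hypothetical helper: fill in the metrics endpoint template for one Job.
# Usage: immerok_metrics_url ORG PROJECT JOB
immerok_metrics_url() {
  printf 'https://api.immerok.cloud/apis/core/v1alpha1/orgs/%s/projects/%s/jobs/%s/metrics\n' \
    "$1" "$2" "$3"
}

# Hypothetical helper: fetch a Job's raw Prometheus text using the token file.
# Sending the token as "Bearer <token>" is an assumption mirroring the
# Authorization header in the DataDog config.
# Usage: fetch_immerok_metrics TOKEN_FILE ORG PROJECT JOB
fetch_immerok_metrics() {
  token_file="$1"
  shift
  # --fail makes curl exit non-zero on HTTP errors such as 401 or 404.
  curl --silent --show-error --fail \
    --header "Authorization: Bearer $(cat "$token_file")" \
    "$(immerok_metrics_url "$@")"
}

# Hypothetical helper: print the token's "exp" claim in Unix seconds,
# assuming the token is a JWT (header.payload.signature).
# Usage: token_expiry TOKEN_FILE
token_expiry() {
  # The payload is the second dot-separated segment, base64url-encoded.
  # Map the base64url alphabet back to standard base64.
  payload=$(cut -d. -f2 < "$1" | tr '_-' '/+')
  # base64url drops '=' padding; restore it to a multiple of 4 characters.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  # Decode the JSON payload and extract the numeric value of "exp".
  printf '%s' "$payload" | base64 -d | sed -n 's/.*"exp":\([0-9][0-9]*\).*/\1/p'
}
```

For example, `fetch_immerok_metrics auth-token.txt immerok default window-aggregation-0.1 | head -n 20` shows the first lines the agent would scrape, and on GNU systems `date -d "@$(token_expiry auth-token.txt)"` prints the expiry as a date. In the sample token above, `exp` falls exactly seven days after `iat`, so it is worth knowing when the file the agent reads will need to be regenerated.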
We'll ingest all Flink metrics in this example, but we recommend allow-listing only the metrics you care about (via the `metrics` key) to avoid sending too many custom metrics to DataDog, or using Metrics without Limits™ to filter them before they are indexed.
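One way to build that allow-list is to look at which metric names a Job actually exposes. The sketch below assumes you have saved a Job's raw Prometheus text to a local file (for example by calling the endpoint with `curl` and the token generated above). It skips `# HELP`/`# TYPE` comment lines and blank lines, strips label sets and values, and leaves the distinct metric names. `prometheus_metric_names` is a hypothetical helper name.

```shell
# Hypothetical helper: list the distinct metric names in Prometheus
# text-format input read from stdin.
# Sample lines look like: name{label="value",...} value [timestamp]
# Usage: prometheus_metric_names < metrics.txt
prometheus_metric_names() {
  # Drop comment and blank lines, then cut each sample line at the first
  # '{' or space so only the metric name remains.
  grep -v -e '^#' -e '^[[:space:]]*$' | sed 's/[{ ].*//' | sort -u
}
```

The output is one name per line, ready to be copied under the `metrics` key or condensed into a narrower regular expression, in the same way the configuration's `.*` entry matches everything.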
We do not expose deprecated metrics. Please consult Flink's documentation for replacements.
The configuration below is for DataDog Agent version 6 or later.
```yaml
# Full config can be found at: https://github.com/DataDog/integrations-core/blob/6a56176677511afe91b4dc9b3f675946d9770a0f/openmetrics/datadog_checks/openmetrics/data/conf.yaml.example
instances:
  -
    ## The URL exposing metrics in the OpenMetrics format.
    openmetrics_endpoint: https://api.immerok.cloud/apis/core/v1alpha1/orgs/immerok/projects/default/jobs/window-aggregation-0.1/metrics

    ## Authorization, reading the above-fetched token from the local file.
    auth_token:
      reader:
        type: file
        path: /path/to/auth-token.txt
      writer:
        type: header
        name: Authorization
        value: "Bearer <TOKEN>"
        placeholder: <TOKEN>

    ## The namespace to be prepended to all metrics.
    namespace: immerok

    ## The metrics to forward to DataDog.
    metrics:
      - .*
```
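Because `instances` is a list and each Job has its own endpoint, one way to monitor several Jobs is to add one instance per Job. The sketch below generates such a file, assuming all Jobs live in the same Org and Project and share one token file; `openmetrics_conf` is a hypothetical helper, and `/path/to/auth-token.txt` is the same placeholder path used above. The output would typically be saved as `conf.d/openmetrics.d/conf.yaml` in the agent's configuration directory.

```shell
# Hypothetical generator: print an OpenMetrics check config with one
# instance per Job, mirroring the single-Job example above.
# Usage: openmetrics_conf ORG PROJECT TOKEN_FILE JOB [JOB...]
openmetrics_conf() {
  org="$1" project="$2" token_file="$3"
  shift 3
  echo "instances:"
  # Emit one list entry per Job name given on the command line.
  for job in "$@"; do
    cat <<EOF
  - openmetrics_endpoint: https://api.immerok.cloud/apis/core/v1alpha1/orgs/$org/projects/$project/jobs/$job/metrics
    auth_token:
      reader:
        type: file
        path: $token_file
      writer:
        type: header
        name: Authorization
        value: "Bearer <TOKEN>"
        placeholder: <TOKEN>
    namespace: immerok
    metrics:
      - .*
EOF
  done
}
```

For example, `openmetrics_conf immerok default /path/to/auth-token.txt window-aggregation-0.1 > conf.yaml` reproduces the single-Job configuration. After placing the file and restarting the agent, running `datadog-agent check openmetrics` (usually as the agent's user) should run the check once and show what it collects.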
Immerok-Specific Labels
The following labels are attached to each metric to help identify workloads:
- `immerok_org`: The Org the Job belongs to
- `immerok_project`: The Project the Job resides in
- `immerok_zone`: The Zone the Job runs in
- `immerok_job`: The name of the Job itself
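In DataDog these labels should show up as tags (for example `immerok_job:window-aggregation-0.1`) that you can filter and group by. To see which values a scraped payload actually carries, the sketch below lists the distinct values of one label from Prometheus text on stdin. It is a hypothetical helper and assumes label values contain no escaped quotes, which holds for simple names like `window-aggregation-0.1` but not for arbitrary strings.

```shell
# Hypothetical helper: list the distinct values of one label (e.g. immerok_job)
# found in Prometheus text-format input on stdin.
# Assumes label values contain no escaped quotes (\").
# Usage: prometheus_label_values LABEL < metrics.txt
prometheus_label_values() {
  # Match LABEL="..." only as a whole label name: it must follow '{', ',' or
  # a space, so a label such as xyz_immerok_job is not picked up by mistake.
  grep -o "[{, ]$1=\"[^\"]*\"" | sed "s/^[{, ]$1=\"//; s/\"\$//" | sort -u
}
```

For example, `prometheus_label_values immerok_job < metrics.txt` should print `window-aggregation-0.1` for the Job used throughout this guide.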
Observing the Metrics
Once the metrics are ingested, you can use the rich set of metrics that Flink makes available (see the Flink metrics documentation).
Our dashboard, provided below, sketches out some possibilities to get you started exploring job health, throughput, and state checkpointing.