
Monitoring Jobs with DataDog

This guide will give you everything you need to start observing metrics for Immerok Jobs with DataDog.

If you haven't created a Job already, check out the tutorial to get going.

info

Interested in other ways to consume metrics from your Jobs? Hitting some limitations for your use case? Reach out to us on Slack, by email, or wherever you find your local Immerokers.

Configuring DataDog

The OpenMetrics check allows the DataDog agent to scrape metrics from Immerok Cloud's Prometheus scrape endpoints using the Prometheus text format.

We'll assume our Org is named immerok and we have a Job named window-aggregation-0.1 in the default Project. Each Job exposes all its metrics for scraping via the Immerok API Server on the endpoint: https://api.immerok.cloud/apis/core/v1alpha1/orgs/$ORG/projects/$PROJECT/jobs/$JOB/metrics
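If you monitor more than one Job, it can help to build the scrape URL from its parts rather than by hand. The following is a minimal sketch that only uses the path layout quoted above; the helper name `metrics_endpoint` is our own invention and not part of any Immerok tooling.

```python
from urllib.parse import quote

# Base URL and path layout are taken from the endpoint quoted above.
API_BASE = "https://api.immerok.cloud/apis/core/v1alpha1"


def metrics_endpoint(org: str, project: str, job: str) -> str:
    """Build the metrics scrape URL for one Job (hypothetical helper)."""
    # Percent-encode each name so unusual characters cannot alter the path.
    parts = (quote(name, safe="") for name in (org, project, job))
    return "{}/orgs/{}/projects/{}/jobs/{}/metrics".format(API_BASE, *parts)


if __name__ == "__main__":
    print(metrics_endpoint("immerok", "default", "window-aggregation-0.1"))
```

For the example Org, Project, and Job used in this guide, this yields the same URL that appears in the agent configuration.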

We'll just need the Org, Project, and Job names as well as a token to authenticate.

Let's use the rok CLI to generate one and then write it to a file ./auth-token.txt.


$ rok auth generate > auth-token.txt
$ cat auth-token.txt
eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhdXN0aW5jZSIsImV4cCI6MTY2ODQ3NjIwOCwibmJmIjoxNjY3ODcxNDA4LCJpYXQiOjE2Njc4NzE0MDgsInJvbGUiOiJvcmc6YWRtaW4iLCJvcmdzIjpbImltbWVyb2siXX0.U8j_V1K-hYIy0WnRjpmpVbBiooWLWqM_V7TXrYdLBRGd8Od8YUpeQr94QeAmkgQfCSQ_c6FpIt1G3FAzsBPOgQ
# Optionally, specify an expiration time
$ rok auth generate --expires-at="3mo"
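Because tokens expire, it can be useful to know when the one on disk stops working. The sample token above looks like a JWT (its first segment decodes to a header with `"typ":"JWT"`), so the sketch below assumes the token is a JWT carrying a standard `exp` claim. It only reads the payload and does not verify the signature; `token_expiry` is a hypothetical helper, not part of the rok CLI.

```python
import base64
import json
from datetime import datetime, timezone


def token_expiry(token: str) -> datetime:
    """Return the time in a JWT's `exp` claim (signature is NOT verified)."""
    # A JWT has three dot-separated segments: header.payload.signature
    payload_b64 = token.strip().split(".")[1]
    # Segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    # `exp` is assumed to be seconds since the Unix epoch.
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
```

Calling `token_expiry(open("auth-token.txt").read())` would then give the expiry as a timezone-aware UTC `datetime`, which you could compare against the current time before the agent starts failing to authenticate.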

We'll ingest all Flink metrics in this example, but we recommend either allow-listing only those you care about (via the metrics key), to avoid sending too many custom metrics to DataDog, or using Metrics without Limits™ to filter them before they are indexed.
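To get a feel for what an allow-list would keep before you change the agent configuration, you can try candidate patterns against metric names locally. This is a rough sketch only: the metric names are illustrative rather than a list published by Immerok, and it uses whole-name matching with `re.fullmatch`, which is an assumption and may differ from how the agent matches entries under the metrics key.

```python
import re

# Hypothetical allow-list; the metric names are illustrative only.
ALLOWED = [
    r"flink_jobmanager_job_numRestarts",
    r"flink_jobmanager_job_lastCheckpoint.*",
]


def is_allowed(metric_name: str, patterns=ALLOWED) -> bool:
    """True if the whole metric name matches at least one pattern."""
    return any(re.fullmatch(p, metric_name) for p in patterns)


def filter_names(names, patterns=ALLOWED):
    """Keep only the allow-listed names, preserving their order."""
    return [n for n in names if is_allowed(n, patterns)]
```

Passing `[r".*"]` as the pattern list keeps every name, which corresponds to the catch-all used in this guide's example.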

caution

We do not expose deprecated metrics. Please consult Flink's documentation for replacements.

caution

The configuration below assumes DataDog agent version >= 6.


# Full config can be found at: https://github.com/DataDog/integrations-core/blob/6a56176677511afe91b4dc9b3f675946d9770a0f/openmetrics/datadog_checks/openmetrics/data/conf.yaml.example
instances:
  -
    ## The URL exposing metrics in the OpenMetrics format.
    openmetrics_endpoint: https://api.immerok.cloud/apis/core/v1alpha1/orgs/immerok/projects/default/jobs/window-aggregation-0.1/metrics

    ## Authorization, reading the above-fetched token from the local file.
    auth_token:
      reader:
        type: file
        path: /path/to/auth-token.txt
      writer:
        type: header
        name: Authorization
        value: "Bearer <TOKEN>"
        placeholder: <TOKEN>

    ## The namespace to be prepended to all metrics.
    namespace: immerok

    ## The metrics to forward to DataDog.
    metrics:
      - .*
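The auth_token section can look a little opaque, so here is a small sketch of what the reader and writer pair amounts to: read the token from the file, then substitute it for the placeholder in the header value. This mimics the behaviour for illustration and is not the agent's actual code; `auth_header` is a hypothetical helper, and stripping surrounding whitespace from the file contents is an assumption of the sketch.

```python
from pathlib import Path


def auth_header(
    token_path: str,
    value: str = "Bearer <TOKEN>",
    placeholder: str = "<TOKEN>",
) -> dict:
    """Read the token file and render the Authorization header."""
    # reader: type=file, path=token_path
    token = Path(token_path).read_text().strip()
    # writer: type=header, name=Authorization, value with placeholder replaced
    return {"Authorization": value.replace(placeholder, token)}
```

The resulting dictionary can be passed as request headers to any HTTP client if you want to check by hand that the token is accepted by the metrics endpoint.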

Immerok-Specific Labels

The following labels are attached to each metric to help identify workloads:

  • immerok_org: The Org the Job belongs to
  • immerok_project: The Project the Job resides in
  • immerok_zone: The Zone the Job runs in
  • immerok_job: The name of the Job itself
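If you inspect the raw scrape output, these labels sit inside the braces of each sample line in the Prometheus text format. The sketch below pulls them out of a single line; the helper `immerok_labels` and any sample line you feed it are illustrative, and escaped characters inside label values are left as they appear in the text.

```python
import re

# Matches name="value" pairs, allowing backslash-escaped characters in values.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def immerok_labels(sample_line: str) -> dict:
    """Extract the immerok_* labels from one Prometheus text-format sample line."""
    start, end = sample_line.find("{"), sample_line.rfind("}")
    if start == -1 or end == -1:
        return {}  # the sample carries no labels at all
    labels = dict(LABEL_RE.findall(sample_line[start + 1:end]))
    return {k: v for k, v in labels.items() if k.startswith("immerok_")}
```

Once ingested, the OpenMetrics check should surface these labels as tags, so the same four keys are what you would filter and group by in DataDog.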

Observing the Metrics

Once the metrics are ingested, you can use the rich set made available by Flink (see Flink's metrics documentation).

In the dashboard provided below, we sketch out some possibilities to get you started exploring health, throughput, and state checkpointing.

[Image: DataDog Job dashboard]

DataDog Dashboards