
Monitoring Jobs with Prometheus and Grafana

This guide will give you everything you need to start observing metrics for Immerok Jobs with Prometheus and Grafana.

If you haven't created a Job already, check out the tutorial to get going.

info

Interested in other ways to consume metrics from your Jobs? Hitting some limitations for your use case? Reach out to us on Slack, email, wherever you find your local Immerokers.

Configuring Prometheus

We'll assume our Org is named immerok and we have a Job named window-aggregation-0.1 in the default Project. Each Job exposes all its metrics for scraping via the Immerok API Server on the endpoint: https://api.immerok.cloud/apis/core/v1alpha1/orgs/$ORG/projects/$PROJECT/jobs/$JOB/metrics
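As a minimal sketch of how the endpoint is assembled from the Org, Project, and Job names, the helpers below fill in the URL template. The function names `metrics_path` and `metrics_url` are hypothetical and chosen only for this illustration; percent-encoding the names is a defensive assumption, not something the API is stated to require.

```python
from urllib.parse import quote

API_HOST = "api.immerok.cloud"  # host taken from the endpoint above


def metrics_path(org: str, project: str, job: str) -> str:
    """Return the metrics path for one Job, percent-encoding each name."""
    # safe="" also encodes "/" so a name can never add extra path segments.
    parts = (quote(name, safe="") for name in (org, project, job))
    return "/apis/core/v1alpha1/orgs/{}/projects/{}/jobs/{}/metrics".format(*parts)


def metrics_url(org: str, project: str, job: str) -> str:
    """Return the full HTTPS scrape URL for one Job."""
    return f"https://{API_HOST}{metrics_path(org, project, job)}"


if __name__ == "__main__":
    print(metrics_url("immerok", "default", "window-aggregation-0.1"))
```

The path half is what goes into Prometheus' metrics_path setting, while the host is listed as the scrape target.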

We'll use the static_configs directive to specify the scrape target, and metrics_path to point at the Job's metrics endpoint.

We'll just need the Org, Project, and Job names as well as a token to authenticate.

Let's use the rok CLI to generate one:


    $ rok auth generate
    eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiJhdXN0aW5jZSIsImV4cCI6MTY2ODQ3NjIwOCwibmJmIjoxNjY3ODcxNDA4LCJpYXQiOjE2Njc4NzE0MDgsInJvbGUiOiJvcmc6YWRtaW4iLCJvcmdzIjpbImltbWVyb2siXX0.U8j_V1K-hYIy0WnRjpmpVbBiooWLWqM_V7TXrYdLBRGd8Od8YUpeQr94QeAmkgQfCSQ_c6FpIt1G3FAzsBPOgQ
    # Optionally, specify an expiration time
    $ rok auth generate --expires-at="3mo"
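Because an expired token will make scrapes fail, it can be useful to check when a token runs out. The sample token has the three dot-separated segments of a JWT, so the sketch below decodes the payload segment and reads its `exp` claim. It assumes tokens keep that shape; it does not verify the signature, and the helper names `decode_jwt_payload` and `expires_at` are hypothetical.

```python
import base64
import json
from datetime import datetime, timezone


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload (middle) segment of a JWT WITHOUT verifying its signature."""
    segments = token.strip().split(".")
    if len(segments) != 3:
        raise ValueError("expected three dot-separated JWT segments")
    payload = segments[1]
    # JWT segments are unpadded base64url; restore the padding before decoding.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def expires_at(token: str) -> datetime:
    """Return the token's `exp` claim as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(decode_jwt_payload(token)["exp"], tz=timezone.utc)
```

Passing the output of rok auth generate to `expires_at` gives the expiry time in UTC. For the sample token above, the `exp` and `iat` claims are 604800 seconds (seven days) apart; whether that is the general default is not stated here.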


Then add a scrape config for the Job, substituting the generated token for the $TOKEN placeholder (Prometheus does not expand environment variables in its config file):

    global:
      evaluation_interval: 1m
      scrape_interval: 10s
      scrape_timeout: 10s
    scrape_configs:
      - honor_labels: true
        job_name: 'immerok-job'
        scheme: https
        metrics_path: /apis/core/v1alpha1/orgs/immerok/projects/default/jobs/window-aggregation-0.1/metrics
        authorization:
          # Outside local testing, load the token from a file (credentials_file) instead of leaving it in plaintext within the config
          credentials: $TOKEN
        static_configs:
          - targets:
              - api.immerok.cloud
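Before pointing Prometheus at the endpoint, it can help to confirm that the path and token work on their own. The sketch below sends the same kind of request Prometheus would: the `authorization` block defaults to the Bearer type, so the token travels in an `Authorization: Bearer` header. The helper names `build_request` and `fetch_metrics` are hypothetical, and only the Python standard library is used.

```python
import urllib.request


def build_request(url: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying the token as a Bearer credential."""
    # Mirrors the header Prometheus sends for `authorization.credentials`.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_metrics(url: str, token: str, timeout: float = 10.0) -> str:
    """Fetch the metrics text; urlopen raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(build_request(url, token), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_metrics` with the full endpoint URL and a valid token should return the metrics as text. An HTTP 401 or 403 error would most likely point to a missing, expired, or insufficiently privileged token.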

Immerok-Specific Labels

The following labels are attached to each metric to help identify workloads:

  • immerok_org: The Org the Job belongs to
  • immerok_project: The Project the Job resides in
  • immerok_zone: The Zone the Job runs in
  • immerok_job: The name of the Job itself
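These labels can be used as matchers in PromQL to narrow a query to one workload. The sketch below builds such a selector from keyword arguments. The metric name `flink_jobmanager_job_numRestarts` is an assumption about how Flink's metric appears in Prometheus format, and the helper names are hypothetical.

```python
def escape_label_value(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for a PromQL string."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def selector(metric: str, **labels: str) -> str:
    """Build an instant-vector selector such as metric{label="value"}."""
    matchers = ",".join(
        f'{name}="{escape_label_value(value)}"'
        for name, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


if __name__ == "__main__":
    # Assumed metric name, used only for illustration.
    print(selector(
        "flink_jobmanager_job_numRestarts",
        immerok_org="immerok",
        immerok_project="default",
        immerok_job="window-aggregation-0.1",
    ))
```

Sorting the matchers keeps the generated query text stable, which makes it easier to compare or cache queries.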

Observing the Metrics

Once the metrics are ingested, you can query the rich set of metrics made available by Flink (see Flink's metrics documentation).

caution

We do not expose deprecated metrics. Please consult Flink's documentation for replacements.

In the dashboard provided below, we sketch out some possibilities to get you started exploring health, throughput, and state checkpointing.

[Figure: Grafana Job dashboard]
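To explore the same three areas outside Grafana, one option is Prometheus' HTTP API, which serves instant queries at /api/v1/query. The sketch below builds one query URL per area. The Flink metric names and the Prometheus address http://localhost:9090 are assumptions for illustration, so check them against the metrics your Job actually exposes.

```python
from urllib.parse import urlencode

JOB_MATCHER = 'immerok_job="window-aggregation-0.1"'

# Assumed Flink metric names in Prometheus format; verify before relying on them.
PANELS = {
    "health": f"flink_jobmanager_job_numRestarts{{{JOB_MATCHER}}}",
    "throughput": f"sum(flink_taskmanager_job_task_numRecordsInPerSecond{{{JOB_MATCHER}}})",
    "checkpointing": f"flink_jobmanager_job_lastCheckpointDuration{{{JOB_MATCHER}}}",
}


def instant_query_url(prometheus: str, promql: str) -> str:
    """Return the URL for an instant query against Prometheus' HTTP API."""
    base = prometheus.rstrip("/")
    query_string = urlencode({"query": promql})  # percent-encodes braces, quotes, etc.
    return f"{base}/api/v1/query?{query_string}"


if __name__ == "__main__":
    # Assumed address of a locally running Prometheus server.
    for panel, promql in PANELS.items():
        print(panel, instant_query_url("http://localhost:9090", promql))
```

The same PromQL expressions can be pasted into a Grafana panel's query editor as a starting point.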

Grafana Dashboards