
Monitoring your Flink job

Monitoring

Metrics

Immerok Jobs expose metrics in the Prometheus text format. Check out the tutorials on integrating with Prometheus and Datadog.

Caution

We do not expose deprecated metrics. Please consult Flink's documentation for replacements.
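To show roughly what the Prometheus text format looks like to a consumer, the following Python sketch turns sample lines into (name, labels, value) tuples. The metric names and labels in SAMPLE_PAYLOAD are hypothetical illustrations, not output captured from an Immerok Job. The parser deliberately ignores timestamps, does not unescape label values, and does not handle label values containing a closing brace. For a real integration, rely on Prometheus itself or on a maintained parser such as the one in the prometheus_client Python package.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical scrape payload; metric names and label values are illustrative only.
SAMPLE_PAYLOAD = """\
# HELP flink_jobmanager_numRunningJobs Illustrative help text.
# TYPE flink_jobmanager_numRunningJobs gauge
flink_jobmanager_numRunningJobs{host="example-host"} 1.0
flink_taskmanager_Status_JVM_CPU_Load{host="example-host",tm_id="tm-1"} 0.12
"""

# A sample line: metric name, optional {label="value",...} block, then the value.
# Any trailing timestamp after the value is ignored by this sketch.
_SAMPLE_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
)
# One label pair; escaped quotes inside values are matched but not unescaped.
_LABEL_PAIR = re.compile(r'(?P<key>[a-zA-Z_][a-zA-Z0-9_]*)="(?P<val>(?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_prometheus_text(payload: str) -> List[Sample]:
    """Return (metric name, labels, value) for every sample line in payload."""
    samples: List[Sample] = []
    for raw in payload.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        match = _SAMPLE_LINE.match(line)
        if match is None:
            continue  # not a sample line this simplified parser understands
        labels = {
            pair["key"]: pair["val"]
            for pair in _LABEL_PAIR.finditer(match["labels"] or "")
        }
        # float() also accepts the special values NaN, +Inf and -Inf.
        samples.append((match["name"], labels, float(match["value"])))
    return samples


for name, labels, value in parse_prometheus_text(SAMPLE_PAYLOAD):
    print(name, labels, value)
```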

Events

Once a resource has been created or marked for modification, you can follow its transition to the desired state through the Events collected in the cluster.


$ rok get events

LAST SEEN TYPE REASON OBJECT MESSAGE
38s Normal FlinkJobRunning job/my-job Successfully transitioned the Flink job to RUNNING.
51s Normal FlinkJobCreated job/my-job Successfully created the Flink job.
66s Normal FlinkTaskManagersReady job/my-job Flink TaskManagers ready: 2/2.
73s Normal FlinkTaskManagersStarting job/my-job Job is starting 2 Flink TaskManagers.
73s Normal FlinkJobManagerStarting job/my-job Job is starting a Flink JobManager.
73s Normal ArtifactResolved job/my-job Job has successfully resolved the referenced Artifact my-job-1048686d.
78s Warning ArtifactNotReady job/my-job Job cannot be started as the referenced Artifact my-job-1048686d is not Ready.
78s Normal JobScheduledInZone job/my-job Job scheduled in zone shared-aws-eu-west-1.
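The Events listing is sorted with the most recent Event first. The following Python sketch parses such a listing, reorders it oldest first so that the transitions read in the order they happened, and picks out the Warning Events. It assumes the column layout shown in the listing (only the MESSAGE column contains spaces) and ages written as one or more number-unit pairs such as 38s or 5m10s; both are assumptions about the output format, not documented guarantees. In practice you would pass the text printed by rok get events instead of the hard-coded SAMPLE_LISTING.

```python
import re
from typing import List, NamedTuple

# A few rows copied from the listing; replace with real `rok get events` output.
SAMPLE_LISTING = """\
LAST SEEN TYPE REASON OBJECT MESSAGE
38s Normal FlinkJobRunning job/my-job Successfully transitioned the Flink job to RUNNING.
51s Normal FlinkJobCreated job/my-job Successfully created the Flink job.
78s Warning ArtifactNotReady job/my-job Job cannot be started as the referenced Artifact my-job-1048686d is not Ready.
78s Normal JobScheduledInZone job/my-job Job scheduled in zone shared-aws-eu-west-1.
"""

# Assumed age units: seconds, minutes, hours, days.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


class Event(NamedTuple):
    age_seconds: int  # the LAST SEEN column converted to seconds
    type: str
    reason: str
    obj: str
    message: str


def parse_age(age: str) -> int:
    """Convert an age such as '38s' or '5m10s' into seconds."""
    parts = re.findall(r"(\d+)([smhd])", age)
    if not parts:
        raise ValueError(f"unrecognised age: {age!r}")
    return sum(int(n) * _UNIT_SECONDS[unit] for n, unit in parts)


def parse_events(listing: str) -> List[Event]:
    """Parse the rows of an Events listing, skipping the header row."""
    events = []
    for line in listing.strip().splitlines()[1:]:
        # Only MESSAGE may contain spaces, so split off the first four columns.
        last_seen, typ, reason, obj, message = line.split(maxsplit=4)
        events.append(Event(parse_age(last_seen), typ, reason, obj, message))
    return events


def chronological(events: List[Event]) -> List[Event]:
    """Oldest Event first. The listing is newest first, so it is reversed before the
    stable sort; Events with the same age then keep their reversed listing order."""
    return sorted(reversed(events), key=lambda e: e.age_seconds, reverse=True)


events = parse_events(SAMPLE_LISTING)
for event in chronological(events):
    print(f"{event.age_seconds:>4}s ago  {event.reason}")
print("Warnings:", [e.reason for e in events if e.type == "Warning"])
```

Sorting by age rather than simply reversing the rows keeps the result correct even if the listing is not strictly ordered, while the stable sort preserves the listing's relative order for Events that share the same age.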

Note

Usually, you don't need to list all Events with rok get events: by default, rok describe shows the 10 most recent Events of the given resource.

For example, rok describe job my-job lists the most recent Events of this Job at the bottom of its output.

Details

If you need more information about a particular Event, add -o wide to show each Event's name in the last column (NAME).


$ rok get events -o wide

LAST SEEN TYPE REASON OBJECT MESSAGE FIRST SEEN COUNT NAME
46s Warning JobExceptionOccurred job/my-job Exception during execution: org.apache.flink.util.FlinkException 46s my-job.co6wd72kcyhp

You can then pass this name to rok describe event:


$ rok describe event my-job.co6wd72kcyhp

Name: my-job.co6wd72kcyhp
Labels: <none>
Annotations: <none>

Type: Warning
Reason: JobExceptionOccurred
Message: Exception during execution: org.apache.flink.util.FlinkException
First Seen: 2022-11-08T12:01:58.052504Z (2022-11-08 13:01:58.052504 +0100 CET)

Regarding Object:
  Kind: Job
  Name: my-job

Debug Information:
  org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.
    at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:215)
    at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.getPackagedProgram(StandaloneApplicationClusterEntryPoint.java:103)
    [...]

The Debug Information section is especially useful, for example to access JVM stack traces.
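Finding a Warning Event's name with -o wide and then describing it can be scripted. The following Python sketch assumes that TYPE is the second column and NAME the last one, as in the -o wide output shown on this page, and that the Debug Information section runs to the end of the rok describe event output, as in the example. The helper names are hypothetical. Only report_warnings() calls the rok CLI, using nothing but the rok get events -o wide and rok describe event commands shown here; it requires rok on your PATH with an active session.

```python
import subprocess
import textwrap
from typing import List


def warning_event_names(wide_listing: str) -> List[str]:
    """Return the NAME column of every Warning row in `rok get events -o wide` output."""
    names = []
    for line in wide_listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Assumption: TYPE is the second whitespace-separated field, NAME the last.
        if len(fields) >= 2 and fields[1] == "Warning":
            names.append(fields[-1])
    return names


def describe_command(event_name: str) -> List[str]:
    """Argument vector for `rok describe event <name>`."""
    return ["rok", "describe", "event", event_name]


def debug_information(describe_output: str) -> str:
    """Return the dedented text after the 'Debug Information:' header, or ''."""
    lines = describe_output.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "Debug Information:":
            body = lines[i + 1:]
            # Drop blank lines around the section but keep the stack-trace indentation.
            while body and not body[0].strip():
                body.pop(0)
            while body and not body[-1].strip():
                body.pop()
            return textwrap.dedent("\n".join(body))
    return ""


def report_warnings() -> None:
    """Print the Debug Information of every Warning Event (requires the rok CLI)."""
    wide = subprocess.run(
        ["rok", "get", "events", "-o", "wide"],
        capture_output=True, text=True, check=True,
    ).stdout
    for name in warning_event_names(wide):
        described = subprocess.run(
            describe_command(name), capture_output=True, text=True, check=True,
        ).stdout
        print(f"== {name} ==")
        print(debug_information(described))
```

Calling report_warnings() prints one block per Warning Event, headed by the Event's name. Passing the command as an argument list rather than a shell string avoids any quoting issues with Event names.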

Field Selectors

A resource can generate a large number of Events during its lifecycle. Field selectors let you filter them down to the information you are interested in.


$ rok get events --field-selector="regarding.kind=Job,regarding.name=my-job,type=Warning"

The selector above matches only Warning Events that relate to a Job resource named my-job.

In addition to the regular metadata fields, the following selector fields are currently available for Events:

  • type
  • reason
  • reportingController
  • regarding.kind
  • regarding.project
  • regarding.name
  • regarding.uid
  • regarding.apiVersion
  • regarding.resourceVersion

You can monitor Jobs via the Apache Flink Web UI. The rok CLI can open it in the Immerok Cloud UI for you:


$ rok flinkui job my-job