Monitoring your Flink job
Metrics
Immerok Jobs expose metrics in the Prometheus text format. See the tutorials on integrating with Prometheus and Datadog.
We do not expose deprecated metrics. Please consult Flink's documentation for replacements.
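As a rough illustration of what consuming the Prometheus text format involves, the following Python sketch parses a few sample lines into name, labels, and value. The sample payload, its metric names, and its label values are illustrative assumptions, not output captured from an Immerok Job, and the parser covers only the common `name{labels} value` shape of the format.

```python
import re

# Hypothetical sample in Prometheus text format. Metric names and labels
# are illustrative assumptions, not taken from a real Immerok Job.
SAMPLE = """\
# HELP flink_jobmanager_job_numRestarts Illustrative help text.
# TYPE flink_jobmanager_job_numRestarts gauge
flink_jobmanager_job_numRestarts{job_name="my-job"} 0
flink_taskmanager_Status_JVM_CPU_Load{tm_id="tm-1"} 0.25
flink_jobmanager_numRegisteredTaskManagers 2
"""

# metric_name, optional {labels}, value, optional integer timestamp
LINE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# label="value" pairs; escape sequences inside values are kept as-is
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments carry no samples
        match = LINE_RE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot handle
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE):
        print(name, labels, value)
```

In practice a Prometheus server or a Datadog agent does this parsing for you; the sketch is only meant to show what the exposed text looks like structurally.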
Events
Once a resource has been created or marked for modification, you can monitor the transition to the desired state in the cluster via the collected Events.
$ rok get events

LAST SEEN   TYPE      REASON                      OBJECT       MESSAGE
38s         Normal    FlinkJobRunning             job/my-job   Successfully transitioned the Flink job to RUNNING.
51s         Normal    FlinkJobCreated             job/my-job   Successfully created the Flink job.
66s         Normal    FlinkTaskManagersReady      job/my-job   Flink TaskManagers ready: 2/2.
73s         Normal    FlinkTaskManagersStarting   job/my-job   Job is starting 2 Flink TaskManagers.
73s         Normal    FlinkJobManagerStarting     job/my-job   Job is starting a Flink JobManager.
73s         Normal    ArtifactResolved            job/my-job   Job has successfully resolved the referenced Artifact my-job-1048686d.
78s         Warning   ArtifactNotReady            job/my-job   Job cannot be started as the referenced Artifact my-job-1048686d is not Ready.
78s         Normal    JobScheduledInZone          job/my-job   Job scheduled in zone shared-aws-eu-west-1.
Usually, you don't need to list all Events via rok get events. By default, rok describe shows the 10 most recent Events that correspond to the given resource.
For example, rok describe job my-job will show the most recent Events of this Job at the bottom.
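If you want to post-process the Event listing in a script, a minimal Python sketch could look like the following. It assumes the default (non-wide) output of rok get events is whitespace-separated and that the first four columns never contain spaces, which holds for the sample output shown earlier but is not a documented guarantee. The function name parse_events is hypothetical.

```python
# Sample rows copied from the `rok get events` output shown earlier (shortened).
SAMPLE = """\
LAST SEEN TYPE REASON OBJECT MESSAGE
38s Normal FlinkJobRunning job/my-job Successfully transitioned the Flink job to RUNNING.
78s Warning ArtifactNotReady job/my-job Job cannot be started as the referenced Artifact my-job-1048686d is not Ready.
"""


def parse_events(output):
    """Split each data row into last_seen, type, reason, object, message.

    Assumption: the first four columns contain no spaces, so everything
    after the fourth column is treated as the message.
    """
    events = []
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("LAST SEEN"):
            continue  # skip blank lines and the header row
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue  # not a complete Event row
        last_seen, event_type, reason, obj, message = parts
        events.append({
            "last_seen": last_seen,
            "type": event_type,
            "reason": reason,
            "object": obj,
            "message": message,
        })
    return events


if __name__ == "__main__":
    # Print only the Warning Events from the sample.
    for event in parse_events(SAMPLE):
        if event["type"] == "Warning":
            print(event["reason"], "-", event["message"])
```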
Details
If you need more information about a particular Event, use -o wide to show the Event's name in the last column.
$ rok get events -o wide

LAST SEEN   TYPE      REASON                 OBJECT       MESSAGE                                                            FIRST SEEN   COUNT   NAME
46s         Warning   JobExceptionOccurred   job/my-job   Exception during execution: org.apache.flink.util.FlinkException   46s                  my-job.co6wd72kcyhp
You can then use the name in a describe operation.
$ rok describe event my-job.co6wd72kcyhp

Name: my-job.co6wd72kcyhp
Labels: <none>
Annotations: <none>

Type: Warning
Reason: JobExceptionOccurred
Message: Exception during execution: org.apache.flink.util.FlinkException
First Seen: 2022-11-08T12:01:58.052504Z (2022-11-08 13:01:58.052504 +0100 CET)

Regarding Object:
  Kind: Job
  Name: my-job

Debug Information:
org.apache.flink.util.FlinkException: Could not load the provided entrypoint class.
    at org.apache.flink.client.program.DefaultPackagedProgramRetriever.getPackagedProgram(DefaultPackagedProgramRetriever.java:215)
    at org.apache.flink.container.entrypoint.StandaloneApplicationClusterEntryPoint.getPackagedProgram(StandaloneApplicationClusterEntryPoint.java:103)
[...]
The Debug Information section is particularly useful, for example for accessing JVM stack traces.
Field Selectors
During a resource's lifecycle, a large number of Events might be created. Field selectors let you filter them down to the information you are interested in.
$ rok get events --field-selector="regarding.kind=Job,regarding.name=my-job,type=Warning"
The selector above filters for Events of type Warning that relate to a Job resource named my-job.
Currently, the following selector fields are exposed for Events in addition to the regular metadata fields:
type
reason
reportingController
regarding.kind
regarding.project
regarding.name
regarding.uid
regarding.apiVersion
regarding.resourceVersion
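When Events are queried from a script, the selector string can be assembled programmatically. The Python sketch below is one way to do that. The helper names build_field_selector and events_command are hypothetical, and the helper deliberately accepts only the Event-specific fields listed above, so regular metadata fields are rejected even though the CLI may accept them.

```python
# Event-specific selector fields from the list above. Regular metadata
# fields are intentionally left out of this simplified check.
EVENT_SELECTOR_FIELDS = {
    "type",
    "reason",
    "reportingController",
    "regarding.kind",
    "regarding.project",
    "regarding.name",
    "regarding.uid",
    "regarding.apiVersion",
    "regarding.resourceVersion",
}


def build_field_selector(filters):
    """Join field=value pairs with commas, in the order given."""
    unknown = sorted(set(filters) - EVENT_SELECTOR_FIELDS)
    if unknown:
        raise ValueError("unsupported selector fields: " + ", ".join(unknown))
    return ",".join(f"{field}={value}" for field, value in filters.items())


def events_command(filters):
    """Build the argument list for `rok get events` with a field selector."""
    return ["rok", "get", "events",
            "--field-selector=" + build_field_selector(filters)]


if __name__ == "__main__":
    selector = {
        "regarding.kind": "Job",
        "regarding.name": "my-job",
        "type": "Warning",
    }
    print(" ".join(events_command(selector)))
```

The argument list can be passed to subprocess.run as-is, which avoids shell quoting issues around the selector string.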
Flink Web UI
You can monitor Jobs via the Apache Flink Web UI. The rok CLI can open it in the Immerok Cloud UI for you:
$ rok flinkui job my-job