Dataflow - Check for Hung Jobs
Once started, a job in the Cloud Dataflow service transitions from state to state and normally ends in a terminal state. Custodian can check whether any jobs have remained in a transient state for an abnormally long time.
Note that the notify action requires a Pub/Sub topic to be configured. To configure Cloud Pub/Sub messaging, please take a look at the Generic Actions page.
In the example below, the policy checks for jobs that started more than 1 day ago (the period is configurable) but have not yet transitioned to a stable state (that is, they remain in the JOB_STATE_RUNNING, JOB_STATE_DRAINING, or JOB_STATE_CANCELLING state) and therefore may need an administrator's attention.
policies:
  - name: gcp-dataflow-jobs-update
    resource: gcp.dataflow-job
    filters:
      # Match jobs that started more than 1 day ago.
      - type: value
        key: startTime
        op: greater-than
        value_type: age
        value: 1
      # Match jobs still in one of the listed transient states.
      - type: value
        key: currentState
        op: in
        value: [JOB_STATE_RUNNING, JOB_STATE_DRAINING, JOB_STATE_CANCELLING]
    actions:
      - type: notify
        to:
          - email@address
        format: json
        transport:
          type: pubsub
          topic: projects/cloud-custodian/topics/dataflow
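
To make the intent of the two filters easier to follow, here is a minimal Python sketch of the same selection logic applied to plain dictionaries. It is an illustration only, not how Custodian evaluates filters internally. The function names find_stuck_jobs and parse_start_time are hypothetical, and the sketch assumes that startTime is an RFC 3339 UTC timestamp ending in "Z" with either no fractional seconds or a 3- or 6-digit fraction.

```python
from datetime import datetime, timedelta, timezone

# Transient states named in the policy above.
TRANSIENT_STATES = {
    "JOB_STATE_RUNNING",
    "JOB_STATE_DRAINING",
    "JOB_STATE_CANCELLING",
}


def parse_start_time(value):
    """Parse a UTC timestamp such as '2024-01-01T00:00:00Z' (assumed format)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_stuck_jobs(jobs, max_age_days=1, now=None):
    """Return jobs older than max_age_days that are still in a transient state.

    Hypothetical helper: 'jobs' is a list of dicts with 'startTime' and
    'currentState' keys, mirroring the keys used by the policy filters.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [
        job
        for job in jobs
        if job["currentState"] in TRANSIENT_STATES
        and parse_start_time(job["startTime"]) < cutoff
    ]
```

A job is selected only when both conditions hold, which mirrors how the two value filters in the policy are combined. Changing max_age_days corresponds to changing the value of the age filter.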