gcp.vertex-ai-batch-prediction-job

GCP Vertex AI Batch Prediction Job Resource

Vertex AI Batch Prediction Jobs are used to run batch inference workloads on machine learning models at scale.

example:

List all Batch Prediction Jobs in specific locations:

policies:
  - name: vertexai-batch-jobs-inventory
    resource: gcp.vertex-ai-batch-prediction-job
    query:
      - location: us-central1
      - location: us-east1

example:

Find long-running batch prediction jobs (value_type: age is measured in days, so this matches running jobs created more than 24 days ago):

policies:
  - name: vertexai-batch-jobs-long-running
    resource: gcp.vertex-ai-batch-prediction-job
    filters:
      - type: value
        key: state
        value: JOB_STATE_RUNNING
      - type: value
        key: createTime
        value_type: age
        op: greater-than
        value: 24

example:

Find failed batch prediction jobs:

policies:
  - name: vertexai-batch-jobs-failed
    resource: gcp.vertex-ai-batch-prediction-job
    filters:
      - type: value
        key: state
        value: JOB_STATE_FAILED

Filters

event

list-item

metrics

reduce

scc-findings

value

metrics

Supports metrics filters on resources.

All resources that have Cloud Monitoring metrics are supported.

Docs on Cloud Monitoring metrics

Google Supported Metrics https://cloud.google.com/monitoring/api/metrics_gcp
Custom Metrics https://cloud.google.com/monitoring/api/v3/metric-model#intro-custom-metrics

- name: firewall-hit-count
  resource: gcp.firewall
  filters:
    - type: metrics
      name: firewallinsights.googleapis.com/subnet/firewall_hit_count
      aligner: ALIGN_COUNT
      days: 14
      value: 1
      op: greater-than

The period-start key controls how the metric window is aligned. With the default, auto, the window is computed relative to the current time. With start-of-day, the window covers full UTC calendar days: it begins at 00:00:00 UTC and ends at 00:00:00 UTC of the current day.

- name: instance-low-cpu-last-full-day
  resource: gcp.instance
  filters:
    - type: metrics
      name: compute.googleapis.com/instance/cpu/utilization
      aligner: ALIGN_MEAN
      days: 1
      value: 0.05
      op: less-than
      period-start: start-of-day
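
For comparison, the same check can be written with the default alignment spelled out. This is a minimal sketch: the policy name is illustrative, and with period-start: auto and days: 1 the window is expected to cover roughly the 24 hours before the policy runs rather than the last full UTC day.

```yaml
# Illustrative policy name; all other keys are taken from the example above.
- name: instance-low-cpu-last-24-hours
  resource: gcp.instance
  filters:
    - type: metrics
      name: compute.googleapis.com/instance/cpu/utilization
      aligner: ALIGN_MEAN
      days: 1
      value: 0.05
      op: less-than
      # auto is the default, so this line could be omitted
      period-start: auto
```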

properties:
  aligner:
    enum:
    - ALIGN_NONE
    - ALIGN_DELTA
    - ALIGN_RATE
    - ALIGN_INTERPOLATE
    - ALIGN_MIN
    - ALIGN_MAX
    - ALIGN_MEAN
    - ALIGN_COUNT
    - ALIGN_SUM
    - REDUCE_COUNT_FALSE
    - ALIGN_STDDEV
    - ALIGN_COUNT_TRUE
    - ALIGN_COUNT_FALSE
    - ALIGN_FRACTION_TRUE
    - ALIGN_PERCENTILE_99
    - ALIGN_PERCENTILE_95
    - ALIGN_PERCENTILE_50
    - ALIGN_PERCENTILE_05
    - ALIGN_PERCENT_CHANG
    type: string
  days:
    type: number
  filter:
    type: string
  group-by-fields:
    items:
      type: string
    type: array
  metric-key:
    type: string
  missing-value:
    type: number
  name:
    type: string
  op:
    enum:
    - eq
    - equal
    - ne
    - not-equal
    - gt
    - greater-than
    - ge
    - gte
    - le
    - lte
    - lt
    - less-than
    - glob
    - regex
    - regex-case
    - in
    - ni
    - not-in
    - contains
    - difference
    - intersect
    - mod
    type: string
  period-start:
    enum:
    - auto
    - start-of-day
    type: string
  reducer:
    enum:
    - REDUCE_NONE
    - REDUCE_MEAN
    - REDUCE_MIN
    - REDUCE_MAX
    - REDUCE_MEAN
    - REDUCE_SUM
    - REDUCE_STDDEV
    - REDUCE_COUNT
    - REDUCE_COUNT_TRUE
    - REDUCE_COUNT_FALSE
    - REDUCE_FRACTION_TRUE
    - REDUCE_PERCENTILE_99
    - REDUCE_PERCENTILE_95
    - REDUCE_PERCENTILE_50
    - REDUCE_PERCENTILE_05
    type: string
  type:
    enum:
    - metrics
  value:
    type: number
required:
- value
- name
- op

Permissions - monitoring.timeSeries.list

Actions

delete

notify

stop

webhook

delete

Delete Vertex AI Batch Prediction Jobs

Deletes a Vertex AI Batch Prediction Job. Deletion is asynchronous: the API returns a long-running operation, and the job is deleted in the background.

Warning: This permanently deletes the batch prediction job and its metadata. Job results in Cloud Storage are not affected.

example:

Delete failed batch prediction jobs:

policies:
  - name: delete-failed-batch-jobs
    resource: gcp.vertex-ai-batch-prediction-job
    filters:
      - type: value
        key: state
        value: JOB_STATE_FAILED
    actions:
      - type: delete
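
Because deletion is permanent, a policy may want to wait before removing job metadata. The following is a minimal sketch of that idea, not a recommended retention setting: the policy name and the 30-day threshold are assumptions, and it relies on value_type: age being measured in days.

```yaml
policies:
  # Illustrative name and threshold; adjust to your own retention needs.
  - name: delete-old-failed-batch-jobs
    resource: gcp.vertex-ai-batch-prediction-job
    filters:
      - type: value
        key: state
        value: JOB_STATE_FAILED
      # Only match jobs created more than 30 days ago
      - type: value
        key: createTime
        value_type: age
        op: greater-than
        value: 30
    actions:
      - type: delete
```

Running the policy with the actions block removed first is a simple way to review which jobs would match before anything is deleted.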

https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs/delete

properties:
  type:
    enum:
    - delete
required:
- type

Permissions - aiplatform.batchPredictionJobs.delete

stop

Stop (Cancel) Vertex AI Batch Prediction Jobs

Cancels a running Vertex AI Batch Prediction Job. This is useful for cost control and incident response when jobs are running longer than expected or consuming unexpected resources.

Note: Only jobs in JOB_STATE_RUNNING or JOB_STATE_PENDING can be cancelled. Completed, failed, or already cancelled jobs cannot be cancelled.

example:

Cancel long-running batch prediction jobs (value_type: age is measured in days, so this matches running jobs created more than 24 days ago):

policies:
  - name: cancel-long-running-batch-jobs
    resource: gcp.vertex-ai-batch-prediction-job
    filters:
      - type: value
        key: state
        value: JOB_STATE_RUNNING
      - type: value
        key: createTime
        value_type: age
        op: greater-than
        value: 24
    actions:
      - type: stop

example:

Cancel all running batch jobs (emergency cost control):

policies:
  - name: emergency-cancel-all-batch-jobs
    resource: gcp.vertex-ai-batch-prediction-job
    filters:
      - type: value
        key: state
        value: JOB_STATE_RUNNING
    actions:
      - type: stop
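
Since pending jobs can also be cancelled, a policy may need to match both cancellable states. The following is a minimal sketch, assuming the value filter's in operator with a list value; the policy name is illustrative.

```yaml
policies:
  # Illustrative name; matches every job that is still cancellable.
  - name: cancel-pending-and-running-batch-jobs
    resource: gcp.vertex-ai-batch-prediction-job
    filters:
      - type: value
        key: state
        op: in
        value:
          - JOB_STATE_RUNNING
          - JOB_STATE_PENDING
    actions:
      - type: stop
```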

https://cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.batchPredictionJobs/cancel

properties:
  type:
    enum:
    - stop
required:
- type

Permissions - aiplatform.batchPredictionJobs.cancel