GCP-PMLE Cheat Sheet

Quick reference for the Google Professional Machine Learning Engineer exam.

Vertex AI Core Platform

Vertex AI: Unified ML platform combining AutoML, custom training, pipelines, feature store, and endpoints under one API and SDK.
Vertex AI Workbench: Managed JupyterLab environment; managed notebooks auto-idle for cost savings while user-managed notebooks give full VM control.
Vertex AI Model Registry: Central repository for versioning model artifacts, metadata, and evaluation metrics before deployment to an endpoint.
Vertex AI Experiments: Tracks and compares metrics, parameters, and artifacts across training runs; not used for production A/B traffic testing.
Vertex AI Feature Store: Centralized feature repository with low-latency online serving and point-in-time offline serving to prevent training-serving skew.
Vertex AI Vizier: Black-box hyperparameter optimization service using Bayesian optimization; can tune any system, not only Vertex AI training jobs.
Vertex AI Metadata / ML Metadata (MLMD): Automatically records lineage of datasets, models, and pipeline executions for artifact provenance and reproducibility.

Vertex AI Training — gcloud CLI

gcloud ai custom-jobs create --region=REGION --display-name=NAME --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri=IMAGE_URI: Launches a custom training job with a specified container image, machine type, and worker pool configuration.
gcloud ai models upload --region=REGION --display-name=NAME --container-image-uri=URI --artifact-uri=gs://BUCKET/model: Registers a trained model artifact from Cloud Storage into the Model Registry for later deployment.
gcloud ai hp-tuning-jobs create --region=REGION --config=study_config.yaml --display-name=NAME: Submits a Vertex AI Vizier hyperparameter tuning job defined by a YAML study configuration.
gcloud ai endpoints create --region=REGION --display-name=NAME: Creates an empty prediction endpoint that one or more models can later be deployed to.
gcloud ai endpoints deploy-model ENDPOINT_ID --region=REGION --model=MODEL_ID --display-name=NAME --machine-type=n1-standard-4 --min-replica-count=1 --max-replica-count=5: Deploys a registered model to an endpoint with defined autoscaling replica bounds; --display-name names the deployed model and is required.
gcloud ai custom-jobs stream-logs JOB_ID --region=REGION: Streams live training logs from a running custom job for real-time debugging.

BigQuery ML

CREATE OR REPLACE MODEL `dataset.model_name` OPTIONS(model_type='logistic_reg', input_label_cols=['label']) AS SELECT * FROM `dataset.training_table`;: Trains a model directly on BigQuery data using standard SQL with no data export required.
SELECT * FROM ML.PREDICT(MODEL `dataset.model_name`, TABLE `dataset.new_data`): Generates predictions on new rows using a trained BigQuery ML model.
SELECT * FROM ML.EVALUATE(MODEL `dataset.model_name`): Returns evaluation metrics such as precision, recall, and ROC AUC for a trained BigQuery ML model.
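SELECT * FROM ML.EXPLAIN_PREDICT(MODEL `dataset.model_name`, TABLE `dataset.new_data`, STRUCT(3 AS top_k_features)): Returns feature attributions explaining each individual prediction, limited here to the top 3 features; the attribution method depends on the model type (Shapley-based for linear and boosted-tree models, integrated gradients for DNN models).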
BQML model_type options: linear_reg, logistic_reg, kmeans, matrix_factorization, boosted_tree_classifier/regressor, dnn_classifier/regressor, arima_plus, autoencoder, pca — pick per business problem.
bq query --use_legacy_sql=false "SELECT * FROM ML.PREDICT(MODEL \`ds.m\`, TABLE \`ds.t\`)": Runs a BigQuery ML SQL statement from the command line for scripting and automation.
TRANSFORM() clause in CREATE MODEL: Bakes feature preprocessing into the model definition so the same transforms apply automatically at both training and prediction time.

AutoML and Pre-built APIs

AutoML Tabular / Image / Text / Video: No-code training with automated feature engineering, architecture search, and hyperparameter tuning; requires your own labeled data.
Vision AI: Pre-trained API for label detection, OCR, face detection, and explicit content moderation with zero training data required.
Natural Language AI: Pre-trained API for entity extraction, sentiment analysis, and syntax parsing on unstructured text.
Translation AI: Pre-trained and custom-glossary API for real-time and batch text translation across 100+ languages.
Speech-to-Text / Text-to-Speech: Pre-trained APIs for streaming and batch audio transcription and natural-sounding speech synthesis.
Data Labeling Service: Managed human labeling workforce that produces training labels needed for AutoML or custom-trained models.

Data Processing and Ingestion

Dataflow: Serverless, autoscaling batch/stream processing built on Apache Beam; default choice for new ETL and feature engineering pipelines.
Dataproc: Managed Spark/Hadoop clusters; choose only when reusing existing Spark ML code or the broader Hadoop ecosystem.
Pub/Sub: Serverless messaging that decouples event producers from consumers in streaming ML feature pipelines.
Cloud Storage storage classes: Standard (frequent access, no minimum storage duration), Nearline (30-day minimum), Coldline (90-day minimum), Archive (365-day minimum); use Standard for active training data and model artifacts.
TFRecord format: Binary, protobuf-based storage format optimized for high-throughput TensorFlow input pipelines.
tf.Transform: Applies the exact same preprocessing graph at both training and serving time to eliminate training-serving skew in TFX pipelines.
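
The skew-prevention idea behind tf.Transform and the BigQuery ML TRANSFORM() clause can be illustrated without either library. The sketch below is plain Python, not the tf.Transform API: the names ScalerParams, fit_scaler, and apply_scaler are hypothetical. It computes statistics once over the training data, then reuses the same function and the same stored statistics at serving time.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class ScalerParams:
    """Statistics computed once, over the training data only."""
    mean: float
    std: float


def fit_scaler(training_values: list[float]) -> ScalerParams:
    """Analyze phase: a full pass over the training data."""
    std = pstdev(training_values)
    # Guard against a constant feature, which would otherwise divide by zero.
    return ScalerParams(mean=mean(training_values), std=std if std > 0 else 1.0)


def apply_scaler(value: float, params: ScalerParams) -> float:
    """Transform phase: the same function runs in training and in serving."""
    return (value - params.mean) / params.std


train = [10.0, 20.0, 30.0]
params = fit_scaler(train)                      # stored alongside the model
train_scaled = [apply_scaler(v, params) for v in train]
serving_scaled = apply_scaler(20.0, params)     # one request at serving time
```

Skew appears when the serving path recomputes statistics from live traffic or reimplements the formula separately; keeping a single function and a single stored set of parameters avoids both.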

Distributed Training and Hyperparameter Tuning

Data parallelism: Replicates the full model across workers, each processing a data shard, with gradients synchronized each step; the most common strategy.
Model parallelism: Splits model layers across multiple devices; use only when a single model does not fit in one device's memory.
Reduction Server: Vertex AI feature that accelerates all-reduce gradient synchronization for large-scale distributed GPU training.
GPU vs TPU selection: GPUs broadly support PyTorch and custom CUDA ops; TPUs are optimized for large-scale TensorFlow/JAX training with maximum throughput.
Vizier search space definition: Define each hyperparameter's type (DOUBLE, INTEGER, CATEGORICAL, DISCRETE) and scale (linear or log) in the study configuration.
Automated early stopping: Vizier can terminate underperforming hyperparameter trials early to save compute without exhausting the full trial budget.
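
Data parallelism as described above can be sketched in a few lines of plain Python. This is a toy single-process simulation, assuming a one-parameter linear model and equal-sized shards; the function names are hypothetical, and real training would rely on a framework's distribution strategy instead.

```python
def gradient(w: float, shard: list[tuple[float, float]]) -> float:
    """Mean-squared-error gradient dL/dw for the model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: list[float]) -> float:
    """Stand-in for the all-reduce step: every worker ends up with the mean."""
    return sum(values) / len(values)


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # true relation: y = 2x
shards = [data[:2], data[2:]]   # two workers, equal-sized shards
w = 0.0                         # every worker holds an identical replica

# With equal-sized shards, averaging shard gradients equals the full-batch gradient.
initial_sync_gradient = all_reduce_mean([gradient(w, s) for s in shards])
initial_full_gradient = gradient(w, data)

for _ in range(50):
    local_grads = [gradient(w, shard) for shard in shards]  # computed in parallel
    w -= 0.05 * all_reduce_mean(local_grads)                # same update everywhere
```

Because every replica applies the same averaged gradient, the replicas stay identical after each step. The synchronization step is the part that Reduction Server is meant to accelerate.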

Vertex AI Pipelines and MLOps

KFP pipeline definition: Defines a Kubeflow Pipelines DAG using the KFP SDK, the standard way to author Vertex AI Pipelines.
    from kfp import dsl
    @dsl.pipeline(name="training-pipeline", pipeline_root="gs://bucket/root")
    def my_pipeline(project: str):
        ...
PipelineJob submission: Submits an already-compiled pipeline template (here pipeline.json, for example one produced by the KFP compiler) as a Vertex AI Pipelines run from Python; PipelineJob itself does not compile the pipeline.
    from google.cloud import aiplatform
    job = aiplatform.PipelineJob(
        display_name="my-run",
        template_path="pipeline.json",
        parameter_values={"project": "my-proj"},
    )
    job.run()
@dsl.component decorator: Packages a Python function as a standalone, containerized, reusable pipeline component with typed inputs and outputs.
Cloud Build: CI/CD engine that builds and tests pipeline component containers and pushes them to Artifact Registry before deployment.
Artifact Registry: Stores versioned container images and pipeline artifacts referenced by Vertex AI Pipelines components.
Pipeline trigger types: Scheduled (Cloud Scheduler cron), event-driven (Pub/Sub), or data-driven (Cloud Storage/BigQuery change) automation for retraining pipelines.
TFX standard component order: ExampleGen -> StatisticsGen -> SchemaGen -> Transform -> Trainer -> Evaluator -> Pusher, chained in sequence for production ML pipelines.
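
The TFX order listed above follows from which component consumes which upstream output. The sketch below uses only the standard library's graphlib (Python 3.9+), not TFX itself. The dependency map is a simplified assumption based on the usual artifact flow, and it recovers the same sequence.

```python
from graphlib import TopologicalSorter

# Each component maps to the upstream components whose outputs it consumes
# (a simplified, assumed view of the artifact flow).
upstream = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
    "Evaluator": {"ExampleGen", "Trainer"},
    "Pusher": {"Trainer", "Evaluator"},
}

# A pipeline orchestrator derives execution order from the DAG, not from code order.
execution_order = list(TopologicalSorter(upstream).static_order())
print(" -> ".join(execution_order))
```

The same principle applies to KFP pipelines on Vertex AI: passing one component's output to another is what creates the edge in the DAG.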

Serving and Deployment Patterns

Online prediction (Vertex AI Endpoints): Real-time, low-latency inference with autoscaling; scaling is reactive and cannot instantly absorb sudden traffic spikes.
Batch prediction: Spins up temporary compute, scores a full dataset, then shuts down with no persistent endpoint cost; ideal for periodic scoring jobs.
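gcloud ai endpoints deploy-model ENDPOINT_ID --region=REGION --model=MODEL_ID --display-name=NAME --traffic-split=0=20,DEPLOYED_MODEL_ID=80: Sends 20% of traffic to the model being deployed (key 0) and keeps 80% on an already-deployed model, identified by its deployed model ID; percentages must total 100. Use it for canary or blue-green rollouts.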
Model optimization for serving: Quantization (lower numeric precision), pruning (remove low-impact weights), and distillation (train a smaller student model) trade some accuracy for lower latency and cost.
Private Endpoints (VPC Peering): Serve predictions over a private network connection instead of the public internet for lower latency and improved security.
Pre-built vs custom serving containers: Pre-built containers cover common frameworks (TensorFlow, scikit-learn, XGBoost, PyTorch); custom containers are needed for unsupported runtimes or dependencies.
NVIDIA Triton on Vertex AI: Supports multi-framework model serving with dynamic batching for higher throughput on a single endpoint.
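
To make the traffic-split behavior concrete, here is a small simulation of percentage-based routing. It is a sketch under stated assumptions, not how Vertex AI routes internally: the hash-based bucketing, the model names, and the functions validate_split and route are all hypothetical.

```python
import hashlib


def validate_split(split: dict[str, int]) -> None:
    """A traffic split assigns non-negative percentages that total exactly 100."""
    if any(pct < 0 for pct in split.values()) or sum(split.values()) != 100:
        raise ValueError(f"traffic split must sum to 100, got {split}")


def route(request_id: str, split: dict[str, int]) -> str:
    """Map a request to a deployed model, proportionally to the split."""
    validate_split(split)
    # Hash the request ID into a stable bucket in [0, 100).
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    threshold = 0
    for model_id, pct in split.items():
        threshold += pct
        if bucket < threshold:
            return model_id
    raise AssertionError("unreachable: percentages sum to 100")


split = {"stable-model": 80, "canary-model": 20}
counts = {model_id: 0 for model_id in split}
for i in range(10_000):
    counts[route(f"request-{i}", split)] += 1
```

Over many requests the counts approach the configured 80/20 ratio. Promoting the canary is then a matter of updating the split, without redeploying either model.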

Monitoring, Explainability, and Responsible AI

Vertex AI Model Monitoring: Detects training-serving skew and prediction drift by comparing live traffic feature statistics against a stored baseline.
Data drift vs concept drift vs prediction drift: Data drift is a shift in input feature distribution, concept drift is a shift in the input-output relationship, prediction drift is a shift in model output distribution.
Feature attribution methods: Shapley values (game-theoretic, tabular/AutoML), Integrated Gradients (differentiable models), and XRAI (region-based saliency for images).
Continuous evaluation: Compares live predictions against ground-truth labels as they become available to track real-world accuracy over time.
Fairness metrics: Demographic parity requires equal positive prediction rates across groups; equalized odds requires equal true/false positive rates across groups.
Vertex Explainable AI: Returns per-prediction feature attributions via the same API call as standard online or batch prediction requests.
What-If Tool and Model Cards: The What-If Tool interactively probes model behavior across data slices and thresholds; Model Cards document intended use, limitations, and evaluation results.
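
Drift and skew detection come down to measuring the distance between a baseline distribution and a live one, then alerting above a threshold. The sketch below implements two common distance measures in plain Python: L-infinity distance (typically used for categorical features) and Jensen-Shannon divergence (typically used for numerical features after bucketing). The feature values and the threshold are made up for illustration, and this is not the Model Monitoring API.

```python
import math


def normalize(counts: dict[str, float]) -> dict[str, float]:
    """Turn raw category counts into a probability distribution."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def l_infinity(p: dict[str, float], q: dict[str, float]) -> float:
    """Largest absolute difference in probability for any single category."""
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in p.keys() | q.keys())


def jensen_shannon(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    keys = p.keys() | q.keys()
    mixture = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl(a: dict[str, float]) -> float:
        return sum(a[k] * math.log2(a[k] / mixture[k]) for k in a if a[k] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


baseline = normalize({"web": 50, "mobile": 50})   # training-time statistics
live = normalize({"web": 90, "mobile": 10})       # recent serving traffic

ALERT_THRESHOLD = 0.3  # hypothetical value; tune per feature
drift_score = l_infinity(baseline, live)
drift_detected = drift_score > ALERT_THRESHOLD
```

Using the training data as the baseline measures training-serving skew; using an earlier window of serving traffic measures drift over time.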

IAM Roles for ML

roles/aiplatform.user: Grants permission to create and run jobs, deploy models, and manage most Vertex AI resources without full administrative control.
roles/aiplatform.admin: Full control over all Vertex AI resources; changing the project's own IAM policy still requires a separate role such as Project IAM Admin.
roles/aiplatform.viewer: Read-only access to view Vertex AI resources, jobs, endpoints, and models.
roles/bigquery.dataEditor + roles/bigquery.jobUser: Minimum role combination needed to create datasets, train BigQuery ML models, and run queries.
roles/storage.objectAdmin: Grants read, write, and delete access to objects in Cloud Storage buckets used for training data and model artifacts; it does not grant management of the buckets themselves.
Service accounts for training and pipeline jobs: Vertex AI jobs execute as a service account; grant it only the minimum roles required, following least-privilege principles.

Architecture Decision Patterns — Quick Rules

Choose BigQuery ML when...: Data is already in BigQuery, the team is SQL-skilled, and the model type is supported (regression, classification, clustering, forecasting).
Choose AutoML when...: The data type is standard (tabular, image, text, video), you have your own labeled data, and time-to-production outweighs full customization.
Choose Pre-built APIs when...: The task is general-purpose (vision, text, translation, speech) and no labeled training data is available at all.
Choose custom training when...: The scenario requires a novel architecture, specialized preprocessing, or maximum achievable model performance.
Choose Dataflow over Dataproc when...: Building a new pipeline with no existing Spark/Hadoop dependency; reserve Dataproc for migrating existing Spark ML workloads.
Choose online over batch prediction when...: The scenario mentions real-time or low-latency requirements; choose batch when scoring large datasets on a schedule with no latency constraint.
Choose Vertex AI Pipelines over Cloud Composer when...: The workflow is ML-specific; Composer is the better answer only when the scenario mentions existing Airflow DAGs or non-ML orchestration needs.
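
The training-approach rules above can be read as a rough decision procedure. The sketch below encodes them in Python. The Scenario fields and the order in which the rules are checked are assumptions made for illustration; real exam scenarios usually hinge on one dominant constraint, not a fixed priority order.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical flags extracted from an exam scenario."""
    data_in_bigquery: bool = False
    sql_skilled_team: bool = False
    bqml_supports_model_type: bool = False
    has_labeled_data: bool = False
    general_purpose_task: bool = False
    needs_custom_architecture: bool = False


def recommend(s: Scenario) -> str:
    """Apply the quick rules in an assumed priority order."""
    if s.needs_custom_architecture:
        return "custom training"
    if s.general_purpose_task and not s.has_labeled_data:
        return "pre-built API"
    if s.data_in_bigquery and s.sql_skilled_team and s.bqml_supports_model_type:
        return "BigQuery ML"
    if s.has_labeled_data:
        return "AutoML"
    return "custom training"  # fallback when no managed option clearly fits


choice = recommend(Scenario(data_in_bigquery=True, sql_skilled_team=True,
                            bqml_supports_model_type=True, has_labeled_data=True))
```

Here choice is "BigQuery ML": the data location and team skills outweigh the fact that AutoML would also be possible.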
