Vertex AI Core Platform
- Vertex AI
- Unified ML platform combining AutoML, custom training, pipelines, feature store, and endpoints under one API and SDK.
- Vertex AI Workbench
- Managed JupyterLab environment; managed notebooks support idle shutdown to cut costs, while user-managed notebooks give full control of the underlying VM.
- Vertex AI Model Registry
- Central repository for versioning model artifacts, metadata, and evaluation metrics before deployment to an endpoint.
- Vertex AI Experiments
- Tracks and compares metrics, parameters, and artifacts across training runs; not used for production A/B traffic testing.
- Vertex AI Feature Store
- Centralized feature repository with low-latency online serving and point-in-time offline serving to prevent training-serving skew.
- Vertex AI Vizier
- Black-box hyperparameter optimization service that uses Bayesian optimization by default (grid and random search are also available); it can tune any system, not only Vertex AI training jobs.
- Vertex AI Metadata / ML Metadata (MLMD)
- Automatically records lineage of datasets, models, and pipeline executions for artifact provenance and reproducibility.
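Point-in-time ("as of") lookup is the core idea behind Feature Store offline serving. Each training example receives the latest feature value recorded at or before its label timestamp, never a later one, so no future information leaks into training. The following is a minimal stdlib sketch that assumes a hypothetical in-memory feature history. `feature_history` and `point_in_time_value` are illustrative names, not Feature Store APIs; the managed service performs this join for you at scale.

```python
from bisect import bisect_right

# Hypothetical feature history for one entity: (timestamp, value), sorted by time.
feature_history = {
    "user_42": [(100, 0.10), (200, 0.35), (300, 0.80)],
}

def point_in_time_value(entity_id, label_ts, history=feature_history):
    """Return the latest feature value recorded at or before label_ts, or None."""
    history_rows = history.get(entity_id, [])
    timestamps = [ts for ts, _ in history_rows]
    idx = bisect_right(timestamps, label_ts)  # first index whose ts > label_ts
    return history_rows[idx - 1][1] if idx > 0 else None

# Training examples as (entity, label timestamp) pairs.
training_examples = [("user_42", 150), ("user_42", 300), ("user_42", 50)]
rows = [(e, ts, point_in_time_value(e, ts)) for e, ts in training_examples]
for row in rows:
    print(row)
```

In this example, the example at timestamp 150 gets 0.10 rather than the later 0.35. The example at 50 gets no value because nothing had been recorded yet.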
Vertex AI Training — gcloud CLI
- gcloud ai custom-jobs create --region=REGION --display-name=NAME --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri=IMAGE_URI
- Launches a custom training job with a specified container image, machine type, and worker pool configuration.
- gcloud ai models upload --region=REGION --display-name=NAME --container-image-uri=URI --artifact-uri=gs://BUCKET/model
- Registers a trained model artifact from Cloud Storage into the Model Registry for later deployment.
- gcloud ai hp-tuning-jobs create --region=REGION --config=study_config.yaml --display-name=NAME
- Submits a Vertex AI Vizier hyperparameter tuning job defined by a YAML study configuration.
- gcloud ai endpoints create --region=REGION --display-name=NAME
- Creates an empty prediction endpoint that one or more models can later be deployed to.
- gcloud ai endpoints deploy-model ENDPOINT_ID --region=REGION --model=MODEL_ID --display-name=DEPLOYED_MODEL_NAME --machine-type=n1-standard-4 --min-replica-count=1 --max-replica-count=5
- Deploys a registered model to an endpoint with defined autoscaling replica bounds.
- gcloud ai custom-jobs stream-logs JOB_ID --region=REGION
- Streams live training logs from a running custom job for real-time debugging.
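The `--worker-pool-spec` flag packs several `key=value` settings into one comma-separated argument, which is easy to mistype when scripting. The following is a minimal sketch, assuming a hypothetical helper `build_custom_job_command` and a placeholder image URI. It assembles the same `gcloud ai custom-jobs create` invocation shown above as an argument list, which avoids shell-quoting problems if you later pass it to `subprocess.run`.

```python
import shlex

def build_custom_job_command(region, display_name, image_uri,
                             machine_type="n1-standard-4", replica_count=1):
    """Hypothetical helper: assemble gcloud custom-jobs create arguments as a list."""
    worker_pool_spec = ",".join([
        f"machine-type={machine_type}",
        f"replica-count={replica_count}",
        f"container-image-uri={image_uri}",
    ])
    return [
        "gcloud", "ai", "custom-jobs", "create",
        f"--region={region}",
        f"--display-name={display_name}",
        f"--worker-pool-spec={worker_pool_spec}",
    ]

# Placeholder values; substitute your own region, job name, and image.
cmd = build_custom_job_command(
    "us-central1", "demo-train", "us-docker.pkg.dev/my-proj/train/image:latest"
)
print(shlex.join(cmd))
# To actually launch (requires gcloud and credentials):
# subprocess.run(cmd, check=True)
```

Keeping the command as a list means values containing special characters are passed through verbatim, with no shell interpretation.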
BigQuery ML
- CREATE OR REPLACE MODEL `dataset.model_name` OPTIONS(model_type='logistic_reg', input_label_cols=['label']) AS SELECT * FROM `dataset.training_table`;
- Trains a model directly on BigQuery data using standard SQL with no data export required.
- SELECT * FROM ML.PREDICT(MODEL `dataset.model_name`, TABLE `dataset.new_data`)
- Generates predictions on new rows using a trained BigQuery ML model.
- SELECT * FROM ML.EVALUATE(MODEL `dataset.model_name`)
- Returns evaluation metrics appropriate to the model type, for example precision, recall, and ROC AUC for a classification model.
- SELECT * FROM ML.EXPLAIN_PREDICT(MODEL `dataset.model_name`, TABLE `dataset.new_data`, STRUCT(3 AS top_k_features))
- Returns Shapley-value feature attributions explaining each individual prediction.
- BQML model_type options
- linear_reg, logistic_reg, kmeans, matrix_factorization, boosted_tree_classifier/regressor, dnn_classifier/regressor, arima_plus, autoencoder, pca; choose the type that matches the prediction task.
- bq query --use_legacy_sql=false "SELECT * FROM ML.PREDICT(MODEL \`ds.m\`, TABLE \`ds.t\`)"
- Runs a BigQuery ML SQL statement from the command line for scripting and automation.
- TRANSFORM() clause in CREATE MODEL
- Bakes feature preprocessing into the model definition so the same transforms apply automatically at both training and prediction time.
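To make the classifier metrics that `ML.EVALUATE` reports concrete, the following plain-Python sketch computes precision, recall, and ROC AUC from labels and predicted scores. It assumes a 0.5 decision threshold and toy data. This is an illustration of the metric definitions, not how BigQuery ML computes them internally.

```python
def precision_recall(labels, scores, threshold=0.5):
    """Precision and recall for binary labels at a fixed score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7]
precision, recall = precision_recall(labels, scores)
auc = roc_auc(labels, scores)
print(precision, recall, auc)
```

Precision and recall depend on the chosen threshold, while ROC AUC summarizes ranking quality across all thresholds. That is why both kinds of metric appear in evaluation output.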
AutoML and Pre-built APIs
- AutoML Tabular / Image / Text / Video
- No-code training with automated feature engineering, architecture search, and hyperparameter tuning; requires your own labeled data.
- Vision AI
- Pre-trained API for label detection, OCR, face detection, and explicit content moderation with zero training data required.
- Natural Language AI
- Pre-trained API for entity extraction, sentiment analysis, and syntax parsing on unstructured text.
- Translation AI
- Pre-trained and custom-glossary API for real-time and batch text translation across 100+ languages.
- Speech-to-Text / Text-to-Speech
- Pre-trained APIs for streaming and batch audio transcription and natural-sounding speech synthesis.
- Data Labeling Service
- Managed human labeling workforce that produces training labels needed for AutoML or custom-trained models.
Data Processing and Ingestion
- Dataflow
- Serverless, autoscaling batch/stream processing built on Apache Beam; default choice for new ETL and feature engineering pipelines.
- Dataproc
- Managed Spark/Hadoop clusters; choose only when reusing existing Spark ML code or the broader Hadoop ecosystem.
- Pub/Sub
- Serverless messaging that decouples event producers from consumers in streaming ML feature pipelines.
- Cloud Storage storage classes
- Standard (frequent access), Nearline (30-day minimum storage duration), Coldline (90-day minimum), Archive (365-day minimum); use Standard for active training data and model artifacts.
- TFRecord format
- Binary, protobuf-based storage format optimized for high-throughput TensorFlow input pipelines.
- tf.Transform
- Applies the exact same preprocessing graph at both training and serving time to eliminate training-serving skew in TFX pipelines.
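The tf.Transform idea is "analyze once, apply everywhere". A full pass over the training data computes statistics, and those frozen statistics are reused by a pure transform function at both training and serving time. The following plain-Python sketch mirrors only that pattern for z-score scaling. The real library runs the analyze step with Apache Beam (for example via `tft.scale_to_z_score`) and exports a TensorFlow graph, which this sketch does not attempt. `analyze` and `transform` are hypothetical names.

```python
import statistics

def analyze(train_values):
    """Full-pass 'analyze' step: compute statistics once over the training data."""
    return {
        "mean": statistics.fmean(train_values),
        "std": statistics.pstdev(train_values),
    }

def transform(value, stats):
    """'Transform' step: a pure function of the input and the frozen statistics."""
    std = stats["std"] or 1.0  # guard against a constant feature
    return (value - stats["mean"]) / std

train = [10.0, 20.0, 30.0, 40.0]
stats = analyze(train)  # persisted alongside the model artifact
train_features = [transform(v, stats) for v in train]

# At serving time the SAME stats are reused, never recomputed on live traffic.
serving_feature = transform(25.0, stats)
print(stats, train_features, serving_feature)
```

Skew appears when serving code recomputes or re-implements these statistics differently. Shipping the analyzed values with the model removes that source of mismatch.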
Distributed Training and Hyperparameter Tuning
- Data parallelism
- Replicates the full model across workers, each processing a different data shard; in synchronous training, gradients are averaged across workers (all-reduce) every step. This is the most common strategy.
- Model parallelism
- Splits model layers across multiple devices; use only when a single model does not fit in one device's memory.
- Reduction Server
- Vertex AI feature that accelerates all-reduce gradient synchronization for large-scale distributed GPU training.
- GPU vs TPU selection
- GPUs broadly support PyTorch and custom CUDA ops; TPUs are optimized for large-scale TensorFlow/JAX training with maximum throughput.
- Vizier search space definition
- Define each hyperparameter's type (DOUBLE, INTEGER, CATEGORICAL, DISCRETE) and, for numeric types, its scale (linear, log, or reverse log) in the study configuration.
- Automated early stopping
- Vizier can terminate underperforming hyperparameter trials early to save compute without exhausting the full trial budget.
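Synchronous data parallelism works because averaging the gradients from equal-sized shards gives the same gradient as processing the whole batch on one device. The following toy sketch simulates two workers for a one-variable linear model with mean squared error. `all_reduce_mean` is a hypothetical stand-in for the collective operation (for example NCCL all-reduce, which Reduction Server accelerates), not a real library call.

```python
def grad_mse(w, b, xs, ys):
    """Gradient of mean squared error for the model y_hat = w * x + b."""
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

def all_reduce_mean(grads):
    """Simulated all-reduce: average each gradient component across workers."""
    k = len(grads)
    return tuple(sum(g[i] for g in grads) / k for i in range(len(grads[0])))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, b = 0.5, 0.0

# Each replica holds the full model but only sees its own equal-sized shard.
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]
worker_grads = [grad_mse(w, b, sx, sy) for sx, sy in shards]
synced = all_reduce_mean(worker_grads)

# Reference: the gradient a single worker would compute on the full batch.
full = grad_mse(w, b, xs, ys)

# Every replica applies the identical update, so the model copies stay in sync.
lr = 0.01
new_w, new_b = w - lr * synced[0], b - lr * synced[1]
print(synced, full, new_w, new_b)
```

The equality holds for equal shard sizes with a mean-reduced loss. With uneven shards, the average would need to be weighted by shard size.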
Vertex AI Pipelines and MLOps
- KFP pipeline definition:
    from kfp import dsl
    @dsl.pipeline(name="training-pipeline", pipeline_root="gs://bucket/root")
    def my_pipeline(project: str):
        ...  # component tasks go here
- Defines a Kubeflow Pipelines DAG using the KFP SDK, the standard way to author Vertex AI Pipelines.
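- Submitting a compiled pipeline from Python:
    from google.cloud import aiplatform
    job = aiplatform.PipelineJob(
        display_name="my-run",
        template_path="pipeline.json",  # produced beforehand by the KFP compiler
        parameter_values={"project": "my-proj"},
    )
    job.run()  # blocks until the run finishes; job.submit() returns immediately
- Submits a pipeline template that was already compiled with the KFP compiler as a Vertex AI Pipelines run; PipelineJob does not compile the pipeline itself.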
- @dsl.component decorator
- Packages a Python function as a standalone, containerized, reusable pipeline component with typed inputs and outputs.
- Cloud Build
- CI/CD engine that builds and tests pipeline component containers and pushes them to Artifact Registry before deployment.
- Artifact Registry
- Stores versioned container images used by pipeline components and can also host compiled pipeline templates for reuse.
- Pipeline trigger types
- Scheduled (Cloud Scheduler cron), event-driven (Pub/Sub), or data-driven (Cloud Storage/BigQuery change) automation for retraining pipelines.
- TFX standard component order
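- ExampleGen → StatisticsGen → SchemaGen → ExampleValidator → Transform → Trainer → Evaluator → Pusher; each component consumes upstream artifacts, so a production pipeline is a DAG rather than a strict chain (Tuner and InfraValidator are optional additions).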
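Because each TFX component declares which upstream artifacts it consumes, the execution order falls out of a topological sort of the dependency graph. That is also how a pipeline runner can execute independent steps in parallel. The following sketch assumes a simplified version of the typical standard-component wiring and uses the stdlib `graphlib` module (Python 3.9+). It shows ordering only and does not run any TFX code.

```python
from graphlib import TopologicalSorter

# Upstream dependencies for each standard TFX component (typical wiring, simplified).
tfx_deps = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Trainer": {"Transform", "SchemaGen"},
    "Evaluator": {"ExampleGen", "Trainer"},
    "Pusher": {"Trainer", "Evaluator"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(tfx_deps).static_order())
print(" -> ".join(order))
```

ExampleValidator and Transform do not depend on each other, so an orchestrator such as Vertex AI Pipelines is free to run them concurrently once SchemaGen finishes.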
Serving and Deployment Patterns
- Online prediction (Vertex AI Endpoints)
- Real-time, low-latency inference with autoscaling; scaling is reactive and cannot instantly absorb sudden traffic spikes.
- Batch prediction
- Spins up temporary compute, scores a full dataset, then shuts down with no persistent endpoint cost; ideal for periodic scoring jobs.
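- gcloud ai endpoints deploy-model ENDPOINT_ID --region=REGION --model=MODEL_ID --display-name=DEPLOYED_MODEL_NAME --traffic-split=0=20,EXISTING_DEPLOYED_MODEL_ID=80
- Splits endpoint traffic across deployed models for canary or blue-green rollouts; the key 0 refers to the model being deployed by this command, so here the new version receives a 20% canary share while the existing deployed model keeps 80%.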
- Model optimization for serving
- Quantization (lower numeric precision), pruning (remove low-impact weights), and distillation (train a smaller student model) trade some accuracy for lower latency and cost.
- Private Endpoints (VPC Peering)
- Serve predictions over a private network connection instead of the public internet for lower latency and improved security.
- Pre-built vs custom serving containers
- Pre-built containers cover common frameworks (TensorFlow, scikit-learn, XGBoost, PyTorch); custom containers are needed for unsupported runtimes or dependencies.
- NVIDIA Triton on Vertex AI
- Supports multi-framework model serving with dynamic batching for higher throughput on a single endpoint.
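Quantization is the easiest of the three optimizations to show in a few lines. Weights are mapped to 8-bit integers plus one floating-point scale, and the rounding error is bounded by half a quantization step. The following minimal sketch assumes symmetric per-tensor int8 quantization of a toy weight list. Production toolchains (for example TensorFlow Lite or PyTorch quantization) also handle per-channel scales, activations, and calibration, which this does not. Pruning and distillation are not shown.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w is approximately q * scale, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Map integers back to approximate float weights."""
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.004, 0.5, -0.33]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now needs 1 byte instead of 4 (float32), and small values such as 0.004 collapse to 0. That lost precision is the accuracy cost traded for lower latency and memory.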
Monitoring, Explainability, and Responsible AI
- Vertex AI Model Monitoring
- Detects training-serving skew (live feature distributions compared with the training data baseline) and prediction drift (live feature distributions compared with an earlier window of production traffic).
- Data drift vs concept drift vs prediction drift
- Data drift is a shift in the input feature distribution; concept drift is a shift in the input-output relationship; prediction drift is a shift in the model's output distribution.
- Feature attribution methods
- Shapley values (game-theoretic, tabular/AutoML), Integrated Gradients (differentiable models), and XRAI (region-based saliency for images).
- Continuous evaluation
- Compares live predictions against ground-truth labels as they become available to track real-world accuracy over time.
- Fairness metrics
- Demographic parity requires equal positive prediction rates across groups; equalized odds requires equal true/false positive rates across groups.
- Vertex Explainable AI
- Returns per-prediction feature attributions for models configured with an explanation spec, through the endpoint's explain method for online requests or by enabling explanations on a batch prediction job.
- What-If Tool and Model Cards
- The What-If Tool interactively probes model behavior across data slices and thresholds; Model Cards document intended use, limitations, and evaluation results.
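Comparing live feature statistics against a baseline reduces to a distance between two distributions plus an alert threshold. Vertex AI Model Monitoring documentation describes L-infinity distance for categorical features (and Jensen-Shannon divergence for numerical ones). The following sketch reproduces only the categorical case on toy data. The feature values and `ALERT_THRESHOLD` are hypothetical; in the service, thresholds are configured per feature.

```python
from collections import Counter

def distribution(values):
    """Relative frequency of each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def l_infinity_distance(baseline, current):
    """Largest absolute difference in category frequency between two samples."""
    p, q = distribution(baseline), distribution(current)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))

# Baseline: e.g., training data (skew) or an earlier serving window (drift).
baseline = ["web"] * 70 + ["mobile"] * 30
current = ["web"] * 30 + ["mobile"] * 60 + ["kiosk"] * 10  # recent serving traffic

distance = l_infinity_distance(baseline, current)
ALERT_THRESHOLD = 0.3  # hypothetical per-feature threshold
drift_detected = distance > ALERT_THRESHOLD
print(round(distance, 3), drift_detected)
```

Categories that appear only on one side (here `kiosk`) count as frequency 0 on the other side, so new or vanished values also contribute to the distance.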
IAM Roles for ML
- roles/aiplatform.user
- Grants permission to create and run jobs, deploy models, and manage most Vertex AI resources without full administrative control.
- roles/aiplatform.admin
- Full control over all Vertex AI resources in the project; reserve it for platform administrators rather than everyday users or job service accounts.
- roles/aiplatform.viewer
- Read-only access to view Vertex AI resources, jobs, endpoints, and models.
- roles/bigquery.dataEditor + roles/bigquery.jobUser
- A common least-privilege pairing: dataEditor (granted on the project) can create datasets, tables, and models, and jobUser can run the query jobs that train and use BigQuery ML models.
- roles/storage.objectAdmin
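- Grants full control of objects (read, write, delete, list) in Cloud Storage buckets used for training data and model artifacts, but not bucket-level configuration; prefer roles/storage.objectViewer when read-only access suffices.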
- Service accounts for training and pipeline jobs
- Vertex AI jobs execute as a service account; grant it only the minimum roles required, following least-privilege principles.
Architecture Decision Patterns — Quick Rules
- Choose BigQuery ML when...
- Data is already in BigQuery, the team is SQL-skilled, and the model type is supported (regression, classification, clustering, forecasting).
- Choose AutoML when...
- The data type is standard (tabular, image, text, video), you have your own labeled data, and time-to-production outweighs full customization.
- Choose Pre-built APIs when...
- The task is general-purpose (vision, text, translation, speech) and no labeled training data is available at all.
- Choose custom training when...
- The scenario requires a novel architecture, specialized preprocessing, or maximum achievable model performance.
- Choose Dataflow over Dataproc when...
- Building a new pipeline with no existing Spark/Hadoop dependency; reserve Dataproc for migrating existing Spark ML workloads.
- Choose online over batch prediction when...
- The scenario mentions real-time or low-latency requirements; choose batch when scoring large datasets on a schedule with no latency constraint.
- Choose Vertex AI Pipelines over Cloud Composer when...
- The workflow is ML-specific; Cloud Composer is the better answer only when the scenario mentions existing Airflow DAGs or non-ML orchestration needs.
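The BigQuery ML, AutoML, pre-built API, and custom training rules can be condensed into a small decision function for self-testing. This is a sketch under an assumed precedence: custom requirements first, then pre-built APIs, then BigQuery ML, then AutoML. The quick rules themselves do not fix an order, and the `Scenario` flags and `choose_training_approach` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    # Hypothetical flags distilled from the quick rules.
    needs_custom_architecture: bool = False
    general_purpose_task: bool = False
    has_labeled_data: bool = True
    data_in_bigquery: bool = False
    team_prefers_sql: bool = False
    bqml_supports_model_type: bool = False

def choose_training_approach(s: Scenario) -> str:
    """Apply the quick rules in an assumed order of precedence."""
    if s.needs_custom_architecture:
        return "custom training"
    if s.general_purpose_task and not s.has_labeled_data:
        return "pre-built API"
    if s.data_in_bigquery and s.team_prefers_sql and s.bqml_supports_model_type:
        return "BigQuery ML"
    if s.has_labeled_data:
        return "AutoML"
    return "obtain labeled data first"

print(choose_training_approach(Scenario(data_in_bigquery=True,
                                        team_prefers_sql=True,
                                        bqml_supports_model_type=True)))
```

Putting custom training first reflects the rule that a novel architecture or maximum performance overrides convenience. Swap the order if a scenario weights time-to-production more heavily.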