CertPrepNow
DatabricksGenAI Engineer Associate58 concepts

GenAI Engineer Associate Cheat Sheet

Quick reference for the Databricks Certified Generative AI Engineer Associate exam.

Prompt Engineering Fundamentals

System Prompt
Instruction block passed to the LLM that defines behavior, role, and constraints. Processed before user input. Controls tone, format, and safety boundaries.
Few-Shot Prompting
Include example input-output pairs in the prompt to guide the LLM toward the desired response format without fine-tuning the model weights.
Chain-of-Thought Prompting
Instruct the model to reason step-by-step before giving the final answer. Improves accuracy on multi-step reasoning tasks.
Output Formatting Instructions
Direct the LLM to respond in JSON, Markdown, bullet lists, or other structured formats by explicitly specifying the schema in the prompt.
Temperature Parameter
Controls randomness. Low temperature (0.0-0.3): deterministic, factual. High temperature (0.7-1.0): creative, varied. Use low for RAG, higher for generation.
Prompt Injection Risk
Malicious user input that overrides system instructions. Mitigate with input validation, guardrails, and separating trusted instructions from user content.

RAG (Retrieval-Augmented Generation)

RAG Pipeline Components
Source documents → chunking → embedding → vector store → retrieval → prompt augmentation → LLM → response. Each stage can be optimized independently.
Chunking Strategies
Fixed-size: simple, predictable. Sentence/paragraph: preserves semantic units. Recursive: splits by hierarchy (paragraph → sentence → word). Choose based on document structure.
Chunk Overlap
Include overlapping tokens between adjacent chunks to prevent context loss at chunk boundaries. Typical overlap: 10-20% of chunk size.
Embedding Models
Transform text chunks into dense vector representations. Choose context length based on average chunk size. Longer context models handle larger chunks.
Mosaic AI Vector Search
Databricks-managed vector database integrated with Unity Catalog. Supports Delta Sync (auto-sync from Delta table) and Direct Vector Access modes.
Vector Search Index Types
Delta Sync Index: auto-updated from a Delta table source, fully managed. Direct Vector Access Index: you manage upserts/deletes directly. Choose based on update frequency.
Similarity Search
Query the vector index with an embedded query vector. Returns top-k most similar chunks by cosine similarity or dot product. Tune k based on retrieval quality.
Re-ranking
Post-retrieval step that re-scores retrieved chunks using a cross-encoder model. Improves relevance precision after initial vector search recall. Adds latency but improves quality.

Data Preparation for GenAI

Document Extraction Libraries
PyPDF2/pdfplumber for PDFs, python-docx for Word files, BeautifulSoup for HTML, unstructured for mixed document types. Choose based on source format.
Extraneous Content Removal
Strip headers, footers, page numbers, navigation menus, boilerplate disclaimers, and formatting artifacts before chunking. They degrade retrieval relevance.
Writing Chunks to Delta Lake
Store chunked text in a Delta Lake table in Unity Catalog. Include columns: chunk_id, source_doc, chunk_text, metadata. Used as source for Vector Search sync.
Unity Catalog for RAG Data
Govern embedding tables, vector search indexes, and source document tables under Unity Catalog. Enables lineage tracking and access control for GenAI data assets.
Retrieval Evaluation Metrics
Precision@k: fraction of retrieved chunks that are relevant. Recall@k: fraction of relevant chunks retrieved. MRR: mean reciprocal rank of first relevant result.
Chunk Size Trade-offs
Small chunks: higher retrieval precision, may lack context. Large chunks: more context, lower precision, higher embedding cost. Balance based on eval metrics.

Application Development with LangChain and MLflow

LangChain LCEL (LangChain Expression Language)
Pipe-based syntax to compose chains: retriever | prompt | llm | parser. Each component is a Runnable. Enables easy composition and streaming.
ChatPromptTemplate
LangChain template combining system message, optional examples, and human message with {variable} placeholders filled at runtime from user input.
LLM Guardrails
Input/output validation layers preventing harmful content, PII exposure, topic drift, or prompt injection. Implement with Llama Guard, custom classifiers, or moderation APIs.
Foundation Model APIs
Databricks-hosted LLM endpoints (DBRX, Llama, Mixtral, etc.) accessible via standard OpenAI-compatible API. No infrastructure management required.
MLflow AI Gateway
Unified proxy for LLM calls supporting both Databricks-hosted and external (OpenAI, Anthropic) models. Provides rate limiting, Inference Tables logging, and Usage Tables for cost tracking.
MLflow Tracing
Automatic instrumentation capturing the full execution trace of an LLM chain: inputs, outputs, latencies, and intermediate steps. Essential for debugging multi-step agents.
pyfunc Model Flavor
Generic MLflow model type using Python function interface. Use for RAG chains with pre/post-processing logic that does not fit a specific framework flavor.
mlflow.langchain.log_model()
Log a LangChain chain or agent to MLflow as a langchain flavor model. Captures the entire runnable including prompts, retriever config, and LLM endpoint.

Assembling and Deploying Applications

Model Registration to Unity Catalog
mlflow.register_model(model_uri, 'catalog.schema.model_name') — registers a logged model to Unity Catalog. Enables governance, versioning, and lineage.
Model Serving Endpoints
Deploy registered MLflow models as REST API endpoints via Databricks Model Serving. Auto-scaling, serverless option available. Accessed via standard REST or Databricks SDK.
Model Serving — Resource Access
Grant endpoints access to external resources (Vector Search, Delta tables, secrets) using Databricks service principals. Endpoints run with the service principal's permissions.
Serving Endpoint Environment Variables
Pass API keys, tokens, and config values as environment variables or Databricks Secrets to model serving endpoints. Never hardcode credentials in model artifacts.
ai_query() Function
Databricks SQL function that calls a model serving endpoint directly from a SQL query. Enables batch inference on Delta tables without writing Python pipeline code.
Batch Inference Pattern
Read source Delta table → apply ai_query() or spark.udf with LLM call → write results to output Delta table. Suitable for offline enrichment at scale.
Prompt Version Control
Track prompt templates as versioned MLflow artifacts or Unity Catalog assets. Enables rollback, A/B comparison, and promotion between environments (dev → staging → prod).
CI/CD for GenAI Apps
Automate: Vector Search index updates, prompt version promotion, model registration, endpoint deployment. Use Databricks Asset Bundles or GitHub Actions.

Agentic Systems and Multi-Agent Patterns

MLflow Agent Framework
Databricks framework for building, evaluating, and deploying agentic systems. Provides tool calling, state management, and MLflow tracing integration out of the box.
Tool Definition
Agents use tools (Python functions, SQL queries, API calls, Vector Search) to take actions. Each tool has a name, description, and typed input/output schema for LLM selection.
ReAct Pattern (Reason + Act)
Agent loop: Think (LLM decides which tool to use) → Act (execute tool) → Observe (process result) → Repeat until goal is achieved or max iterations reached.
Agent Bricks — Knowledge Assistant
Pre-built agent type for Q&A over documents using RAG. Configurable with a Vector Search index and LLM endpoint. Minimal custom code required.
Agent Bricks — Multiagent Supervisor
Orchestrator agent that routes subtasks to specialized sub-agents. Use when tasks require different expertise domains (e.g., SQL agent + document agent).
Agent Bricks — Information Extraction
Pre-built agent for extracting structured data from unstructured text into a defined schema. Outputs JSON conforming to a user-specified Pydantic model.
Genie Spaces
Databricks feature enabling natural language querying of structured data (Delta tables, SQL warehouses) via a conversational interface. Enables multi-agent data access.
Multi-Agent Communication
Agents communicate via function calls or conversational APIs. The supervisor agent passes context and receives results from sub-agents to compose a final response.

Governance and Guardrails

Input Guardrails
Validate and filter user inputs before sending to the LLM. Detect prompt injection, harmful content, PII, and off-topic queries. Block or sanitize before processing.
Output Guardrails
Validate LLM outputs before returning to users. Check for hallucinations, PII leakage, harmful content, or policy violations. Can invoke a secondary LLM as judge.
PII Masking
Detect and replace PII (names, emails, SSNs, phone numbers) in inputs and outputs using NER models or regex patterns. Prevents accidental PII exposure via LLM.
Data Source Licensing
Verify licenses of training/RAG documents (CC-BY, CC-BY-SA, commercial restrictions). Some licenses prohibit use in commercial GenAI applications.
Unity Catalog Permissions for GenAI
Grant EXECUTE on functions, SELECT on tables, and USE on schemas to model serving service principals. Follows standard Unity Catalog privilege model.
Problematic Text Mitigation
Replace harmful or biased text in RAG sources with: filtered datasets, curated alternatives, or content policy flagging rather than using raw data.

Evaluation and Monitoring

MLflow evaluate()
mlflow.evaluate(model, data, targets, evaluators=[...]) — runs automated evaluation of LLM/agent outputs against metrics. Logs results as MLflow run artifacts.
LLM-as-Judge Metrics
Score responses with a powerful LLM (no ground truth needed): faithfulness (response supported by context?), answer_relevance (addresses the question?), harmfulness.
Ground Truth Metrics
Metrics requiring labeled reference answers: exact_match, ROUGE, BLEU, answer_correctness. Use when a curated QA dataset with known correct answers is available.
Faithfulness (Groundedness)
Measures whether the LLM response is supported by the retrieved context. High faithfulness = no hallucination. Scored by LLM judge — no ground truth required.
Inference Tables
Auto-logging of every request and response to a Delta table for a model serving endpoint. Enables offline analysis, drift detection, and quality monitoring over time.
Agent Monitoring (Lakehouse Monitoring)
Databricks feature that monitors deployed agent endpoints using inference table data. Tracks latency, token usage, error rates, and LLM-scored quality metrics.
Usage Tables (AI Gateway)
Log token consumption per LLM request routed through AI Gateway. Use for cost attribution, budget enforcement, and identifying expensive query patterns.
Databricks Scorers
Custom evaluation functions registered in MLflow that score model outputs on domain-specific criteria. Extend built-in metrics with business-logic quality checks.

Ready to test yourself?

Start a timed GenAI Engineer Associate mock exam or review practice questions by domain.