DatabricksGenAI Engineer Associate58 concepts

GenAI Engineer Associate Cheat Sheet

Quick reference for the Databricks Certified Generative AI Engineer Associate exam.

Quick Navigation

Prompt Engineering Fundamentals RAG (Retrieval-Augmented Generation)Data Preparation for GenAI Application Development with LangChain and MLflow Assembling and Deploying Applications Agentic Systems and Multi-Agent Patterns Governance and Guardrails Evaluation and Monitoring

Prompt Engineering Fundamentals

System Prompt: Instruction block passed to the LLM that defines behavior, role, and constraints. Processed before user input. Controls tone, format, and safety boundaries.
Few-Shot Prompting: Include example input-output pairs in the prompt to guide the LLM toward the desired response format without fine-tuning the model weights.
Chain-of-Thought Prompting: Instruct the model to reason step-by-step before giving the final answer. Improves accuracy on multi-step reasoning tasks.
Output Formatting Instructions: Direct the LLM to respond in JSON, Markdown, bullet lists, or other structured formats by explicitly specifying the schema in the prompt.
Temperature Parameter: Controls randomness. Low temperature (0.0-0.3): deterministic, factual. High temperature (0.7-1.0): creative, varied. Use low for RAG, higher for generation.
Prompt Injection Risk: Malicious user input that overrides system instructions. Mitigate with input validation, guardrails, and separating trusted instructions from user content.

RAG (Retrieval-Augmented Generation)

RAG Pipeline Components: Source documents → chunking → embedding → vector store → retrieval → prompt augmentation → LLM → response. Each stage can be optimized independently.
Chunking Strategies: Fixed-size: simple, predictable. Sentence/paragraph: preserves semantic units. Recursive: splits by hierarchy (paragraph → sentence → word). Choose based on document structure.
Chunk Overlap: Include overlapping tokens between adjacent chunks to prevent context loss at chunk boundaries. Typical overlap: 10-20% of chunk size.
Embedding Models: Transform text chunks into dense vector representations. Choose context length based on average chunk size. Longer context models handle larger chunks.
Mosaic AI Vector Search: Databricks-managed vector database integrated with Unity Catalog. Supports Delta Sync (auto-sync from Delta table) and Direct Vector Access modes.
Vector Search Index Types: Delta Sync Index: auto-updated from a Delta table source, fully managed. Direct Vector Access Index: you manage upserts/deletes directly. Choose based on update frequency.
Similarity Search: Query the vector index with an embedded query vector. Returns top-k most similar chunks by cosine similarity or dot product. Tune k based on retrieval quality.
Re-ranking: Post-retrieval step that re-scores retrieved chunks using a cross-encoder model. Improves relevance precision after initial vector search recall. Adds latency but improves quality.

Data Preparation for GenAI

Document Extraction Libraries: PyPDF2/pdfplumber for PDFs, python-docx for Word files, BeautifulSoup for HTML, unstructured for mixed document types. Choose based on source format.
Extraneous Content Removal: Strip headers, footers, page numbers, navigation menus, boilerplate disclaimers, and formatting artifacts before chunking. They degrade retrieval relevance.
Writing Chunks to Delta Lake: Store chunked text in a Delta Lake table in Unity Catalog. Include columns: chunk_id, source_doc, chunk_text, metadata. Used as source for Vector Search sync.
Unity Catalog for RAG Data: Govern embedding tables, vector search indexes, and source document tables under Unity Catalog. Enables lineage tracking and access control for GenAI data assets.
Retrieval Evaluation Metrics: Precision@k: fraction of retrieved chunks that are relevant. Recall@k: fraction of relevant chunks retrieved. MRR: mean reciprocal rank of first relevant result.
Chunk Size Trade-offs: Small chunks: higher retrieval precision, may lack context. Large chunks: more context, lower precision, higher embedding cost. Balance based on eval metrics.

Application Development with LangChain and MLflow

LangChain LCEL (LangChain Expression Language): Pipe-based syntax to compose chains: retriever | prompt | llm | parser. Each component is a Runnable. Enables easy composition and streaming.
ChatPromptTemplate: LangChain template combining system message, optional examples, and human message with {variable} placeholders filled at runtime from user input.
LLM Guardrails: Input/output validation layers preventing harmful content, PII exposure, topic drift, or prompt injection. Implement with Llama Guard, custom classifiers, or moderation APIs.
Foundation Model APIs: Databricks-hosted LLM endpoints (DBRX, Llama, Mixtral, etc.) accessible via standard OpenAI-compatible API. No infrastructure management required.
MLflow AI Gateway: Unified proxy for LLM calls supporting both Databricks-hosted and external (OpenAI, Anthropic) models. Provides rate limiting, Inference Tables logging, and Usage Tables for cost tracking.
MLflow Tracing: Automatic instrumentation capturing the full execution trace of an LLM chain: inputs, outputs, latencies, and intermediate steps. Essential for debugging multi-step agents.
pyfunc Model Flavor: Generic MLflow model type using Python function interface. Use for RAG chains with pre/post-processing logic that does not fit a specific framework flavor.
mlflow.langchain.log_model(): Log a LangChain chain or agent to MLflow as a langchain flavor model. Captures the entire runnable including prompts, retriever config, and LLM endpoint.

Assembling and Deploying Applications

Model Registration to Unity Catalog: mlflow.register_model(model_uri, 'catalog.schema.model_name') — registers a logged model to Unity Catalog. Enables governance, versioning, and lineage.
Model Serving Endpoints: Deploy registered MLflow models as REST API endpoints via Databricks Model Serving. Auto-scaling, serverless option available. Accessed via standard REST or Databricks SDK.
Model Serving — Resource Access: Grant endpoints access to external resources (Vector Search, Delta tables, secrets) using Databricks service principals. Endpoints run with the service principal's permissions.
Serving Endpoint Environment Variables: Pass API keys, tokens, and config values as environment variables or Databricks Secrets to model serving endpoints. Never hardcode credentials in model artifacts.
ai_query() Function: Databricks SQL function that calls a model serving endpoint directly from a SQL query. Enables batch inference on Delta tables without writing Python pipeline code.
Batch Inference Pattern: Read source Delta table → apply ai_query() or spark.udf with LLM call → write results to output Delta table. Suitable for offline enrichment at scale.
Prompt Version Control: Track prompt templates as versioned MLflow artifacts or Unity Catalog assets. Enables rollback, A/B comparison, and promotion between environments (dev → staging → prod).
CI/CD for GenAI Apps: Automate: Vector Search index updates, prompt version promotion, model registration, endpoint deployment. Use Databricks Asset Bundles or GitHub Actions.

Agentic Systems and Multi-Agent Patterns

MLflow Agent Framework: Databricks framework for building, evaluating, and deploying agentic systems. Provides tool calling, state management, and MLflow tracing integration out of the box.
Tool Definition: Agents use tools (Python functions, SQL queries, API calls, Vector Search) to take actions. Each tool has a name, description, and typed input/output schema for LLM selection.
ReAct Pattern (Reason + Act): Agent loop: Think (LLM decides which tool to use) → Act (execute tool) → Observe (process result) → Repeat until goal is achieved or max iterations reached.
Agent Bricks — Knowledge Assistant: Pre-built agent type for Q&A over documents using RAG. Configurable with a Vector Search index and LLM endpoint. Minimal custom code required.
Agent Bricks — Multiagent Supervisor: Orchestrator agent that routes subtasks to specialized sub-agents. Use when tasks require different expertise domains (e.g., SQL agent + document agent).
Agent Bricks — Information Extraction: Pre-built agent for extracting structured data from unstructured text into a defined schema. Outputs JSON conforming to a user-specified Pydantic model.
Genie Spaces: Databricks feature enabling natural language querying of structured data (Delta tables, SQL warehouses) via a conversational interface. Enables multi-agent data access.
Multi-Agent Communication: Agents communicate via function calls or conversational APIs. The supervisor agent passes context and receives results from sub-agents to compose a final response.

Governance and Guardrails

Input Guardrails: Validate and filter user inputs before sending to the LLM. Detect prompt injection, harmful content, PII, and off-topic queries. Block or sanitize before processing.
Output Guardrails: Validate LLM outputs before returning to users. Check for hallucinations, PII leakage, harmful content, or policy violations. Can invoke a secondary LLM as judge.
PII Masking: Detect and replace PII (names, emails, SSNs, phone numbers) in inputs and outputs using NER models or regex patterns. Prevents accidental PII exposure via LLM.
Data Source Licensing: Verify licenses of training/RAG documents (CC-BY, CC-BY-SA, commercial restrictions). Some licenses prohibit use in commercial GenAI applications.
Unity Catalog Permissions for GenAI: Grant EXECUTE on functions, SELECT on tables, and USE on schemas to model serving service principals. Follows standard Unity Catalog privilege model.
Problematic Text Mitigation: Replace harmful or biased text in RAG sources with: filtered datasets, curated alternatives, or content policy flagging rather than using raw data.

Evaluation and Monitoring

MLflow evaluate(): mlflow.evaluate(model, data, targets, evaluators=[...]) — runs automated evaluation of LLM/agent outputs against metrics. Logs results as MLflow run artifacts.
LLM-as-Judge Metrics: Score responses with a powerful LLM (no ground truth needed): faithfulness (response supported by context?), answer_relevance (addresses the question?), harmfulness.
Ground Truth Metrics: Metrics requiring labeled reference answers: exact_match, ROUGE, BLEU, answer_correctness. Use when a curated QA dataset with known correct answers is available.
Faithfulness (Groundedness): Measures whether the LLM response is supported by the retrieved context. High faithfulness = no hallucination. Scored by LLM judge — no ground truth required.
Inference Tables: Auto-logging of every request and response to a Delta table for a model serving endpoint. Enables offline analysis, drift detection, and quality monitoring over time.
Agent Monitoring (Lakehouse Monitoring): Databricks feature that monitors deployed agent endpoints using inference table data. Tracks latency, token usage, error rates, and LLM-scored quality metrics.
Usage Tables (AI Gateway): Log token consumption per LLM request routed through AI Gateway. Use for cost attribution, budget enforcement, and identifying expensive query patterns.
Databricks Scorers: Custom evaluation functions registered in MLflow that score model outputs on domain-specific criteria. Extend built-in metrics with business-logic quality checks.

Ready to test yourself?

Start a timed GenAI Engineer Associate mock exam or review practice questions by domain.