CertPrepNow
Databricks · GenAI Engineer Associate · Updated 2026-06-08

GenAI Engineer Associate Study Guide

Everything you need to pass the Databricks Certified Generative AI Engineer Associate exam. Structured study plans, key services, common traps, and practice questions.

You Can Pass This Exam For Free

The GenAI Engineer Associate exam is passable with free resources if you have hands-on experience with Databricks and study consistently for 4-6 weeks:

  • Databricks official exam study guide (free download from Databricks Academy)
  • Databricks documentation: Mosaic AI, Vector Search, MLflow, Agent Framework, Foundation Model APIs (free)
  • Databricks Academy free courses: Generative AI Fundamentals, Large Language Models and RAG (free tier available)
  • MLflow documentation covering LLM evaluation, tracing, and pyfunc models (free at mlflow.org)
  • Databricks YouTube channel: Mosaic AI tutorials, MLflow tracing demos, RAG walkthroughs (free)
  • Databricks Community Edition for hands-on notebook practice (free)
  • Free practice questions on this site

The GenAI Engineer exam tests practical implementation knowledge. You need hands-on experience building RAG pipelines, configuring Vector Search, and deploying LLM endpoints on Databricks. Reading documentation alone is insufficient — work through the notebooks and build real chains.

Choose Your Study Path

You have general Python and data engineering skills but limited experience with LLMs, RAG, or Databricks Mosaic AI. You need to build foundational GenAI knowledge before tackling Databricks-specific implementation.

Week 1: Learn LLM fundamentals: what large language models are, how transformers work at a conceptual level, key parameters (temperature, max tokens, top-p), and the difference between base models and instruction-tuned models. Read Databricks' Generative AI Fundamentals documentation.
Week 2: Study prompt engineering: system prompts, few-shot examples, chain-of-thought, output formatting instructions, and prompt injection risks. Practice writing prompts that produce consistently formatted JSON output using Foundation Model APIs on Databricks.
Week 3: Build your first RAG pipeline: load a PDF, extract text, chunk it, embed chunks using a Databricks-hosted embedding model, store in Mosaic AI Vector Search, and retrieve top-k chunks for a query. Use Databricks notebooks in Community Edition.
Week 4: Study LangChain on Databricks: LCEL pipe syntax, ChatPromptTemplate, retriever integration, and chain composition. Build a simple Q&A chain that augments prompts with Vector Search context. Log the chain to MLflow.
Week 5: Deep dive into MLflow for GenAI: mlflow.langchain.log_model(), pyfunc models with pre/post-processing, model registration to Unity Catalog, and deploying a model serving endpoint. Understand AI Gateway and Inference Tables.
Week 6: Study agentic systems: tool definitions, ReAct pattern, MLflow Agent Framework, and Agent Bricks (Knowledge Assistant, Multiagent Supervisor, Information Extraction). Understand when to use each Agent Bricks type.
Week 7: Learn governance and guardrails: input/output guardrails, PII masking, data source licensing, Unity Catalog permissions for model serving endpoints, and selecting guardrail techniques.
Week 8: Study evaluation and monitoring: mlflow.evaluate(), LLM-as-judge metrics (faithfulness, answer relevance), Inference Tables, Agent Monitoring, Usage Tables, and Databricks Scorers. Understand which metrics require ground truth.
Week 9: Practice questions across all domains. Focus on Application Development (30% of exam) and Assembling and Deploying (22%). Review areas where you score below 70%.
Week 10: Take full mock exams, review all incorrect answers, re-study weak areas. Aim for 80%+ before scheduling the real exam. The passing score is 70%.

Exam Overview

Format

45 multiple-choice questions, 90 minutes. Proctored through Kryterion Webassessor online or at testing centers.

Scoring

Percentage-based scoring. Passing: 70% (32 out of 45 questions). No penalty for wrong answers — always answer every question.

Domains & Weights

  • Design Applications: 14%
  • Data Preparation: 14%
  • Application Development: 30%
  • Assembling and Deploying Applications: 22%
  • Governance: 8%
  • Evaluation and Monitoring: 12%
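
As a rough planning aid, the weights and pass mark above can be turned into numbers. This is a minimal sketch that assumes questions are allocated in proportion to the published weights, which the exam does not guarantee; the function names are illustrative.

```python
# Domain weights (percent) as listed above.
WEIGHTS = {
    "Design Applications": 14,
    "Data Preparation": 14,
    "Application Development": 30,
    "Assembling and Deploying Applications": 22,
    "Governance": 8,
    "Evaluation and Monitoring": 12,
}
TOTAL_QUESTIONS = 45
PASSING_PERCENT = 70


def expected_questions(weights, total):
    """Approximate questions per domain, assuming proportional allocation."""
    return {name: total * pct / 100 for name, pct in weights.items()}


def questions_to_pass(total, passing_percent):
    """Smallest whole number of correct answers that meets the pass mark."""
    return -(-total * passing_percent // 100)  # integer ceiling division


counts = expected_questions(WEIGHTS, TOTAL_QUESTIONS)
pass_mark = questions_to_pass(TOTAL_QUESTIONS, PASSING_PERCENT)
```

Under that assumption, Application Development alone accounts for roughly 13 to 14 questions, which is why it deserves the largest share of study time.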

Registration

$200 USD. Register at Databricks Academy (academy.databricks.com). Delivered through Kryterion Webassessor, either online proctored or at select testing centers.

Topic Priority Table

Not all topics are tested equally. Focus your study time on Tier 1 first, then Tier 2. Tier 3 topics rarely appear — just recognize what they do.

Tier 1: Must Know. You must understand these deeply, know when to use each, and be able to apply them in scenario questions. These appear across multiple exam domains.
Tier 2: Should Know. Understand what these are, their key characteristics, and when to use them. May appear in 2-4 questions each.
Tier 3: Recognize Only. Know what these are at a high level. Rarely more than 1-2 questions each.
Domain 1 (14% of exam)

Design Applications

This domain covers designing GenAI application architectures: crafting effective prompts, selecting appropriate models for business requirements, choosing chain components, converting business goals into AI pipeline specifications, and designing agentic systems with tools and Agent Bricks.

Key Topics

Prompt Engineering, LLM Selection, Chain Design, Agent Bricks, Tool Definition, Business Requirements Translation

Must-Know Concepts

  • Craft prompts that yield specifically formatted responses: use explicit format instructions, JSON schema examples in the prompt, and output parsers to enforce structure
  • Select model tasks by matching LLM capabilities to business requirements: instruction following, classification, extraction, summarization, code generation, and multi-step reasoning
  • Chain components: retriever, prompt template, LLM, output parser, memory. Know what each does and how they connect in LCEL
  • Convert a business goal (e.g., 'answer employee HR questions') into an AI pipeline: identify inputs, required context sources, output format, and quality constraints
  • Define and sequence tools for multi-stage reasoning: each tool needs a clear name, description, and typed input/output schema so the LLM can decide when to use it
  • Determine Agent Bricks usage: Knowledge Assistant for RAG Q&A, Multiagent Supervisor for routing to specialized agents, Information Extraction for structured output from text
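
The first concept in this list, format instructions plus a schema example plus an output parser, can be sketched with the standard library alone. The schema, template wording, and function names below are illustrative assumptions, not a Databricks or LangChain API; the model call itself is left out.

```python
import json

# Hypothetical schema for an HR question-answering response.
SCHEMA_EXAMPLE = {"answer": "string", "confidence": "number between 0 and 1"}

PROMPT_TEMPLATE = (
    "Answer the question using only the JSON format shown.\n"
    "Schema example: {schema}\n"
    "Question: {question}\n"
    "Return JSON only, with no surrounding text."
)


def build_prompt(question):
    """Embed the schema example in the prompt so the format is explicit."""
    return PROMPT_TEMPLATE.format(
        schema=json.dumps(SCHEMA_EXAMPLE), question=question
    )


def parse_response(raw):
    """Parse the model output and check that every schema key is present."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    missing = set(SCHEMA_EXAMPLE) - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


prompt = build_prompt("How many vacation days do new hires get?")
parsed = parse_response('{"answer": "15 days", "confidence": 0.8}')
```

The parser is the enforcement step: a prompt can ask for a format, but only a validation step can reject a response that ignores it.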

Common Exam Traps

Prompt formatting matters as much as content. If you instruct the LLM to output JSON but don't provide the schema, the format will be inconsistent. Always include a schema example in the prompt
Not every GenAI application needs an agent. Simple RAG Q&A applications are better served by a chain (predictable, testable) than an agent (non-deterministic, harder to debug)
Agent Bricks reduce development time for common patterns but are less flexible than building from scratch. If requirements don't fit a standard pattern, build with MLflow Agent Framework directly
Tool descriptions are LLM-readable documentation. Vague tool descriptions lead to incorrect tool selection by the agent. Write descriptions as if explaining to a junior engineer
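
To make the point about tool descriptions concrete, here is a small sketch of a tool definition with a name, a description, and typed schemas, plus a lint check for vague definitions. `ToolSpec`, `lint_tool`, and the eight-word threshold are hypothetical choices for illustration, not part of any Databricks or MLflow API.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Illustrative container; not a Databricks or MLflow class."""
    name: str
    description: str
    input_schema: dict = field(default_factory=dict)
    output_schema: dict = field(default_factory=dict)


def lint_tool(tool, min_words=8):
    """Return a list of problems that could confuse tool selection."""
    problems = []
    if not tool.name.isidentifier():
        problems.append("name should be a simple identifier")
    if len(tool.description.split()) < min_words:
        problems.append("description is too short to guide tool selection")
    if not tool.input_schema:
        problems.append("input schema is missing")
    if not tool.output_schema:
        problems.append("output schema is missing")
    return problems


good = ToolSpec(
    name="lookup_pto_balance",
    description="Return the remaining paid time off, in days, for one employee ID.",
    input_schema={"employee_id": "string"},
    output_schema={"days_remaining": "number"},
)
vague = ToolSpec(name="helper", description="Does HR stuff.")
```

Running `lint_tool(vague)` flags the short description and both missing schemas, while `lint_tool(good)` returns an empty list.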
Quick Check: Design Applications

A business requires an LLM application that extracts contract terms (parties, dates, payment amounts) from uploaded legal documents and stores them in a structured database table. Which Agent Bricks type best fits this requirement?

Domain 2 (14% of exam)

Data Preparation

This domain covers preparing data for RAG applications: selecting appropriate chunking strategies, cleaning source documents, choosing extraction libraries, writing chunks to Delta Lake in Unity Catalog, identifying source documents, evaluating retrieval quality, and understanding re-ranking.

Key Topics

Chunking Strategies, Document Extraction, Delta Lake, Unity Catalog, Vector Search, Retrieval Evaluation, Re-ranking

Must-Know Concepts

  • Chunking strategies and when to use each: fixed-size (uniform content), sentence/paragraph (preserves semantic units), recursive (hierarchy-aware for documents with headers/sections), semantic (topic-based clustering)
  • Chunk overlap: include overlapping tokens between adjacent chunks to avoid losing context at chunk boundaries. Typical: 10-20% of chunk size
  • Document content extraction libraries: PyPDF2 (now maintained as pypdf) or pdfplumber for PDFs, python-docx for Word, BeautifulSoup/html2text for HTML, unstructured for mixed types
  • Extraneous content to remove before chunking: headers, footers, page numbers, navigation menus, legal disclaimers, ads, formatting artifacts. These degrade retrieval relevance
  • Writing chunks to Delta Lake: store in a Unity Catalog table with columns for chunk_id, source_document, chunk_text, metadata. This table is the source for Delta Sync Vector Search index
  • Retrieval evaluation metrics: Precision@k (what fraction of retrieved chunks are relevant), Recall@k (what fraction of all relevant chunks are retrieved), MRR (Mean Reciprocal Rank of first relevant result)
  • Re-ranking: apply a cross-encoder model after initial vector search to re-score and reorder results. Improves precision at the cost of added latency
  • Source document identification: determine which documents are authoritative for the RAG application. Not all available data should be in the vector store
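
Fixed-size chunking with overlap, as described in the list above, can be sketched in a few lines. This version works on any pre-tokenized list; treating whitespace-separated words as tokens is a simplifying assumption, since real tokenizers count differently.

```python
def chunk_with_overlap(tokens, chunk_size, overlap):
    """Split tokens into fixed-size chunks that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the input
    return chunks


# Ten tokens, chunks of four, one token shared between neighbours.
example = chunk_with_overlap(list(range(10)), chunk_size=4, overlap=1)
```

With a chunk size of 4 and an overlap of 1, each chunk repeats the last token of the previous one, so a sentence that straddles a boundary is not lost entirely to either side.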

Common Exam Traps

Fixed-size chunking is simplest but may split mid-sentence or mid-concept. If retrieval quality is poor with fixed-size, switch to sentence or recursive chunking
Removing 'too much' content (e.g., removing section headings that provide context) can also degrade retrieval quality. Only remove truly extraneous content
Re-ranking improves precision (quality of top results) but adds latency. For real-time applications, evaluate whether the latency cost is acceptable
The Delta Lake table that feeds a Delta Sync Vector Search index must have specific columns (text column for embedding, ID column). Schema design matters
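
The retrieval metrics listed under Must-Know Concepts (Precision@k, Recall@k, MRR) are simple enough to compute by hand. The sketch below uses made-up chunk IDs and divides precision by k, which is one common convention when fewer than k results are returned.

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved chunks that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant chunks that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for chunk_id in retrieved[:k] if chunk_id in relevant)
    return hits / len(relevant)


def mean_reciprocal_rank(queries):
    """Average of 1/rank of the first relevant result across queries."""
    total = 0.0
    for retrieved, relevant in queries:
        for rank, chunk_id in enumerate(retrieved, start=1):
            if chunk_id in relevant:
                total += 1.0 / rank
                break
    return total / len(queries) if queries else 0.0


# Hypothetical chunk IDs for two queries.
retrieved = ["c1", "c7", "c3", "c9"]
relevant = {"c3", "c4"}
p = precision_at_k(retrieved, relevant, k=4)
r = recall_at_k(retrieved, relevant, k=4)
mrr = mean_reciprocal_rank([(retrieved, relevant), (["c4"], {"c4"})])
```

Re-ranking changes the order of `retrieved`, so it can raise MRR and Precision@k for small k without changing Recall at the original retrieval depth.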
Quick Check: Data Preparation

A RAG application is built over a large technical manual with chapters, sections, and subsections. Initial retrieval quality is low because retrieved chunks lack the context needed to answer questions. Which chunking strategy should be applied first?

Domain 3 (30% of exam)

Application Development

The heaviest domain at 30%. Covers LangChain and tool selection, response quality assessment, chunking strategy selection based on evaluation, prompt augmentation, guardrail implementation, LLM selection, embedding model selection, model hub usage, MLflow lifecycle, and agentic system development.

Key Topics

LangChain, LCEL, MLflow AI Gateway, LLM Selection, Embedding Models, Guardrails, Agent Framework, Genie Spaces, Multi-Agent Systems

Must-Know Concepts

  • LangChain LCEL: pipe syntax for composing chains (retriever | prompt | llm | output_parser). Each component is a Runnable. Know ChatPromptTemplate, retriever integration, and chain invocation
  • Assessing response quality qualitatively: hallucination, incomplete answers, incorrect tone, format violations, safety issues. Know how to identify these without automated metrics
  • Choosing chunking strategy based on model context length and retrieval evaluation results: if eval metrics are poor, iterate on chunking strategy
  • Augmenting prompts with context: inject retrieved chunks into the prompt template using {context} placeholder. Structure the prompt to clearly separate context from the user question
  • LLM guardrails: select techniques based on threat type — input classifiers for injection/harmful requests, output validators for PII/hallucination, topic classifiers for relevance
  • Selecting LLMs based on application attributes: task type (instruction following, code gen, multi-step reasoning), latency requirements, cost constraints, context window needs, and multilingual requirements
  • Embedding model context length: the embedding model must support a context length at least as long as the largest chunk. Chunks exceeding context length are truncated, degrading quality
  • Selecting models from hubs: use metadata filters (task, context length, benchmark scores, license, cost) to shortlist candidates
  • MLflow for GenAI lifecycle: log experiments, compare evaluation metrics across runs, register best model to Unity Catalog, track which prompt/model/data version produced which results
  • Agentic systems with MLflow Agent Framework: define tools, build the agent loop, trace executions, evaluate agent performance
  • Multi-agent systems: Genie Spaces for data access, conversational APIs for inter-agent communication, supervisor pattern for routing
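
Prompt augmentation with a {context} placeholder, mentioned in the list above, can be illustrated without LangChain. The template wording and helper names here are assumptions for the sketch; in a real chain the same idea is usually expressed with ChatPromptTemplate.

```python
# Context and question are kept in clearly separated sections.
RAG_TEMPLATE = (
    "Use only the context below to answer. If the answer is not in the "
    "context, say you do not know.\n\n"
    "### Context\n{context}\n\n"
    "### Question\n{question}"
)


def format_context(chunks):
    """Number each retrieved chunk so answers can cite their source."""
    return "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))


def augment_prompt(question, chunks):
    """Inject retrieved chunks into the template's {context} placeholder."""
    return RAG_TEMPLATE.format(context=format_context(chunks), question=question)


rag_prompt = augment_prompt(
    "How many vacation days do new hires get?",
    ["New hires accrue 15 vacation days per year.", "Sick leave is separate."],
)
```

Keeping the context in its own labeled section makes it harder for retrieved text to be mistaken for instructions or for the user's question.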

Common Exam Traps

The embedding model must be the SAME model used during indexing and query time. Changing the embedding model requires re-embedding all chunks and rebuilding the index
LLM context length limits the total prompt size (system + retrieved chunks + user query). If retrieved chunks fill the context window, the request is truncated or rejected, leaving no room for the full query or the response
Guardrail selection must match the specific threat. A topic classifier prevents off-topic queries but does NOT protect against prompt injection. Use the right tool for each threat type
MLflow evaluation with LLM judges requires the judge LLM to have access to the Databricks workspace. Configure the judge endpoint correctly or evaluations will fail
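
The context-length trap above comes down to a token budget. The sketch below keeps the highest-ranked chunks that fit after reserving room for the system prompt, the question, and the response. Counting whitespace-separated words as tokens is a rough stand-in; a real implementation would use the model's tokenizer.

```python
def count_tokens(text):
    """Rough stand-in: whitespace-separated words; real tokenizers differ."""
    return len(text.split())


def fit_chunks(system_prompt, question, chunks, context_window, reserved_for_output):
    """Keep best-first chunks until the remaining token budget runs out."""
    budget = (
        context_window
        - reserved_for_output
        - count_tokens(system_prompt)
        - count_tokens(question)
    )
    kept = []
    for chunk in chunks:  # assumed to be ranked best-first
        cost = count_tokens(chunk)
        if budget - cost < 0:
            break
        kept.append(chunk)
        budget -= cost
    return kept


chunks = ["a b c d e", "f g h i j", "k l m n o"]
kept = fit_chunks("be brief", "what is x", chunks, context_window=20, reserved_for_output=4)
```

In this example the budget after the system prompt, question, and output reservation is 11 tokens, so only the first two five-token chunks are kept.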
Quick Check: Application Development

A Databricks GenAI engineer is building a customer service bot that must never discuss competitor products. Which guardrail technique should be applied?

Domain 4 (22% of exam)

Assembling and Deploying Applications

This domain covers the technical implementation of deploying GenAI applications: coding pyfunc models, managing resource access, creating Vector Search indexes, registering models to Unity Catalog via MLflow, serving LLM applications, batch inference with ai_query(), CI/CD practices, MCP server integration, prompt lifecycle management, and building user interfaces.

Key Topics

pyfunc Models, Unity Catalog, MLflow Model Registration, Model Serving, Vector Search, ai_query(), CI/CD, MCP Servers, Databricks Apps

Must-Know Concepts

  • Code pyfunc models: implement a class extending mlflow.pyfunc.PythonModel with a predict(context, model_input) method. Use for chains with custom pre/post-processing
  • Resource access from serving endpoints: grant the endpoint's service principal permissions to Vector Search indexes, Delta tables, secrets, and external APIs
  • Coding simple chains: retriever + prompt + LLM using LCEL. Know the minimal implementation of a functional RAG chain
  • RAG application MLflow model elements: model flavor (langchain or pyfunc), embedding model reference, retriever configuration, Unity Catalog dependencies, input examples, and model signature
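  • Registering models: a flavor-specific log_model() call (for example mlflow.langchain.log_model() or mlflow.pyfunc.log_model()) logs the artifact; mlflow.register_model() creates a Unity Catalog model version. Both are required for deployment, although passing registered_model_name to log_model() does both in one call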
  • Creating and querying Vector Search indexes: create index via SDK or UI, specify source Delta table (for Delta Sync) or schema (for Direct Vector Access), query with similarity_search()
  • Serving LLM applications: deploy registered MLflow models to Databricks Model Serving. Configure compute type, concurrency, and environment variables
  • ai_query() syntax: SELECT ai_query('endpoint_name', prompt_column) FROM source_table, where the first argument is the name of a model serving endpoint. Enables batch inference in SQL
  • Configuring Vector Search parameters: embedding model, index type (Delta Sync vs Direct Vector Access), sync schedule/trigger, similarity metric (cosine vs dot product), and latency/cost trade-offs
  • CI/CD for GenAI: automate Vector Search index updates when source Delta table changes, promote tested prompts from dev to prod via version control, run component integration tests
  • MCP server types: Managed (Unity Catalog functions), External (third-party tool providers), Custom (user-implemented Python servers)
  • Prompt version control: track prompt versions as MLflow artifacts, use lifecycle stages (development, staging, production) for promotion
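
The pyfunc pattern from the first item in this list can be sketched as follows. The class name, the stub chain, and the fallback base class are assumptions made so the example runs even where MLflow is not installed; with MLflow available, the real mlflow.pyfunc.PythonModel base class is used.

```python
try:
    from mlflow.pyfunc import PythonModel  # real base class when MLflow is installed
except ImportError:
    class PythonModel:  # stand-in so the sketch runs without MLflow
        pass


class PrePostModel(PythonModel):
    """Wraps a hypothetical chain callable with pre- and post-processing."""

    def __init__(self, chain=None):
        # `chain` is a placeholder for a real RAG chain; defaults to a stub.
        self.chain = chain or (lambda question: f"stub answer for: {question} ")

    def predict(self, context, model_input, params=None):
        # Pre-processing: normalise each incoming question.
        questions = [str(q).strip() for q in model_input]
        # Core call: delegate to the wrapped chain.
        answers = [self.chain(q) for q in questions]
        # Post-processing: tidy the raw answers before returning them.
        return [a.rstrip() for a in answers]


model = PrePostModel()
outputs = model.predict(None, ["  What is PTO?  "])
```

Once such a class behaves as expected locally, it would typically be logged with mlflow.pyfunc.log_model() and registered to Unity Catalog before being served.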

Common Exam Traps

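Logging (for example mlflow.langchain.log_model() or mlflow.pyfunc.log_model()) and mlflow.register_model() are two separate operations. Logging creates an artifact in the run. Registration creates a versioned model in Unity Catalog for deployment. Passing registered_model_name to log_model() performs both in one call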
The model serving endpoint runs with a service principal's identity. If the principal lacks permission to query Vector Search, the deployed RAG application will fail at inference time
ai_query() runs synchronously in SQL. For very large tables with expensive LLM calls, it can time out or incur high costs. Design batch jobs with cost and latency bounds in mind
When creating a Delta Sync Vector Search index, the source Delta table must already exist with the correct schema. Changing the schema later requires recreating the index
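
The similarity metric choice listed among the Vector Search parameters (cosine vs dot product) matters when embeddings are not normalized. The toy two-dimensional vectors below are made up to show the two metrics ranking the same documents differently.

```python
import math


def dot(a, b):
    """Dot product: sensitive to vector length as well as direction."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine similarity: compares direction only."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot(a, b) / norm


query = [1.0, 0.0]
aligned_doc = [1.0, 0.0]   # same direction as the query, short vector
long_doc = [3.0, 3.0]      # different direction, but much larger magnitude

best_by_dot = max([aligned_doc, long_doc], key=lambda d: dot(query, d))
best_by_cosine = max([aligned_doc, long_doc], key=lambda d: cosine(query, d))
```

Dot product prefers `long_doc` because of its magnitude, while cosine prefers `aligned_doc`. When all embeddings are unit-normalized, the two metrics produce the same ranking.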
Quick Check: Assembling and Deploying Applications

An engineer logs a LangChain RAG chain to MLflow and wants to deploy it as a REST endpoint. After running mlflow.langchain.log_model(), what is the next required step before deployment?

Domain 5 (8% of exam)

Governance

This domain covers governance of GenAI applications: applying guardrails for performance and safety objectives, selecting guardrail techniques against specific threats, addressing legal and licensing requirements for data sources, and recommending alternatives for problematic text in GenAI data.

Key Topics

Input Guardrails, Output Guardrails, PII Masking, Data Licensing, Unity Catalog Permissions, Content Moderation

Must-Know Concepts

  • PII masking as a guardrail: detect and replace PII (names, emails, SSNs, phone numbers, addresses) in both inputs and outputs using NER models or regex patterns
  • Guardrail techniques for malicious inputs: input classifiers for harmful content, prompt injection detectors, topic scope enforcers, jailbreak detection
  • Legal and licensing requirements: understand Creative Commons licenses (CC-BY, CC-BY-SA, CC-BY-NC), copyright restrictions on training data and RAG source documents
  • Commercial restrictions: some licenses (CC-BY-NC) prohibit commercial use. Validate all RAG data sources for license compatibility before production deployment
  • Alternatives for problematic text: replace with filtered/curated datasets, use content policy flags to skip problematic documents, or apply post-processing to sanitize outputs
  • Unity Catalog permissions for GenAI: model serving endpoints run with service principal identity. Grant only necessary privileges following least-privilege principles
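
Regex-based PII masking on both the input and the output path can be sketched as below. The patterns cover only US-style SSNs, simple phone numbers, and emails, and are illustrative rather than production grade; names and addresses generally need an NER model instead. `guarded_call` and the stub LLM are hypothetical.

```python
import re

# Illustrative patterns only; real deployments need broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def mask_pii(text):
    """Replace each detected PII span with a bracketed label."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


def guarded_call(llm, user_input):
    """Mask PII before the model sees the input and again on its output."""
    return mask_pii(llm(mask_pii(user_input)))


masked = mask_pii("Email jane.doe@example.com or call 555-123-4567, SSN 123-45-6789")


def stub_llm(prompt):
    # Stand-in for a model that leaks an address from retrieved documents.
    return f"Contact bob@corp.example about: {prompt}"


response = guarded_call(stub_llm, "My SSN is 123-45-6789")
```

The stub model shows why the output path matters: the leaked email never appeared in the user's input, so only the output mask catches it.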

Common Exam Traps

PII masking applies to both inputs AND outputs. A user may include PII in their question, and the LLM may include PII from retrieved documents in its response. Both paths must be protected
License compliance is a legal requirement, not a technical nicety. Using CC-BY-NC data in a commercial product violates the license even if the data is technically accessible
Guardrail selection must match the specific threat vector. PII masking does NOT protect against prompt injection. Each threat requires the appropriate guardrail technique
Quick Check: Governance

A healthcare company is building a RAG application over patient records. To comply with HIPAA, patient identifiers must never appear in LLM responses. Which guardrail approach should be implemented?

Domain 6 (12% of exam)

Evaluation and Monitoring

This domain covers evaluating and monitoring deployed GenAI applications: selecting LLMs using quantitative metrics, monitoring deployed endpoints, evaluating agents with MLflow, using inference logging, cost control, tracking with Agent Monitoring, identifying evaluation judges, using AI Gateway features, applying custom Scorers, and incorporating SME feedback.

Key Topics

mlflow.evaluate(), LLM Judges, Inference Tables, Agent Monitoring, AI Gateway, Usage Tables, Databricks Scorers, SME Feedback

Must-Know Concepts

  • mlflow.evaluate() API: pass model URI or function, evaluation dataset, targets column (for ground truth metrics), and list of evaluators/metrics. Results logged to the MLflow run
  • LLM-judge metrics (no ground truth needed): faithfulness, answer_relevance, harmfulness, coherence, fluency. Use a powerful LLM endpoint as the judge
  • Ground truth metrics (require labeled answers): answer_correctness, exact_match, ROUGE, BLEU. Need a curated test dataset with reference answers
  • MLflow Tracing for agents: automatically captures the full execution trace. Identify which tool calls failed, which reasoning steps were wrong, and where latency is spent
  • Inference Tables: auto-log every request and response at a model serving endpoint to a Delta table. Enable with one configuration setting on the endpoint
  • Agent Monitoring (Lakehouse Monitoring): analyze inference table data to track quality metrics, latency distributions, error rates, and drift over time
  • AI Gateway rate limiting: configure requests-per-minute or tokens-per-minute limits per endpoint or user to control costs and prevent abuse
  • Usage Tables: log token counts and cost estimates per request through AI Gateway. Join with Inference Tables for cost-quality analysis
  • Databricks Scorers: register Python functions as custom mlflow evaluators that score outputs on domain-specific criteria beyond built-in metrics
  • SME feedback: collect expert ratings via review apps, annotate correct/incorrect responses, use annotations to identify prompt weaknesses and update RAG content
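
To see why ground truth metrics need labeled answers, it helps to compute two by hand. The sketch below implements exact match and a simplified unigram-recall overlap in the spirit of ROUGE-1; it is not the official ROUGE implementation and not the mlflow.evaluate() API. Both functions take a reference answer, which is exactly what LLM-judge metrics such as faithfulness do not need.

```python
from collections import Counter


def exact_match(prediction, reference):
    """1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def unigram_recall(prediction, reference):
    """Share of reference words that also appear in the prediction."""
    reference_words = reference.lower().split()
    if not reference_words:
        return 0.0
    available = Counter(prediction.lower().split())
    hits = 0
    for word in reference_words:
        if available[word] > 0:
            hits += 1
            available[word] -= 1  # each predicted word can match only once
    return hits / len(reference_words)


em = exact_match("Paris ", "paris")
overlap = unigram_recall("the capital is paris", "paris is the capital of france")
```

Here four of the six reference words appear in the prediction, so the overlap is about 0.67 even though the exact-match score for these two strings would be 0.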

Common Exam Traps

Faithfulness requires BOTH the LLM response AND the retrieved context to score (it measures whether the answer is supported by context). answer_relevance requires the question AND the response. Know the inputs each metric needs
Enabling Inference Tables on an endpoint adds a small latency overhead and storage cost. This is generally acceptable but should be factored into production planning
Custom Scorers run within the MLflow evaluate() call. They need access to the same environment and endpoints as the evaluation. Network and permission issues can cause scorer failures
Agent Monitoring is a post-deployment feature. It analyzes historical inference data. It does NOT provide real-time alerting by default — configure alerts separately based on Lakehouse Monitoring data
Quick Check: Evaluation and Monitoring

A team evaluates their RAG agent with mlflow.evaluate(). They want to measure whether responses are supported by the retrieved context. Which metric should they specify, and does it require ground truth?

Concepts You Must Not Confuse

These pairs appear on nearly every exam. Learn the difference and you'll avoid the most common traps.

Delta Sync Vector Search Index vs Direct Vector Access Vector Search Index

Use Delta Sync Vector Search Index when…

Auto-synced from a Delta table source. Databricks manages embedding refresh when source data changes. Best for RAG applications where source documents are stored in Delta Lake.

Use Direct Vector Access Vector Search Index when…

Managed by the application. You control when and what to upsert or delete. Best when you have custom embedding pipelines or need fine-grained control over index contents.

Exam trap

Delta Sync is simpler but requires a Delta table as the source. If your data comes from a non-Delta source or you need custom embedding logic, use Direct Vector Access. The exam tests selecting the right type for a given architecture.

Foundation Model APIs vs External Model Endpoints (AI Gateway)

Use Foundation Model APIs when…

Use Databricks-hosted models (DBRX, Llama, Mixtral) without managing infrastructure. Best for cost-efficiency and tight Databricks integration.

Use External Model Endpoints (AI Gateway) when…

Proxy to external LLMs (OpenAI GPT-4, Anthropic Claude) through AI Gateway. Use when a specific third-party model capability is required.

Exam trap

Both appear as model serving options in Databricks. External models go through AI Gateway, enabling unified governance (rate limiting, cost tracking, logging) even for third-party providers. Use external endpoints when the application specifically requires a third-party model.

RAG (Retrieval-Augmented Generation) vs Fine-Tuning

Use RAG (Retrieval-Augmented Generation) when…

Inject relevant documents into the LLM prompt at inference time. Best for keeping responses grounded in up-to-date, controlled data sources. No model weight changes.

Use Fine-Tuning when…

Update model weights using domain-specific training examples. Best for adapting model style, format, or domain-specific behavior that cannot be achieved with prompting.

Exam trap

RAG and fine-tuning solve different problems. RAG solves knowledge freshness and hallucination reduction. Fine-tuning solves style, format, and deep domain adaptation. They are complementary — a fine-tuned model can also use RAG.

LLM-as-Judge Metrics vs Ground Truth Metrics

Use LLM-as-Judge Metrics when…

A powerful LLM scores responses on dimensions like faithfulness, relevance, and harmfulness. Does NOT require labeled reference answers. Good for evaluating open-ended responses.

Use Ground Truth Metrics when…

Compare responses to reference answers using exact match, ROUGE, or BLEU. Requires a curated labeled dataset. More objective but limited to questions with known correct answers.

Exam trap

The exam tests which metrics require ground truth. Faithfulness and answer relevance are LLM-judge metrics (no ground truth needed). Answer correctness and exact match require ground truth labels. Know which category each metric belongs to.

Fixed-Size Chunking vs Semantic Chunking

Use Fixed-Size Chunking when…

Split text into chunks of exactly N tokens with optional overlap. Simple, predictable, fast. Best for uniform documents like transcripts or code.

Use Semantic Chunking when…

Split text at semantic boundaries by clustering similar sentences. Preserves meaning within chunks better. Best for heterogeneous documents with varied topic density.

Exam trap

Fixed-size chunking can split mid-sentence or mid-concept, causing retrieval quality issues. Semantic chunking is more expensive to compute but produces better retrieval quality. Choose based on document structure and quality requirements.

Input Guardrails vs Output Guardrails

Use Input Guardrails when…

Validate and filter user inputs BEFORE sending to the LLM. Detect and block: prompt injection attacks, PII in queries, harmful content requests, off-topic queries.

Use Output Guardrails when…

Validate LLM responses BEFORE returning to the user. Detect and handle: hallucinations, PII in outputs, policy violations, harmful content generated by the model.

Exam trap

Both types are needed in a complete guardrail system. Input guardrails prevent dangerous prompts from reaching the LLM. Output guardrails catch harmful responses that the LLM generates despite clean input. They protect against different threat vectors.

Inference Tables vs Usage Tables

Use Inference Tables when…

Log complete request and response payloads for a model serving endpoint. Used for quality monitoring, drift detection, and offline evaluation of deployed models.

Use Usage Tables when…

Log token consumption and cost metrics per request routed through AI Gateway. Used for cost attribution, budget tracking, and identifying expensive query patterns.

Exam trap

Inference Tables tell you WHAT the model said. Usage Tables tell you HOW MUCH it cost. They are complementary monitoring tools. Inference Tables are for quality; Usage Tables are for cost management.
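
Because the two tables are complementary, joining them supports cost-quality analysis. The sketch below uses in-memory rows with hypothetical column names (`request_id`, `quality`, `total_tokens`); real Inference Table and Usage Table schemas differ, and in practice this would be a SQL join over Delta tables.

```python
# Hypothetical rows; real table schemas differ.
inference_rows = [
    {"request_id": "r1", "response": "Answer A", "quality": 0.9},
    {"request_id": "r2", "response": "Answer B", "quality": 0.4},
    {"request_id": "r3", "response": "Answer C", "quality": 0.8},
]
usage_rows = [
    {"request_id": "r1", "total_tokens": 800},
    {"request_id": "r2", "total_tokens": 3200},
]


def join_on_request_id(inference, usage):
    """Inner join: keep requests present in both tables."""
    usage_by_id = {row["request_id"]: row for row in usage}
    return [
        {**row, **usage_by_id[row["request_id"]]}
        for row in inference
        if row["request_id"] in usage_by_id
    ]


def expensive_and_weak(rows, token_limit, quality_floor):
    """Requests that cost a lot of tokens yet scored poorly."""
    return [
        row["request_id"]
        for row in rows
        if row["total_tokens"] > token_limit and row["quality"] < quality_floor
    ]


joined = join_on_request_id(inference_rows, usage_rows)
flagged = expensive_and_weak(joined, token_limit=2000, quality_floor=0.5)
```

Requests that are both expensive and low quality, like `r2` here, are usually the first candidates for prompt or retrieval changes.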

Knowledge Assistant (Agent Bricks) vs Information Extraction (Agent Bricks)

Use Knowledge Assistant (Agent Bricks) when…

Pre-built agent for Q&A over documents using RAG. Returns natural language answers citing source chunks. Best for search and question-answering applications.

Use Information Extraction (Agent Bricks) when…

Pre-built agent for extracting structured data from unstructured text into a defined schema (Pydantic model). Best for parsing contracts, forms, or reports into structured database records.

Exam trap

Both use LLMs on unstructured text but for different purposes. Knowledge Assistant returns free-text answers for human consumption. Information Extraction returns structured JSON for programmatic use. Confusing them leads to wrong Agent Bricks type selection.

Top Mistakes to Avoid

Using different embedding models at indexing time and query time: the same model is mandatory, and switching models invalidates the entire vector index and requires full re-embedding
Confusing faithfulness (is the answer supported by context?) with answer relevance (does the answer address the question?) — they measure different quality dimensions
Forgetting that logging (for example mlflow.langchain.log_model()) and mlflow.register_model() are separate steps: logging alone does not make a model deployable as a serving endpoint
Treating RAG and fine-tuning as interchangeable — RAG grounds responses in current data at inference time; fine-tuning permanently modifies model weights for style/behavior
Selecting an embedding model whose context length is smaller than the largest chunk: truncated chunks produce degraded embeddings and poorer retrieval quality
Not testing guardrails before production — guardrails that are too strict block legitimate queries; guardrails that are too lenient allow policy violations through
Choosing Delta Sync Vector Search when the source data is not in Delta Lake — Delta Sync requires a Delta table as the source; use Direct Vector Access for other pipelines
Confusing Knowledge Assistant with Information Extraction Agent Bricks — Knowledge Assistant returns natural language answers; Information Extraction returns structured JSON
Not configuring resource access for model serving endpoints — the endpoint's service principal needs explicit Unity Catalog permissions to query Vector Search and Delta tables
Using ai_query() for real-time interactive applications — ai_query() is optimized for batch SQL workloads, not low-latency interactive use cases

Exam-Ready Checklist

Can explain all 6 exam domains and their weights: Design (14%), Data Prep (14%), App Dev (30%), Assembling/Deploying (22%), Governance (8%), Evaluation/Monitoring (12%)
Understand the full RAG pipeline: document extraction → chunking → embedding → Vector Search indexing → retrieval → prompt augmentation → LLM → response
Know the two Vector Search index types (Delta Sync vs Direct Vector Access) and when to use each
Can code a simple LangChain LCEL RAG chain and explain each component's role
Understand MLflow GenAI lifecycle: log model → register to Unity Catalog → deploy to model serving endpoint
Know the three Agent Bricks types and the right use case for each
Can explain which evaluation metrics require ground truth (answer_correctness, exact_match, ROUGE) vs LLM-judge only (faithfulness, answer_relevance)
Understand Inference Tables vs Usage Tables — purpose, content, and when to use each
Know how to use ai_query() for batch inference and when it is appropriate vs real-time endpoints
Understand guardrail types (input vs output) and which technique addresses which threat (PII masking, topic classifier, injection detection, output validation)
Can explain prompt version control and CI/CD patterns for GenAI applications
Understand Unity Catalog permissions required for model serving endpoints and Vector Search
Know the trade-offs of chunking strategies, re-ranking, and hybrid search for retrieval quality
Scored 75%+ on at least two full practice exams (passing score is 70%). Aim for 80%+ for a comfortable margin on exam day

Recommended Resources

Free & Official Resources

Paid Courses & Practice Exams

These are recommended if you prefer a structured learning path. They can save time but are not required to pass.

Frequently Asked Questions