CertPrepNow
CNCF / Linux Foundation | CNPA | Updated 2026-06-08

CNPA Study Guide

Everything you need to pass the Certified Cloud Native Platform Engineering Associate exam. Structured study plans, key services, common traps, and practice questions.

You Can Pass This Exam For Free

The CNPA exam is passable with free resources if you have hands-on experience with Kubernetes and cloud native tooling and study consistently for 6-8 weeks:

  • CNCF CNPA official exam curriculum and candidate handbook (free on Linux Foundation website)
  • CNCF landscape documentation and project READMEs (free on cncf.io)
  • Kubernetes official documentation at kubernetes.io (free, comprehensive)
  • Platform Engineering community resources at platformengineering.org (free)
  • Argo CD, Flux, Crossplane, Backstage official docs (free)
  • CNCF YouTube channel — KubeCon presentations on platform engineering topics (free)
  • Killercoda and Play with Kubernetes interactive browser-based labs (free tier)
  • Free practice questions on this site

The CNPA is knowledge-based with multiple-choice questions. Hands-on Kubernetes experience and familiarity with CNCF ecosystem projects are strongly recommended. Free official documentation and community content cover all exam domains thoroughly.

Choose Your Study Path

Limited Kubernetes or cloud native experience. You need to build foundational knowledge of cloud native concepts, Kubernetes, and the CNCF ecosystem before tackling platform engineering specifics.

Week 1: Learn Kubernetes fundamentals (pods, deployments, services, namespaces, ConfigMaps, and Secrets). Use the official kubernetes.io docs and Killercoda labs. Understand the declarative resource model: what it means to define desired state in YAML.
Week 2: Study DevOps principles and how they map to platform engineering. Understand the difference between platform engineering and traditional ops. Read the CNCF Platforms white paper (free download). Learn what an Internal Developer Platform (IDP) is and why organizations build them.
Week 3: Dive into Continuous Integration. Understand CI pipelines, what they automate (build, test, lint, scan), and popular CI tools used with cloud native platforms (Tekton, Jenkins X, GitHub Actions). Learn the pipeline stages and their purpose in a platform context.
Week 4: Study Continuous Delivery and GitOps basics. Understand the GitOps principles: Git as the single source of truth, declarative configuration, reconciliation. Learn Argo CD and Flux conceptually: what they do, push vs pull deployment models, and how they enforce desired state.
Week 5: Cover observability fundamentals: the three pillars (metrics, logs, traces). Learn Prometheus for metrics, Grafana for dashboards, Loki for logs, Jaeger/Tempo for distributed tracing. Understand what SLIs, SLOs, and SLAs mean for platform reliability.
Week 6: Study Kubernetes security basics: RBAC, network policies, pod security standards, secrets management. Learn about admission controllers and policy engines (OPA/Gatekeeper, Kyverno). Understand mTLS and service mesh concepts (Istio, Linkerd).
Week 7: Learn platform APIs and infrastructure provisioning: Custom Resource Definitions (CRDs), the Kubernetes reconciliation loop, and the operator pattern. Study Crossplane for infrastructure provisioning from Kubernetes. Understand what infrastructure as code means in a cloud native context.
Week 8: Study IDPs and developer experience: what service catalogs are (Backstage), how developer portals abstract platform complexity, and golden paths. Learn about DORA metrics (deployment frequency, lead time, MTTR, change failure rate) and how they measure platform effectiveness.
Week 9: Practice questions across all domains. Focus on Platform Engineering Core Fundamentals (36% of exam); this single domain is more than a third of the exam. Review GitOps, CI/CD, and observability concepts.
Week 10: Take full mock exams targeting 75%+. Review all incorrect answers. Re-study any domains where you score below 70%. The passing score is 75%, so aim for 85%+ in practice to have a comfortable margin.

Exam Overview

Format

60 multiple-choice questions, 120 minutes. Online proctored exam delivered through PSI. Closed-book with no external resources allowed.

Scoring

Percentage-based scoring. Passing: 75%. No penalty for wrong answers — always answer every question. Score report provided immediately after exam.

Domains & Weights

  • Platform Engineering Core Fundamentals: 36%
  • Platform Observability, Security, and Conformance: 20%
  • Continuous Delivery and Platform Engineering: 16%
  • Platform APIs and Provisioning Infrastructure: 12%
  • IDPs and Developer Experience: 8%
  • Measuring your Platform: 8%

Registration

The exam fee is $250 USD and includes one free retake. Register at training.linuxfoundation.org; a Linux Foundation account is required. Certification is valid for 2 years.

Topic Priority Table

Not all topics are tested equally. Focus your study time on Tier 1 first, then Tier 2. Tier 3 topics rarely appear — just recognize what they do.

Tier 1 (Must Know): You must understand these concepts deeply, know how they work, and apply them in scenario-based questions. These appear across multiple questions and multiple domains.
Tier 2 (Should Know): Understand what these are, their key characteristics, and how they fit into a cloud native platform. May appear in 2-5 questions each.
Tier 3 (Recognize Only): Know what these are at a high level and their role in the cloud native platform ecosystem. Rarely more than 1-2 questions each.
Domain 1 (36% of exam)

Platform Engineering Core Fundamentals

The largest domain at 36% of the exam. Covers the foundational concepts: declarative resource management, DevOps practices, application environments, platform architecture, and the full CI/CD and GitOps lifecycle. This is the theoretical foundation that all other domains build on.

Key Topics

Declarative Configuration, Kubernetes, GitOps, CI Pipelines, CD Pipelines, DevOps Culture, Platform Architecture

Must-Know Concepts

  • Declarative vs imperative resource management: declarative defines desired state in files; imperative issues commands. Kubernetes is declarative — you apply YAML manifests
  • The four OpenGitOps principles: declarative, versioned and immutable, pulled automatically, and continuously reconciled
  • DevOps as both cultural and technical practices: breaking silos between dev and ops, shared responsibility for reliability, fast feedback loops
  • Platform engineering as the discipline of building self-service platforms (IDPs) that enable developers to be productive without deep infrastructure knowledge
  • Application environments: development, staging/pre-production, and production environments and the promotion patterns between them
  • Continuous Integration: automated build, test, lint, scan, and artifact creation on every commit. Fast feedback for developers
  • Continuous Delivery: automated deployment pipeline that can deploy to production at any time. Every commit should be deployable
  • Continuous Deployment: every commit that passes CI is automatically deployed to production without manual approval (a subset of organizations practice this)
  • GitOps workflow: developer commits to Git > CI pipeline builds and tests > GitOps operator detects change > reconciles cluster to new desired state
  • Push vs pull deployment models and why pull-based (GitOps) provides better security, auditability, and drift prevention

Common Exam Traps

Continuous Delivery means you CAN deploy at any time; Continuous Deployment means you DO deploy automatically on every passing commit. Many organizations practice Delivery but not Deployment
GitOps is NOT a tool — it is a set of practices. Argo CD and Flux are tools that implement GitOps
Declarative management means the SYSTEM handles reconciliation. You do not script the steps — you describe the end state
Platform engineering is NOT the same as DevOps or SRE, though it borrows from both. Platform engineers build PLATFORMS for other teams to use, not operational runbooks or services directly
Quick Check: Platform Engineering Core Fundamentals

A platform team is designing a deployment workflow where developers commit application changes to a Git repository and the changes are automatically reflected in the Kubernetes cluster. The cluster continuously checks for drift and corrects it. Which deployment model does this describe?

Domain 2 (20% of exam)

Platform Observability, Security, and Conformance

The second-largest domain at 20%. Covers the three pillars of observability (metrics, logs, traces), SLI/SLO/SLA frameworks, Kubernetes security (RBAC, pod security, network policies), mTLS and service meshes, and policy engines for conformance enforcement.

Key Topics

Prometheus, Grafana, Loki, Jaeger, Kubernetes RBAC, Network Policies, mTLS, Service Mesh, OPA/Gatekeeper, Kyverno

Must-Know Concepts

  • Three pillars of observability: metrics (quantitative measurements), logs (discrete events), and traces (distributed request paths)
  • SLI: the specific metric measured (e.g., request latency p99). SLO: the reliability target for that SLI (e.g., p99 < 200ms, 99.9% of the time). SLA: the contractual commitment, typically looser than the SLO
  • Error budget: the acceptable amount of unreliability implied by an SLO. If SLO is 99.9%, the error budget is 0.1% — the budget guides release velocity decisions
  • Prometheus: pull-based metrics collection, PromQL for queries, Alertmanager for routing alerts, PodMonitor/ServiceMonitor CRDs for Kubernetes scrape configuration
  • Kubernetes RBAC: Roles and ClusterRoles define permissions, RoleBindings and ClusterRoleBindings assign them to subjects (users, groups, service accounts). Principle of least privilege
  • Network Policies: Kubernetes objects that define allowed ingress and egress traffic for pods. By default, all traffic is allowed; network policies add restrictions
  • Pod Security Standards: Privileged, Baseline, and Restricted pod security profiles enforced via Pod Security Admission controller in modern Kubernetes
  • mTLS (mutual TLS): both client and server authenticate each other and encrypt communication. Service meshes (Istio, Linkerd) implement mTLS transparently for all service-to-service communication
  • Admission webhooks: Kubernetes API intercept points that validate or mutate resources before persistence. Mutating webhooks run before validating webhooks
  • Policy engines: OPA/Gatekeeper (Rego-based) and Kyverno (YAML-based) enforce organizational policies via admission webhooks. Both support audit and enforce modes
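
The error budget arithmetic above is easy to check by hand. The sketch below assumes a time-based availability SLO measured over a rolling window; the function names are hypothetical and chosen only for this example.

```python
def error_budget_fraction(slo: float) -> float:
    """Fraction of the window that may be unreliable, e.g. 0.001 for a 99.9% SLO."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1")
    return 1.0 - slo


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Total downtime the SLO tolerates over the window, in minutes."""
    return error_budget_fraction(slo) * window_days * 24 * 60


def budget_remaining(slo: float, window_days: int, downtime_minutes: float) -> float:
    """Share of the error budget still unspent (negative once it is exhausted)."""
    budget = allowed_downtime_minutes(slo, window_days)
    return (budget - downtime_minutes) / budget


# A 99.9% SLO over 30 days tolerates roughly 43.2 minutes of downtime.
print(round(allowed_downtime_minutes(0.999, 30), 1))
# After 21.6 minutes of downtime, about half of the budget is left.
print(round(budget_remaining(0.999, 30, 21.6), 2))
```

A team that has spent most of its budget early in the window would typically slow releases and prioritize reliability work until the window rolls forward.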

Common Exam Traps

Network Policies are DEFAULT ALLOW — if no NetworkPolicy selects a pod, all traffic is allowed. Adding a NetworkPolicy creates restrictions, it does not add permissions
SLO should be STRICTER than SLA. Your SLO is your internal target; your SLA is the external commitment. With a stricter SLO you miss the internal target first and can act before the SLA is breached
Pod Security Admission replaced PodSecurityPolicy (PSP). PSP was deprecated in Kubernetes 1.21 and removed in 1.25, so do not reference it in modern platform engineering answers
mTLS in a service mesh is transparent to application code — applications do NOT implement TLS themselves. The sidecar proxy (Envoy in Istio) handles TLS termination and origination
OPA/Gatekeeper audit mode reports on policy violations for EXISTING resources; enforce mode blocks NEW or UPDATED resources. Existing violations are not automatically deleted in enforce mode
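
The default-allow trap for Network Policies is easier to remember as a small decision function. The model below is deliberately simplified and is an assumption of this sketch: it covers ingress only, matches on exact label equality, and ignores namespaces, ports, and policyTypes. `IngressPolicy` and `ingress_allowed` are hypothetical names, not Kubernetes API objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Labels = Dict[str, str]


def matches(selector: Labels, labels: Labels) -> bool:
    """True if every selector label is present; an empty selector matches everything."""
    return all(labels.get(key) == value for key, value in selector.items())


@dataclass
class IngressPolicy:
    pod_selector: Labels                                    # pods the policy applies to
    allow_from: List[Labels] = field(default_factory=list)  # allowed source selectors


def ingress_allowed(dest: Labels, source: Labels, policies: List[IngressPolicy]) -> bool:
    selecting = [p for p in policies if matches(p.pod_selector, dest)]
    if not selecting:
        return True  # no policy selects the pod, so all traffic is allowed
    # Once a pod is selected, only traffic allowed by some selecting policy passes.
    return any(matches(sel, source) for p in selecting for sel in p.allow_from)


db, api, web = {"app": "db"}, {"app": "api"}, {"app": "web"}
policies = [IngressPolicy(pod_selector={"app": "db"}, allow_from=[{"app": "api"}])]

print(ingress_allowed(db, web, []))         # True: nothing selects the db pod
print(ingress_allowed(db, api, policies))   # True: api is an allowed source
print(ingress_allowed(db, web, policies))   # False: db is selected, web is not allowed
print(ingress_allowed(web, api, policies))  # True: the web pod is still unselected
```
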
Quick Check: Platform Observability, Security, and Conformance

A platform team has set an SLO of 99.9% availability for their API gateway. Their SLA with enterprise customers guarantees 99.5% availability. An incident reduces availability to 99.7% for a month. What is the impact?

Domain 3 (16% of exam)

Continuous Delivery and Platform Engineering

This domain covers CI pipeline architecture, advanced GitOps workflows, incident response practices, and how continuous delivery integrates with platform engineering. Expect questions on pipeline stages, deployment strategies, and GitOps branching models.

Key Topics

CI Pipelines, GitOps Workflows, Argo CD, Flux, Deployment Strategies, Incident Response, Canary / Blue-Green Deployments

Must-Know Concepts

  • CI pipeline stages in cloud native: code commit > trigger > checkout > lint > unit test > build container image > scan image for vulnerabilities > push to registry > update GitOps manifests
  • Deployment strategies: rolling update (gradual pod replacement), blue-green (two identical environments, switch traffic), canary (route small % traffic to new version, gradually increase)
  • GitOps branching models: environment-per-branch (main=prod, staging branch, dev branch) vs directory-based (one branch, environments in subdirectories)
  • Progressive delivery: combining canary deployments with automated analysis to automatically promote or rollback based on metrics (Argo Rollouts supports this)
  • Incident response in platform engineering: alert triggers > on-call paged > triage (impact, scope) > communicate status > mitigate (rollback, scale, fix) > post-incident review > action items
  • GitOps-based incident response: rollback by reverting the Git commit — the GitOps operator automatically restores the previous state
  • Supply chain security in CI: image signing (cosign/Sigstore), SBOM generation, vulnerability scanning with Trivy or Grype, SLSA levels for build provenance
  • Separation of concerns: CI is responsible for producing a validated artifact (container image); CD/GitOps is responsible for deploying it. They should not overlap
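
Progressive delivery, as described above, is a canary rollout with an automated decision at each step. The sketch below is a rough illustration and does not use the Argo Rollouts API: `run_canary`, the traffic steps, and the 1% error threshold are all assumptions made for the example, and `error_rate_at` stands in for a real metrics query.

```python
from typing import Callable, List, Tuple


def run_canary(
    steps: List[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float = 0.01,
) -> Tuple[str, int]:
    """Walk through traffic weights (percent) and stop at the first unhealthy step.

    Returns ("promoted", 100) if every step stays healthy, otherwise
    ("rolled_back", weight) with the weight at which analysis failed.
    """
    for weight in steps:
        observed = error_rate_at(weight)  # stand-in for querying metrics
        if observed > max_error_rate:
            return ("rolled_back", weight)
    return ("promoted", 100)


def healthy(weight: int) -> float:
    return 0.002  # error rate stays low at every traffic weight


def degrades_under_load(weight: int) -> float:
    return 0.002 if weight < 25 else 0.05  # errors appear once traffic reaches 25%


print(run_canary([5, 25, 50, 100], healthy))              # ('promoted', 100)
print(run_canary([5, 25, 50, 100], degrades_under_load))  # ('rolled_back', 25)
```

The second run shows why canaries limit blast radius: the failure is caught while only a quarter of production traffic is exposed to the new version.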

Common Exam Traps

Canary deployments route a small PERCENTAGE of traffic to the new version — not a separate environment. Blue-green is two full environments with traffic switching. Canary uses real production traffic incrementally
Rolling update is the Kubernetes default — pods are gradually replaced. It does NOT provide instant rollback like blue-green. Rolling back a rolling update takes time
GitOps rollback = reverting the Git commit. The operator detects the Git revert and reconciles the cluster to the previous desired state. This is faster and more reliable than running kubectl commands
CI pipelines should build the image ONCE and promote that exact image through environments. Never rebuild the image for each environment — rebuild breaks the immutability guarantee
Vulnerability scanning in CI should be a GATE — the pipeline should fail if critical vulnerabilities are found, not just report them
Quick Check: Continuous Delivery and Platform Engineering

A platform team wants to deploy a new API version to production while minimizing risk. They want to route 5% of production traffic to the new version, monitor error rates, and gradually increase traffic if metrics look healthy. Which deployment strategy should they use?

Domain 4 (12% of exam)

Platform APIs and Provisioning Infrastructure

This domain covers how platforms expose APIs through Kubernetes extension mechanisms — CRDs, custom controllers, operators — and how infrastructure is provisioned through cloud native tooling like Crossplane and Terraform. The focus is on infrastructure-as-code in a Kubernetes-native context.

Key Topics

Custom Resource Definitions, Kubernetes Operators, Crossplane, Terraform, Kubernetes API Machinery, Reconciliation Loop

Must-Know Concepts

  • CRD lifecycle: define schema in YAML > apply to cluster > Kubernetes API server accepts instances > custom controller watches and acts on instances
  • The operator pattern: a software extension that uses CRDs and a custom controller to manage the complete lifecycle of a complex application (install, configure, backup, upgrade, recover)
  • Reconciliation loop: watch for changes > observe current state > compare to desired state > take actions to reconcile > repeat continuously
  • Infrastructure provisioning approaches: declarative (Terraform, Crossplane) vs imperative (scripts, manual). Platform teams should use declarative for reproducibility
  • Crossplane architecture: providers (connect to cloud APIs), managed resources (map to cloud resources like RDS instances), composite resources (abstract multiple managed resources), and compositions (templates for composite resources)
  • The difference between infrastructure provisioning (creating the resource) and infrastructure configuration (managing settings after creation) — both should be handled declaratively
  • Kubernetes API groups and versioning: how API resources are organized (apiVersion: apps/v1, batch/v1, etc.) and why versioning matters for CRD stability
  • Webhook admission patterns: how mutating and validating webhooks extend the Kubernetes API admission chain for platform governance
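
The reconciliation loop above (observe, compare, act, repeat) can be sketched without any Kubernetes client library. Everything here is hypothetical: `DatabaseSpec` stands in for the spec of a custom resource, `DatabaseStatus` for the observed state of the real system, and the controller changes one replica per pass to show gradual convergence.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatabaseSpec:
    """Desired state, as declared in a hypothetical custom resource."""
    replicas: int
    version: str


@dataclass
class DatabaseStatus:
    """Observed state of the managed system."""
    replicas: int = 0
    version: Optional[str] = None


def reconcile(spec: DatabaseSpec, status: DatabaseStatus) -> List[str]:
    """One pass: compare observed to desired, act, and report the actions taken."""
    actions = []
    if status.version != spec.version:
        actions.append(f"set version {spec.version}")
        status.version = spec.version
    if status.replicas < spec.replicas:
        actions.append("add replica")
        status.replicas += 1
    elif status.replicas > spec.replicas:
        actions.append("remove replica")
        status.replicas -= 1
    return actions


def run_until_converged(spec: DatabaseSpec, status: DatabaseStatus, max_passes: int = 20) -> int:
    """Repeat reconcile until a pass takes no action; return the number of passes."""
    for passes in range(1, max_passes + 1):
        if not reconcile(spec, status):
            return passes
    raise RuntimeError("did not converge")


spec = DatabaseSpec(replicas=3, version="16")
status = DatabaseStatus()
print(run_until_converged(spec, status))  # 4: three passes that act, one that confirms
print(reconcile(spec, status))            # []: a repeated pass is a no-op
```

Because each pass works from the observed state rather than from a record of past actions, a controller that restarts midway simply resumes from wherever the system currently is.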

Common Exam Traps

A CRD defines the schema; a custom controller provides the behavior. Without a controller, CRD instances are just stored — nothing happens to them
Crossplane composite resources abstract underlying cloud resources from developers. A developer creates a DatabaseClaim; Crossplane creates the actual RDS instance. The abstraction is the key design principle
Operators are for complex STATEFUL applications that have operational knowledge built in (backup, failover, upgrades). For simple stateless apps, Deployments and Helm charts are sufficient — operators are overkill
Terraform is NOT Kubernetes-native. It uses its own state file and CLI. Crossplane runs inside Kubernetes and uses the reconciliation loop — no external state file needed for Kubernetes-managed resources
When a custom controller crashes or is restarted, it should be able to RECONCILE from current state in the cluster — it must be idempotent. Reconciliation must be safe to run repeatedly
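
The abstraction that Crossplane compositions provide can be pictured as a function from a small developer-facing request to the provider-level resources behind it. The sketch below is only an analogy and does not use Crossplane's schema: the claim fields, the resource kinds, and the size mapping are assumptions made for this example.

```python
from typing import Dict, List

# Hypothetical mapping from a developer-friendly size to a cloud instance class.
SIZE_TO_INSTANCE_CLASS = {"small": "db.t3.micro", "medium": "db.t3.medium"}


def compose_database(claim: Dict[str, str]) -> List[Dict[str, str]]:
    """Expand a developer-facing claim into the resources the platform would manage."""
    name = claim["name"]
    instance_class = SIZE_TO_INSTANCE_CLASS[claim["size"]]
    subnet_group = f"{name}-subnets"
    return [
        {"kind": "SubnetGroup", "name": subnet_group},
        {
            "kind": "DatabaseInstance",
            "name": name,
            "engine": "postgres",
            "instanceClass": instance_class,
            "subnetGroup": subnet_group,
        },
    ]


# The developer only states a name and a size; provider details stay hidden.
for resource in compose_database({"name": "orders-db", "size": "small"}):
    print(resource)
```

The platform team owns the mapping, so it can change instance classes or add resources without developers editing their claims.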
Quick Check: Platform APIs and Provisioning Infrastructure

A platform team wants developers to request a managed PostgreSQL database by creating a YAML manifest in Kubernetes, without knowing the underlying cloud provider details. What is the cloud native approach?

Domain 5 (8% of exam)

IDPs and Developer Experience

This domain covers Internal Developer Platforms — how platform teams build self-service environments that improve developer productivity. Topics include service catalogs, developer portals (Backstage), golden paths, and how AI/ML automation is emerging in platform tooling.

Key Topics

Backstage, Service Catalogs, Golden Paths, Software Templates, Developer Portals, Self-Service Infrastructure

Must-Know Concepts

  • Platform engineering goal: reduce cognitive load on developers by providing self-service, paved roads, and golden paths for common tasks
  • Golden paths: the recommended, pre-built, opinionated workflows that platform teams provide for common developer tasks (create a service, provision a database, set up CI). Golden paths, not golden cages — developers can deviate but it costs more
  • Backstage core components: Software Catalog (register and discover all services, APIs, teams), Software Templates (scaffolding for new services), TechDocs (documentation as code), and the Plugin ecosystem
  • Service catalog: a registry of all software components, services, APIs, and resources in the organization. Enables discoverability, ownership tracking, and dependency mapping
  • Developer portals vs IDPs: the portal (Backstage) is the UI layer; the IDP is the entire platform. The portal surfaces capabilities the IDP provides
  • Self-service provisioning: developers should be able to create standard resources (namespaces, databases, CI pipelines) without filing tickets or waiting for ops teams
  • AI/ML in platform tooling: AI-assisted code review, AI-generated runbooks, ML-based anomaly detection in observability, and LLM-powered developer assistants integrated into portals

Common Exam Traps

Backstage is a framework, not a finished product. Organizations must install, configure, and maintain it. It requires ongoing investment — it is not plug-and-play
Golden paths should be maintained as first-class products, not one-time setups. If the golden path becomes outdated, developers bypass it and the productivity benefit is lost
Service catalogs without ownership data are incomplete. The catalog is most valuable when it shows WHO owns each service — essential for incident response and change management
Developer portals are for DEVELOPERS — they should reduce friction, not add it. If onboarding to the portal is harder than the manual process, adoption will be zero
Quick Check: IDPs and Developer Experience

A platform team has built golden paths for creating new microservices. A developer needs to create a new service type not covered by any existing golden path. What should the developer do?

Domain 6 (8% of exam)

Measuring your Platform

The smallest domain at 8%. Covers how platform engineering teams measure the effectiveness and efficiency of their platforms using DORA metrics, developer experience metrics, and platform-specific KPIs. Understanding what to measure and why is key.

Key Topics

DORA Metrics, Platform KPIs, DevEx Metrics, Error Budgets, Adoption Metrics

Must-Know Concepts

  • DORA four key metrics: Deployment Frequency, Lead Time for Changes, Change Failure Rate, and Mean Time to Restore (MTTR)
  • DORA performance buckets: Elite performers deploy multiple times per day, with lead time under one hour, change failure rate < 5%, and MTTR under one hour (exact thresholds vary between annual DORA reports)
  • Deployment Frequency: how often code is deployed to production. Higher frequency (with stability) indicates mature delivery pipeline
  • Lead Time for Changes: time from code commit to running in production. Measures the efficiency of the entire delivery pipeline
  • Change Failure Rate: percentage of deployments that cause an incident or require rollback. Measures deployment quality
  • MTTR (Mean Time to Restore): average time to restore service after a production incident. Measures platform resilience and incident response effectiveness
  • Platform adoption metrics: golden path adoption rate, developer portal active users, self-service request volume, ticket volume reduction
  • Developer experience (DevEx) metrics: developer satisfaction scores (surveys), onboarding time for new developers, time to first deployment, cognitive load measures
  • Relationship between DORA metrics: teams should aim to improve ALL four simultaneously. High deployment frequency with high change failure rate is not mature — stability matters as much as speed
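
To make the four definitions concrete, the sketch below computes them from a handful of deployment records. The `Deployment` record, the sample timestamps, and the choice of median for lead time and mean for restore time are assumptions for illustration; real tooling would pull this data from CI/CD and incident systems.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set only for failed deployments


def dora_summary(deploys: List[Deployment], period_days: int) -> dict:
    """Compute the four DORA metrics for a non-empty list of deployments."""
    failures = [d for d in deploys if d.failed]
    restore_times = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    return {
        "deployment_frequency_per_day": len(deploys) / period_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_restore": (
            sum(restore_times, timedelta()) / len(restore_times) if restore_times else None
        ),
    }


t0 = datetime(2026, 1, 5, 9, 0)
day = timedelta(days=1)
hour = timedelta(hours=1)
deploys = [
    Deployment(t0, t0 + 1 * hour),
    Deployment(t0, t0 + 2 * hour, failed=True, restored_at=t0 + 2 * hour + timedelta(minutes=30)),
    Deployment(t0 + day, t0 + day + 3 * hour),
    Deployment(t0 + day, t0 + day + 6 * hour),
]

summary = dora_summary(deploys, period_days=2)
for name, value in summary.items():
    print(name, value)
```

With this sample data the team deploys twice a day with a median lead time of 2.5 hours, a 25% change failure rate, and a 30 minute restore time, so speed looks healthy while stability does not.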

Common Exam Traps

MTTR is Mean Time to RESTORE — not Mean Time to REPAIR or RESOLVE (though these terms are sometimes used interchangeably in practice). The exam uses 'restore' as recovering service, not fixing root cause
A high deployment frequency with a high change failure rate is WORSE than moderate frequency with low failure rate. DORA metrics must improve together, not independently
Change Failure Rate measures deployments that CAUSE incidents — it does not measure all failures. A deployment that succeeds but later causes a subtle performance issue may not be captured immediately
Error budgets are derived from SLOs — they represent how much unreliability you can afford. Exhausting the error budget signals it is time to prioritize reliability work over new features
Measuring platform adoption is as important as technical metrics. A technically excellent platform with zero developer adoption provides no business value
Quick Check: Measuring your Platform

A platform team deploys to production 15 times per day but 30% of those deployments cause incidents requiring rollback. Using DORA metrics, how should this be characterized?

Concepts You Must Not Confuse

These pairs appear on nearly every exam. Learn the difference and you'll avoid the most common traps.

GitOps (Pull-based CD) vs Traditional Push-based CD

Use GitOps (Pull-based CD) when…

The cluster agent (Argo CD, Flux) continuously pulls from Git and reconciles cluster state. Git is the single source of truth. Drift is automatically detected and corrected.

Use Traditional Push-based CD when…

CI/CD pipeline pushes deployments directly to the cluster using kubectl or Helm commands. Cluster state can drift from what was deployed without detection.

Exam trap

GitOps is PULL-based — the cluster pulls from Git. Push-based CD pipelines push to the cluster. GitOps provides continuous reconciliation and drift correction that push models lack.

Argo CD vs Flux

Use Argo CD when…

Delivered as one integrated application with a rich UI for managing GitOps deployments. App-of-apps pattern for managing multiple applications. Strong multi-tenancy through Projects.

Use Flux when…

Modular GitOps Toolkit approach with composable controllers. Stronger Helm and Kustomize native integration. More flexible multi-cluster bootstrap approach.

Exam trap

Both Argo CD and Flux implement the same GitOps principles (pull-based, Git as source of truth, continuous reconciliation). Choose based on team preference and tooling ecosystem, not principles — the principles are identical.

OPA/Gatekeeper vs Kyverno

Use OPA/Gatekeeper when…

Uses Rego policy language — very powerful and general-purpose. Can enforce complex multi-resource policies. Steeper learning curve. OPA can also be used outside Kubernetes.

Use Kyverno when…

Uses Kubernetes-native YAML policies — easier learning curve for Kubernetes operators. Native support for generating resources and mutating resources alongside validation.

Exam trap

Both are Kubernetes admission webhook policy engines. OPA/Gatekeeper uses Rego (a dedicated policy language), while Kyverno uses YAML (familiar to Kubernetes users). Kyverno can also GENERATE and MUTATE resources, not just validate.

SLO (Service Level Objective) vs SLA (Service Level Agreement)

Use SLO (Service Level Objective) when…

Internal reliability target set by the engineering team. A goal, not a contract. Defines the acceptable reliability threshold and drives error budget calculations. Violation has no contractual consequences.

Use SLA (Service Level Agreement) when…

External contractual commitment with customers or stakeholders. Violation results in penalties (credits, refunds). SLAs are typically set looser than internal SLOs, which leaves a safety margin.

Exam trap

SLOs are INTERNAL targets that should be stricter than your SLAs. If your SLA is 99.9% uptime, your SLO should be 99.95% so that SLO violations trigger internal action before the SLA is breached.
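
The relationship between the two thresholds can be expressed as a simple check. This is a minimal sketch with a hypothetical `reliability_status` helper; values are availability percentages, and the example numbers are chosen only to show the three possible outcomes.

```python
def reliability_status(measured: float, slo: float, sla: float) -> str:
    """Classify measured availability (percent) against the SLO and the SLA."""
    if slo < sla:
        raise ValueError("the SLO should be at least as strict as the SLA")
    if measured < sla:
        return "SLA breached"
    if measured < slo:
        return "SLO missed, SLA met"
    return "within SLO"


print(reliability_status(99.95, slo=99.9, sla=99.5))  # within SLO
print(reliability_status(99.7, slo=99.9, sla=99.5))   # SLO missed, SLA met
print(reliability_status(99.2, slo=99.9, sla=99.5))   # SLA breached
```

The middle case is the one the gap is designed for: the error budget is spent and the team should act, but no contractual penalty applies yet.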

Custom Resource Definition (CRD) vs Kubernetes Operator

Use Custom Resource Definition (CRD) when…

Defines the schema for a new Kubernetes resource type. Tells Kubernetes what fields the new resource accepts. Provides no behavior on its own.

Use Kubernetes Operator when…

A controller that watches a specific CRD and implements the business logic to manage the lifecycle of the custom resource. The operator gives behavior to the CRD.

Exam trap

A CRD without an operator is just a data store — Kubernetes accepts the resource but nothing acts on it. The operator watches for CRD instances and reconciles them to the desired state. You need BOTH the CRD and the operator together.

Metrics (Prometheus) vs Traces (Jaeger/Tempo)

Use Metrics (Prometheus) when…

Aggregated numerical measurements over time. Answer questions like 'what is the error rate?' and 'how many requests per second?'. Efficient storage via time-series. Best for alerting and dashboards.

Use Traces (Jaeger/Tempo) when…

Distributed end-to-end request paths through microservices. Answer questions like 'which service is causing latency?' and 'where did this request fail?'. Essential for debugging distributed systems.

Exam trap

Metrics tell you THAT something is wrong (error rate spiked). Traces tell you WHERE and WHY (which microservice in the call chain failed). Both are needed for effective platform observability — metrics for alerting, traces for root cause analysis.

IDP (Internal Developer Platform) vs Developer Portal (Backstage)

Use IDP (Internal Developer Platform) when…

The entire self-service platform including all infrastructure, services, pipelines, and tooling that developers use to build and deploy applications. Encompasses multiple systems and tools.

Use Developer Portal (Backstage) when…

The UI layer of an IDP — a centralized web interface where developers discover services, create projects, view documentation, and interact with the platform. Backstage is the most common framework.

Exam trap

A developer portal is ONE component of an IDP — the front-end interface. The IDP itself includes the CI/CD system, secret management, monitoring, and all platform services. Backstage provides the portal UI; it does not replace the entire platform.

Mutating Admission Webhook vs Validating Admission Webhook

Use Mutating Admission Webhook when…

Intercepts API requests and can MODIFY the resource before it is persisted (e.g., inject sidecar containers, set default values, add labels). Runs BEFORE validating webhooks.

Use Validating Admission Webhook when…

Intercepts API requests and can ALLOW or DENY them based on policy, but CANNOT modify the resource. Runs AFTER mutating webhooks.

Exam trap

Execution order matters: Mutating webhooks run FIRST, then Validating webhooks. Kyverno registers BOTH types of webhooks (MutatingWebhookConfiguration for mutate rules, ValidatingWebhookConfiguration for validate and generate rules). OPA/Gatekeeper primarily validates but also supports mutation via assign/assignMetadata. A validating webhook sees the resource AFTER any mutations have been applied.
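
The ordering rule can be seen in a toy admission chain. This sketch does not call the Kubernetes API: `admit`, `add_team_label`, and `require_team_label` are hypothetical functions, and resources are plain dictionaries.

```python
from copy import deepcopy
from typing import Callable, Dict, List, Optional

Resource = Dict[str, dict]
Mutator = Callable[[Resource], Resource]
Validator = Callable[[Resource], Optional[str]]  # returns a denial reason or None


def admit(resource: Resource, mutators: List[Mutator], validators: List[Validator]) -> Resource:
    """Run all mutators first, then validate the mutated object."""
    obj = deepcopy(resource)
    for mutate in mutators:
        obj = mutate(obj)
    for validate in validators:
        reason = validate(obj)
        if reason is not None:
            raise PermissionError(reason)
    return obj  # this is what would be persisted


def add_team_label(obj: Resource) -> Resource:
    """Mutating step: set a default team label if none is present."""
    obj.setdefault("metadata", {}).setdefault("labels", {}).setdefault("team", "unassigned")
    return obj


def require_team_label(obj: Resource) -> Optional[str]:
    """Validating step: deny objects without a team label."""
    labels = obj.get("metadata", {}).get("labels", {})
    return None if "team" in labels else "missing required label: team"


request = {"metadata": {"name": "web"}}
admitted = admit(request, [add_team_label], [require_team_label])
print(admitted["metadata"]["labels"])  # {'team': 'unassigned'}
```

The request has no label, yet it is admitted because the validator sees the object after the mutator has filled in the default. With the mutator removed, the same request would be denied.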

Top Mistakes to Avoid

Confusing GitOps (pull-based, cluster pulls from Git) with traditional push-based CI/CD (pipeline pushes to cluster) — the pull model and drift correction are what define GitOps
Treating CRDs and operators as the same thing — a CRD defines the schema and a controller/operator implements the behavior; you need both
Mixing up SLO and SLA: SLO is the internal target (stricter), SLA is the external commitment (looser). A missed SLO should be the early warning that lets you fix issues before breaching the SLA
Thinking DORA metrics only measure speed — change failure rate and MTTR measure stability. Elite performers are fast AND stable simultaneously
Confusing Backstage (a developer portal framework) with an IDP (the entire self-service platform) — Backstage is one UI layer component of a broader IDP
Assuming network policies add permissions — Kubernetes defaults to allow-all; network policies add RESTRICTIONS. Default behavior without any network policy is fully permissive
Forgetting that OPA/Gatekeeper and Kyverno policies in enforce mode only affect NEW or UPDATED resources — existing violations require audit mode to surface and separate cleanup
Thinking continuous deployment and continuous delivery are the same: delivery means every passing commit CAN be deployed, with the release to production a deliberate (often manual) decision; deployment means you DO deploy automatically on every passing commit
Confusing canary (traffic percentage splitting) with blue-green (two environments, instant traffic switch) — they are different risk management strategies with different rollback characteristics
Assuming Crossplane replaces Terraform — Crossplane is Kubernetes-native for cloud resource management via the reconciliation loop; Terraform is a standalone IaC tool. Both have valid use cases

Exam-Ready Checklist

Can explain all 6 exam domains and their relative weights: 36%, 20%, 16%, 12%, 8%, 8%
Know the four OpenGitOps principles and the difference between push-based and pull-based deployment models
Can explain declarative resource management vs imperative and why declarative is the cloud native standard
Understand the Kubernetes reconciliation loop and how it applies to GitOps, operators, and Crossplane
Know all three observability pillars (metrics, logs, traces), the tools for each (Prometheus, Loki, Jaeger), and the SLI/SLO/SLA/error budget framework
Can explain Kubernetes RBAC, network policies, pod security standards, and when to use each
Understand mTLS in service meshes and why it provides authentication + encryption without code changes
Know OPA/Gatekeeper vs Kyverno — policy language, modes (audit/enforce), and when each runs in the admission chain
Can explain CRDs + operators and why both are needed (schema vs behavior)
Understand the IDP concept, what Backstage provides (Software Catalog, Templates, TechDocs, Plugins), and what golden paths mean
Know all four DORA metrics by name, what each measures, and the elite performer benchmarks
Understand deployment strategies: rolling update, blue-green, canary — when to use each and rollback characteristics
Can explain Crossplane's role in cloud native infrastructure provisioning and how it differs from Terraform
Scored 75%+ on at least two full mock exams (the passing score is 75%). Aim for 85%+ for a comfortable margin

Frequently Asked Questions