ISACA AAIA: 3 domains

AAIA Exam Notes

Last-minute traps, must-know facts, and scenario tips for the ISACA Advanced in AI Audit exam.

General Exam Tips

1. Allocate time in proportion to domain weight: spend ~46% of your study time on Domain 2 (AI Operations), ~33% on Domain 1, and ~21% on Domain 3. Most candidates who fail under-prepare for Domain 2.
2. Budget about 1.6 minutes per question. Flag hard questions immediately, move on, and return to them in the last 15 minutes. Never agonize over question 12 and then run out of time on question 85.
3. There is no penalty for wrong answers. Guess on every unanswered question before time expires: a blank is a guaranteed zero, while a guess has positive expected value.
4. Every scenario question asks one of four things: what the risk is, what control is missing, what audit procedure applies, or what evidence is needed. Frame your reading around those four lenses.
5. When two answers both sound correct, choose the one that takes the AUDIT perspective. The examiner wants to know what an auditor does, not what a developer, risk manager, or data scientist does.
6. Read all four options before committing. Distractors are designed to describe the right action for the wrong role, or the right framework in the wrong context.
7. ISACA exam language: 'MOST appropriate,' 'BEST course of action,' and 'PRIMARILY' mean pick the single answer that is most directly relevant. Eliminate answers that are correct but secondary.
8. Target 65-70% on mock exams to build a safety margin over the 450/800 passing score. If you are consistently scoring below 60%, add one more week of Domain 2 focus before scheduling.
9. The exam is computer-based, so you can mark questions for review. Use that feature aggressively on scenario questions that require multiple reads.
10. Candidates from traditional IT audit backgrounds consistently report that Domain 2 questions feel unfamiliar. If you encounter an AI operations scenario you cannot confidently answer, eliminate framework-based answers and choose the one that most closely mirrors traditional IT change management or incident response adapted for AI.
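
The time and weight arithmetic behind tips 1 and 2 can be sketched as follows. This is only an illustration: the 90-question count, the 1.6-minute pace, and the domain weights are the figures quoted in these notes, and the function name `plan` is made up for the example.

```python
# Figures quoted in these notes (not independently verified here).
TOTAL_QUESTIONS = 90
MINUTES_PER_QUESTION = 1.6
DOMAIN_WEIGHTS = {
    "Domain 1: AI Governance and Risk": 0.33,
    "Domain 2: AI Operations": 0.46,
    "Domain 3: AI Auditing Tools and Techniques": 0.21,
}


def plan(total_questions, minutes_per_question, weights):
    """Return {domain: (approx. question count, approx. minutes)}."""
    result = {}
    for domain, weight in weights.items():
        questions = total_questions * weight
        result[domain] = (round(questions), round(questions * minutes_per_question, 1))
    return result


if __name__ == "__main__":
    for domain, (questions, minutes) in plan(
        TOTAL_QUESTIONS, MINUTES_PER_QUESTION, DOMAIN_WEIGHTS
    ).items():
        print(f"{domain}: ~{questions} questions, ~{minutes} minutes")
```

The same weights can be applied to study hours instead of exam minutes by changing the second argument.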


Domain 1: 33% of exam

AI Governance and Risk

Must-Know Facts

NIST AI RMF four core functions in order: Govern, Map, Measure, Manage. 'Govern' sets the organizational foundation; the others operate within that structure.
EU AI Act four risk tiers: Unacceptable (banned outright), High (mandatory conformity assessment + Notified Body review for some), Limited (transparency obligations), Minimal (no obligations). High-risk AI is controlled, not banned.
ISO/IEC 42001 is the ONLY AI framework that results in a formal organizational certification after third-party audit. NIST AI RMF and EU AI Act are not certification schemes.
Data provenance = where data came from (origin, consent, ownership). Data lineage = how data was transformed (steps, modifications, processing history). The exam tests both in audit scenarios.
AI RACI structure: Model Owner is accountable for model governance decisions. Data Scientist builds and maintains the model. Risk Management assesses and monitors AI risk. Internal Audit independently assesses all of the above. These cannot overlap without a separation of duties violation.
AI Ethics Board advises on ethical acceptability of AI initiatives — it does NOT approve models for production. Model approval is a governance/risk function separate from ethics advisory.
Sanctioned vs. unsanctioned AI: employees using unapproved AI tools (shadow AI) is a policy compliance risk that auditors must specifically test for — this is an AI governance control gap.
Workforce impact assessment is required as part of AI governance — auditors must verify the organization assessed how AI affects job roles, training needs, and organizational change readiness.
Third-party AI risk requires right-to-audit clauses in vendor contracts. Without them, the auditor cannot verify vendor AI controls. Absence of a right-to-audit clause is a reportable finding.
Privacy Impact Assessments (PIAs) are required for AI systems that process personal data — especially relevant when AI makes automated decisions about individuals.

Common Traps

Trap: Treating all three major AI frameworks as equivalent in terms of legal enforceability

Reality: The EU AI Act is binding law with financial penalties. ISO/IEC 42001 is a voluntary certifiable standard. NIST AI RMF is voluntary guidance with no enforcement mechanism. An auditor must apply mandatory compliance testing for the EU AI Act in EU contexts, while NIST and ISO compliance is discretionary unless contractually required.

Trap: Thinking high-risk AI under the EU AI Act is banned or prohibited

Reality: High-risk AI is PERMITTED but requires conformity assessment, registration in the EU database, and ongoing monitoring. Only unacceptable-risk AI is banned (e.g., social scoring, real-time biometric surveillance in public spaces with narrow exceptions). An auditor who recommends discontinuing a high-risk AI system simply because it is 'high-risk' has misunderstood the Act.

Trap: Confusing the AI Ethics Board role with model governance or approval authority

Reality: The AI Ethics Board advises on ethical principles and acceptable AI use cases. It does not approve individual models for production. Model approvals belong to the governance/risk function. An auditor looking for a model approval control should look at the change management process, not the ethics board minutes.

Trap: Assuming data governance for AI is the same as general data governance

Reality: AI introduces unique data governance requirements: consent must cover the specific use of data for model training (not just data storage), data lineage must trace through feature engineering transformations, and training data must be retained long enough to support model auditability and retraining decisions.

Trap: Treating AI risk assessment as identical to traditional IT risk assessment

Reality: AI introduces risks with no direct IT equivalent: algorithmic bias, model drift, hallucination, and adversarial manipulation. A risk assessment that uses only traditional IT risk categories (availability, confidentiality, integrity) will miss these AI-specific risks entirely.

Trap: Thinking OECD AI Principles or UNESCO recommendations impose compliance obligations

Reality: Neither OECD nor UNESCO AI documents are legally binding. They are international guidelines only. The EU AI Act is the only major AI regulation with direct legal enforcement currently in scope for the exam. Do not select OECD or UNESCO as the answer to 'which framework is legally required.'

Confusing Pairs

NIST AI RMF vs. ISO/IEC 42001

NIST AI RMF = voluntary guidelines with four functions (Govern, Map, Measure, Manage), no certification, no audit requirement, US origin. ISO/IEC 42001 = certifiable international standard requiring third-party audit, results in formal AIMS certification. If a question asks what makes an organization 'certified' in AI management, the answer is always ISO/IEC 42001.

EU AI Act vs. NIST AI RMF

EU AI Act = legally binding EU regulation, risk-tier classification, mandatory conformity assessments for high-risk, financial penalties, applies to organizations deploying AI in the EU. NIST AI RMF = voluntary US framework, advisory only, no penalties. The key trigger: if the scenario mentions EU jurisdiction, EU-based operations, or 'mandatory compliance,' the answer is EU AI Act.

Data Provenance vs. Data Lineage

Provenance = WHERE data came from (source, consent basis, ownership, collection method). Lineage = HOW data was transformed (processing steps, aggregations, feature engineering applied). Audit scenario trigger: if the question is about ethical sourcing, consent verification, or data origin — provenance. If the question is about transformation history, reproducibility, or processing integrity — lineage.

AI Ethics Board vs. AI Center of Excellence

Ethics Board = advisory body that evaluates ethical dimensions of AI use cases and policies. Reports concerns but does not build or enforce. Center of Excellence = operational body that implements AI standards, best practices, and capability building across the organization. If the question asks who DEFINES standards and trains teams, it is the CoE. If it asks who evaluates whether an AI initiative is ethically acceptable, it is the Ethics Board.

Explainability vs. Transparency

Explainability = TECHNICAL — how the model makes individual decisions (SHAP, LIME, attention maps). Transparency = ORGANIZATIONAL — disclosing to stakeholders that AI is being used, what it does, its limitations, and how to contest decisions. An auditor must test for both separately: technical explainability testing AND communication transparency review.

Scenario Tips

If the question asks about:

When the question describes an organization deploying AI in Europe and asks which framework 'requires' or 'mandates' compliance...

Answer:

Choose EU AI Act. It is the only legally binding framework. Any question with words like 'penalties,' 'mandatory,' 'legally required,' or 'enforce' in the EU context points to the EU AI Act.

Distractor to avoid:

ISO/IEC 42001 is a wrong answer here — it is voluntary. NIST AI RMF is also wrong — it is US-origin voluntary guidance. OECD Principles are non-binding international guidelines.

If the question asks about:

When the question asks an auditor to evaluate whether an organization's AI system development team and approval function are properly structured...

Answer:

Look for separation of duties. The team that builds the model should NOT be the same team that approves it for production. If they are the same, the finding is a separation of duties control deficiency.

Distractor to avoid:

Candidates often select 'transparency' as the violated principle. Transparency concerns disclosure to external parties. Separation of duties is the internal control principle at stake.

If the question asks about:

When the question asks an auditor to verify the ethical and legal basis for an AI model's training data...

Answer:

Focus on data provenance controls: consent documentation, data source agreements, data classification, and purpose limitation. The auditor should verify that data was collected under consent that covers AI training use.

Distractor to avoid:

Data lineage is a wrong focus here — lineage tracks transformations, not the ethical collection basis. Candidates often confuse 'can we trace where it went?' (lineage) with 'was it collected properly?' (provenance).

If the question asks about:

When a question describes an employee using a third-party AI chatbot not approved by IT or the AI governance team...

Answer:

This is a shadow AI / unsanctioned AI use risk. The governance gap is the absence of an acceptable use policy covering third-party AI tools, or the absence of enforcement mechanisms for that policy.

Distractor to avoid:

Candidates sometimes choose 'data encryption' or 'access controls' as the primary issue. Those are valid secondary concerns, but the primary governance gap is the lack of AI usage policy and control over tool adoption.

Last-Minute Facts

1. NIST AI RMF function order trap: Govern → Map → Measure → Manage. 'Govern' is the UMBRELLA that enables the other three, not just the first step. Questions that list functions out of order are checking whether you know Govern is the foundational layer.

2. EU AI Act risk tiers: Unacceptable (banned), High (conformity assessment required), Limited (transparency required), Minimal (no requirements).

3. ISO/IEC 42001: AIMS = AI Management System. It is a certifiable standard, auditable by third parties.

4. Passing score: 450 out of 800 (scaled). Domain weights: AI Governance & Risk 33%, AI Operations 46%, AI Auditing Tools & Techniques 21%.

5. Prerequisites: active CISA (all holders qualify), or CIA, US CPA, Canadian CPA, Australian CPA/FCPA, Japanese CPA, ACCA, FCCA, ICAEW ACA/FCA, CA ANZ, or Hong Kong CPA/FCPA (with IT audit or advisory role focus). ISACA expanded eligibility in July 2025; do not assume only CISA qualifies.

6. CPE maintenance: minimum 10 AI-domain CPE hours per year, 30 hours over each 3-year cycle. Annual maintenance fee: $20 (members) / $35 (non-members).

7. EU AI Act high-risk examples tested: credit scoring, hiring decisions, safety-critical AI, biometric categorization, education assessment AI.

8. OECD and UNESCO AI documents = non-binding international guidelines only.

Domain 2: 46% of exam

AI Operations

Must-Know Facts

AI/ML lifecycle audit control points: (1) Business case — strategic alignment, (2) Data collection — consent and quality controls, (3) Feature engineering — bias introduction risk, (4) Model development — reproducibility, (5) Training/validation — independent validation team, (6) Deployment — change approval, (7) Monitoring — drift and bias rechecks, (8) Maintenance/retraining — change management, (9) Decommissioning — data retention and access revocation.
Three types of model drift: Data drift (input distributions change, model unchanged), Concept drift (relationship between inputs and outputs changes, real-world phenomenon evolves), Model drift (model performance degrades due to parameter decay or infrastructure changes). Each type requires different detection methods and remediation.
Cross-validation is a DEVELOPMENT technique (k-fold, holdout) used to estimate model performance during training. It is NOT a production monitoring control. Candidates who recommend 'cross-validation' to detect production drift are wrong.
Adversarial testing must occur BEFORE deployment as a proactive quality gate. It is not incident response. Testing covers prompt injection resistance, data poisoning detection, evasion attack tolerance, and adversarial example robustness.
Fairness testing methodologies: Demographic Parity (equal positive outcome rate across groups), Equal Opportunity (equal true positive rate across groups), Disparate Impact (checks whether a protected group's selection rate falls below 80% of the rate for the most-favored group, the '4/5 rule'). The 4/5 rule (80% rule) is the standard legal benchmark for indicating disparate impact.
Model validation (pre-deployment) must be performed by a team INDEPENDENT of the model developers. Self-validation by the development team is a separation of duties deficiency.
Change management for AI requires more than code review: model retraining with new data changes model behavior even without code changes. AI change management must explicitly cover data updates, hyperparameter changes, retraining cycles, and vendor model updates.
AI incident response differs from traditional IT incident response: incidents include drift-triggered performance degradation, bias emergence, hallucination rate spikes, adversarial compromise, and vendor model changes. Root cause analysis must address AI-specific causes, not just system availability.
Hallucination monitoring is an AI-specific control with no traditional IT equivalent. Auditors must verify that: hallucination rate is measured, acceptable thresholds are defined, and alerts fire when thresholds are exceeded.
Human-in-the-loop (HITL) requirements: high-risk AI decisions (loan approvals, hiring, medical diagnosis support) require defined confidence score thresholds below which human review is mandatory. Auditors must verify HITL controls exist and operate effectively.
Vendor/third-party AI risks include: SLA non-compliance, API versioning changes that alter model behavior, vendor data retention practices, and vendor model updates that change downstream system behavior without notice. Right-to-audit clauses in vendor contracts are the primary control.
MLOps pipelines create audit trails through model registries, experiment tracking logs, pipeline run histories, and artifact versioning. Absence of these logs is a control deficiency in traceability.
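
The human-in-the-loop fact above lends itself to a simple audit test: scan a decision log for cases that were automated even though confidence fell below the review threshold. The sketch below assumes a hypothetical log layout of (case id, confidence, handler) and a made-up threshold of 0.85; real systems will record this differently.

```python
# Hypothetical threshold; the organization must define and approve the real one.
REVIEW_THRESHOLD = 0.85


def find_hitl_bypasses(decision_log, threshold=REVIEW_THRESHOLD):
    """Return case ids where confidence was below the threshold but no human reviewed.

    decision_log: iterable of (case_id, confidence, handled_by) tuples, where
    handled_by is "human" or "auto" (an assumed convention for this example).
    """
    bypasses = []
    for case_id, confidence, handled_by in decision_log:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError(f"confidence out of range for case {case_id}")
        if confidence < threshold and handled_by != "human":
            bypasses.append(case_id)
    return bypasses


if __name__ == "__main__":
    log = [("c1", 0.95, "auto"), ("c2", 0.60, "human"), ("c3", 0.70, "auto")]
    print(find_hitl_bypasses(log))
```

An empty result supports operating effectiveness for the sampled period; any returned case id is a candidate exception to investigate, not yet a confirmed finding.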

Common Traps

Trap: Using 'cross-validation' to answer production monitoring or drift detection questions

Reality: Cross-validation is a model development technique applied during training. It estimates how well a model generalizes to unseen data during development. Production monitoring uses separate tools: statistical process control, population stability index (PSI), Kolmogorov-Smirnov tests for data drift, and performance metric dashboards. Never select cross-validation as a production control.
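
To make the population stability index concrete, here is a minimal standard-library sketch that compares a baseline (training-time) sample of one feature with a current production sample. The equal-width binning, the small floor that avoids log(0), and the 0.25 alert level are assumptions for illustration; 0.25 is a commonly cited rule of thumb, not a figure from these notes.

```python
import math

# Commonly cited rule-of-thumb alert level; treat it as an assumption.
PSI_ALERT = 0.25


def psi(baseline, current, bins=10):
    """Population stability index of `current` relative to `baseline`."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # fall back if all baseline values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so values outside the baseline range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each share to avoid division by zero and log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    base_p = proportions(baseline)
    curr_p = proportions(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base_p, curr_p))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    shifted = [0.5 + i / 200 for i in range(100)]
    print(psi(baseline, baseline), psi(baseline, shifted) > PSI_ALERT)
```

PSI only looks at input distributions, so it addresses data drift; it says nothing about whether the input-output relationship has changed.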

Trap: Treating model validation and model monitoring as interchangeable

Reality: Validation = pre-deployment, one-time gate activity performed before go-live to confirm the model meets requirements. Monitoring = post-deployment, continuous activity to track ongoing performance and detect drift. A model that passed validation will eventually degrade without monitoring. The exam tests whether BOTH controls exist, not just one.

Trap: Assuming data drift and concept drift require the same remediation

Reality: Data drift (changing input distributions) may be remediated by retraining the model on more recent data. Concept drift (the underlying relationship between inputs and outputs has fundamentally changed) may require model redesign, not just retraining. Applying retraining-only remediation to concept drift is an inadequate control.

Trap: Believing adversarial testing is a reactive post-incident control

Reality: Adversarial testing is a PROACTIVE pre-deployment quality gate. It must occur before the model is released to production. If the question describes an AI system already in production that has never been adversarially tested, the finding is a control gap in pre-deployment testing, not an incident response gap.

Trap: Thinking automatic retraining and deployment (without human approval) is an MLOps best practice

Reality: Fully automated retraining and deployment without human approval violates AI change management controls. Model updates, even automated ones, require an approval gate because retraining can subtly shift model behavior, fairness characteristics, and output distributions. Automated pipelines are acceptable for retraining, but deployment must be approved.

Trap: Confusing data poisoning and prompt injection as the same type of attack

Reality: Data poisoning attacks the TRAINING phase: malicious data is injected into the training dataset before or during model training. Prompt injection attacks the INFERENCE phase: malicious inputs are crafted to manipulate model behavior at query time after deployment. Different attack surfaces require different controls: training data integrity controls vs. input validation and output filtering controls.

Trap: Accepting stable test-set accuracy as evidence that a production model is performing correctly

Reality: A model can maintain accuracy on a static test set while performing poorly in production if the production data distribution has shifted. The test set becomes stale. Auditors must verify that performance metrics are measured on current production data, not historical test benchmarks.

Trap: Treating hallucination as a binary defect that either exists or does not exist

Reality: Hallucination is a rate-based metric. All generative AI systems produce some hallucinations. The audit control is whether the organization has defined an acceptable hallucination rate threshold, is actively measuring the rate, and has alert/escalation procedures when the rate exceeds the threshold.
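
The three elements of the hallucination control (a measured rate, a defined threshold, an alert) can be sketched in a few lines. The 2% threshold and the idea of a human-reviewed sample of responses are assumptions made for the example; how an organization actually labels hallucinations is outside what these notes specify.

```python
# Hypothetical acceptable rate; the real value must be defined by the organization.
MAX_HALLUCINATION_RATE = 0.02


def hallucination_rate(review_labels):
    """review_labels: list of booleans, True when a sampled response was judged a hallucination."""
    if not review_labels:
        raise ValueError("no reviewed responses: the rate is not being measured")
    return sum(review_labels) / len(review_labels)


def check_threshold(rate, threshold=MAX_HALLUCINATION_RATE):
    """Return "alert" when the measured rate exceeds the threshold, otherwise "ok"."""
    return "alert" if rate > threshold else "ok"


if __name__ == "__main__":
    labels = [False] * 95 + [True] * 5
    rate = hallucination_rate(labels)
    print(rate, check_threshold(rate))
```

Note that an empty sample raises an error rather than returning zero: the absence of measurement is itself the control gap.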

Confusing Pairs

Data Poisoning vs. Prompt Injection

Data Poisoning = TRAINING phase attack, malicious data embedded in training set, corrupts model learning, must be caught by training data integrity controls. Prompt Injection = INFERENCE phase attack, crafted inputs at query time, bypasses safety guardrails, must be caught by input validation and output filtering. Exam trigger: 'before deployment' = data poisoning; 'during use/at query time' = prompt injection.

Model Validation vs. Model Monitoring

Validation = one-time pre-deployment gate, independent team, confirms model meets design requirements, performance thresholds, and fairness criteria before go-live. Monitoring = continuous post-deployment tracking, automated dashboards, drift alerts, bias rechecks on schedule, triggers rollback or retraining when thresholds are breached. Both must exist — one does not substitute for the other.

Data Drift vs. Concept Drift

Data Drift = input data distribution has changed (e.g., customer demographics shifted). The model is unchanged but now receives data unlike its training set. Detection: statistical tests on feature distributions. Concept Drift = the real-world relationship between inputs and outputs changed (e.g., economic conditions changed what predicts loan default). Detection: comparing prediction accuracy on labeled current data. Remediation: data drift may be fixed by retraining; concept drift may require model redesign.

SHAP vs. LIME

SHAP = global and local explainability, assigns contribution scores to each feature based on game-theoretic Shapley values, consistent and theoretically grounded, computationally heavier. LIME = local explainability only, creates a simple interpretable model around a single prediction, faster but less consistent across similar inputs. Both are post-hoc explainability methods — they explain an already-trained model. For exam purposes, know both explain model decisions; SHAP is more globally applicable.
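
For intuition about the Shapley values that SHAP builds on, the sketch below computes exact Shapley attributions for a single prediction by enumerating every feature ordering. It is not the SHAP library's API: the toy scoring function, the all-zero baseline, and the convention that "absent" features take their baseline value are all assumptions for the example, and brute-force enumeration is only feasible for a handful of features.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline) to each feature.

    Features not yet added in an ordering keep their baseline value.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]          # reveal feature i
            value = predict(current)
            totals[i] += value - previous  # marginal contribution of feature i
            previous = value
    return [t / len(orderings) for t in totals]


def toy_score(features):
    """Hypothetical scoring model with one interaction term."""
    income, debt, years = features
    return 0.5 * income - 0.8 * debt + 0.1 * income * years


if __name__ == "__main__":
    print(shapley_values(toy_score, [4.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

The attributions always sum to the difference between the prediction and the baseline prediction, which is the consistency property the comparison above refers to.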

Fairness Testing vs. Adversarial Testing

Fairness Testing = checks whether model outcomes are equitable across demographic groups — uses demographic parity, equal opportunity, disparate impact analysis. This is a compliance and ethics control. Adversarial Testing = probes model robustness against malicious inputs — prompt injection, evasion attacks, adversarial examples. This is a security control. They test for completely different failure modes and require different methodologies.

Feature Store vs. Model Registry

Feature Store = centralized repository for computed input features, ensures consistent feature definitions are shared across models, prevents feature definition drift between training and inference environments. Model Registry = centralized repository for trained model artifacts (weights, configurations, versions), tracks which model version is deployed in which environment, enables rollback. Audit evidence: feature store logs support training data integrity; model registry logs support change management.

Scenario Tips

If the question asks about:

When the question describes a model with stable test accuracy but worsening real-world performance, and input data distributions have not changed...

Answer:

Concept drift. The underlying relationship between inputs and outputs has changed in the real world. The model's learned patterns are obsolete even though inputs look statistically similar. The appropriate control is monitoring of outcome accuracy on labeled current production data, not statistical tests on input distributions.

Distractor to avoid:

Data drift is the most common wrong answer. It is tempting because degrading production performance is the classic drift symptom, but data drift requires changing input distributions, which the question explicitly rules out.

If the question asks about:

When the question asks what the auditor should do after finding that model retraining and deployment are fully automated with no human approval gate...

Answer:

Report an inadequate change management control. AI change management requires human approval before production deployment, even when retraining is automated. The finding should describe the risk: automated deployment can silently change model behavior, fairness characteristics, and decision patterns without oversight.

Distractor to avoid:

Candidates often select 'missing adversarial testing' because automated deployment sounds risky. But the specific control gap described is the absence of an approval workflow — change management, not testing.
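
The approval gate described in this scenario can be illustrated with a minimal sketch in which retraining may be automated but deployment fails without a named approver who is distinct from whoever produced the model. The class and field names are hypothetical and do not correspond to any particular MLOps tool.

```python
class ModelCandidate:
    """A retrained model awaiting deployment (hypothetical record layout)."""

    def __init__(self, version, trained_by, approved_by=None):
        self.version = version
        self.trained_by = trained_by    # pipeline or person that produced the model
        self.approved_by = approved_by  # human approver, or None if not yet approved


def can_deploy(candidate):
    """Deployment needs an approver, and the approver must not be the producer."""
    return (
        candidate.approved_by is not None
        and candidate.approved_by != candidate.trained_by
    )


def deploy(candidate):
    """Refuse deployment when the approval gate is not satisfied."""
    if not can_deploy(candidate):
        raise PermissionError(f"model {candidate.version} has no independent approval")
    return f"deployed {candidate.version}"


if __name__ == "__main__":
    print(deploy(ModelCandidate("v2", "retrain-pipeline", approved_by="model.owner")))
```

An auditor would look for the equivalent of `approved_by` in the model registry or change tickets for each production version.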

If the question asks about:

When the question asks which control verifies that an AI hiring model does not disadvantage a protected demographic group...

Answer:

Disparate impact analysis (or fairness testing using demographic parity/equal opportunity). These specifically measure outcome differences across demographic groups. For hiring contexts, the 4/5 rule (disparate impact threshold of 80%) is the standard legal benchmark.

Distractor to avoid:

Adversarial testing is a common wrong answer. Adversarial testing checks robustness against malicious inputs — it does not measure demographic fairness in outcomes. Cross-validation is also wrong — it measures accuracy, not fairness.
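
The 4/5 rule reduces to a ratio of selection rates, as the sketch below shows. The applicant counts are invented for illustration, and the function names are made up for the example.

```python
FOUR_FIFTHS = 0.8


def selection_rate(selected, total):
    """Share of a group that received the favorable outcome."""
    if total <= 0:
        raise ValueError("total must be positive")
    return selected / total


def disparate_impact_ratio(protected_rate, reference_rate):
    """Protected group's selection rate divided by the most-favored group's rate."""
    if reference_rate <= 0:
        raise ValueError("reference rate must be positive")
    return protected_rate / reference_rate


def indicates_disparate_impact(ratio, threshold=FOUR_FIFTHS):
    """True when the ratio falls below the 4/5 (80%) benchmark."""
    return ratio < threshold


if __name__ == "__main__":
    # Invented example: 30 of 100 protected-group applicants selected vs. 50 of 100.
    ratio = disparate_impact_ratio(selection_rate(30, 100), selection_rate(50, 100))
    print(round(ratio, 2), indicates_disparate_impact(ratio))
```

In this invented example the ratio is about 0.6, below 0.8, so the result would be flagged for further fairness analysis.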

If the question asks about:

When a question asks the most important audit procedure for an AI system that relies on a third-party API for its core inference capability...

Answer:

Verify that right-to-audit clauses and SLA compliance monitoring are in place. Without right-to-audit provisions, the organization cannot independently assess vendor controls. SLA monitoring ensures service quality and availability obligations are being met.

Distractor to avoid:

Checking that the API documentation is publicly available sounds reasonable but is not a control. API redundancy (multiple vendors) is a continuity measure, not an audit control. The correct answer focuses on contractual audit rights and performance monitoring.

If the question asks about:

When a question describes an AI system where the model's training logs were deleted after the most recent retraining cycle...

Answer:

This is a control deficiency in audit trail and evidence preservation. Training logs are essential for reproducibility — without them, an auditor cannot independently verify how the model was trained, what data was used, or whether the approved training procedure was followed. The finding category is audit trail management.

Distractor to avoid:

Candidates sometimes classify this as a storage optimization issue or an acceptable trade-off. It is never acceptable in an auditable AI system: training logs must be retained for the model's entire operational life plus the required retention period.

If the question asks about:

When a question asks what control ensures that features used during model training and during production inference are identical...

Answer:

A feature store with shared feature definitions. Training-serving skew (different feature computation during training vs. inference) is a major source of silent model performance degradation. The feature store ensures both training pipelines and inference pipelines consume identically defined features.

Distractor to avoid:

Model monitoring catches the symptom (performance degradation) but does not prevent training-serving skew from occurring. The preventive control is the feature store.
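
The preventive idea behind a feature store, one shared definition consumed by both pipelines, can be shown without any feature-store product. In this sketch the feature names and record fields are hypothetical; the point is only that training and serving call the same definitions, so they cannot diverge.

```python
import math

# Single source of truth for feature computation (hypothetical features).
FEATURE_DEFINITIONS = {
    "debt_to_income": lambda r: r["debt"] / r["income"] if r["income"] > 0 else 0.0,
    "log_income": lambda r: math.log1p(r["income"]),
}


def build_features(record):
    """Compute every registered feature for one raw record."""
    return {name: fn(record) for name, fn in FEATURE_DEFINITIONS.items()}


def training_rows(records):
    """Training pipeline: uses the shared definitions."""
    return [build_features(r) for r in records]


def serve(record):
    """Inference pipeline: uses the same shared definitions."""
    return build_features(record)


if __name__ == "__main__":
    record = {"debt": 20.0, "income": 100.0}
    print(training_rows([record])[0] == serve(record))
```

Training-serving skew typically appears when the inference path re-implements a feature by hand; here there is no second implementation to drift.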

Last-Minute Facts

1. Domain 2 is 46% of the exam, roughly 41 out of 90 questions. This is where traditional IT auditors lose the most points.

2. The 4/5 rule (80% rule): disparate impact is indicated when the selection rate for a protected group is less than 80% of the rate for the most-favored group.

3. Three drift types: Data drift (input distributions change), Concept drift (input-output relationship changes), Model drift (performance degrades over time from other causes).

4. Adversarial attacks: Data poisoning (training phase), Prompt injection (inference phase), Model inversion (reconstruct training data from outputs), Model extraction/theft (replicate model via queries), Adversarial examples (crafted inputs cause misclassification).

5. MLOps pipeline artifacts that constitute audit evidence: experiment tracking logs, model registry entries, pipeline run logs, data versioning records, feature store change logs.

6. Human-in-the-loop (HITL): mandatory for high-risk AI decisions; triggered by low confidence scores below a defined threshold.

7. Training-serving skew: when feature computation differs between training and inference. Feature stores prevent this.

8. Fully automated model retraining WITHOUT human deployment approval = change management deficiency, always.

9. Hallucination rate must have a defined acceptable threshold. Absence of a threshold definition is itself a control gap.

10. SHAP and LIME are post-hoc explainability tools; they interpret already-trained models, not training processes.

Domain 3: 21% of exam

AI Auditing Tools and Techniques

Must-Know Facts

AI audit scope definition must address: system boundary (what AI components are in scope), data flows (training data, inference data), third-party dependencies, regulatory jurisdiction, and risk-based prioritization of which AI controls to test.
Control testing methodologies for AI systems: (1) Walkthrough — trace a transaction or decision through the AI lifecycle, (2) Configuration review — validate AI system settings against approved configurations, (3) Output sampling — inspect a sample of model decisions for accuracy and fairness, (4) Reperformance — independently run the model with test inputs and compare outputs, (5) Fairness analysis — apply disparate impact or demographic parity analysis to model outputs.
AI audit evidence must meet four standards: Sufficiency (enough evidence to support the finding), Reliability (from a trustworthy source, preferably system-generated logs), Relevance (directly related to the control being tested), Reproducibility (can an independent auditor recreate the same finding). Reproducibility is particularly challenging for stochastic AI systems.
When using AI tools within the audit process (Domain 3 focus): AI-generated findings must be treated as inputs to audit judgment, not final conclusions. Professional skepticism must be maintained. Over-reliance on AI-flagged anomalies without human evaluation is an independence and quality risk.
Risk-based sampling is preferred over random sampling for AI audits because AI systems have non-uniform risk distributions — some model decisions or time periods carry materially higher risk and should be over-sampled.
Full-population testing enabled by AI analytics: auditors can test 100% of transactions using automated tools. This does not eliminate the need for professional judgment on flagged items — anomaly detection identifies candidates for further investigation, not confirmed findings.
AI audit reporting must communicate findings in terms that both technical and non-technical stakeholders can understand. Findings should include: risk rating of the AI control gap, business impact description (what decisions were affected), and specific remediation recommendation.
Auditor independence when using AI tools: the auditor must not rely on analytics tools provided by the system under audit. Using the auditee's own AI monitoring dashboards as sole evidence creates an independence concern — additional independent evidence must be obtained.
Workpaper documentation for AI audits must include: the AI tool or system version tested, the specific inputs used for testing, expected vs. actual outputs, fairness test methodology and results, and limitations of the evidence (e.g., stochastic behavior noted).
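
Reperformance against a stochastic system is easier to document when the random seed is part of the recorded evidence. The sketch below uses a made-up noisy model as a stand-in; it assumes the system under audit exposes a seed and that recorded outputs are available, which will not always be the case.

```python
import random


def stochastic_model(x, rng):
    """Hypothetical stand-in for a model whose output includes sampling noise."""
    return round(2.0 * x + rng.gauss(0.0, 0.1), 6)


def run(inputs, seed):
    """Run the model over the inputs with a dedicated, seeded random generator."""
    rng = random.Random(seed)
    return [stochastic_model(x, rng) for x in inputs]


def reperform(inputs, recorded_outputs, seed, tolerance=1e-9):
    """Return indexes where an independent rerun disagrees with the recorded outputs."""
    if len(inputs) != len(recorded_outputs):
        raise ValueError("inputs and recorded outputs differ in length")
    rerun = run(inputs, seed)
    return [
        i
        for i, (new, old) in enumerate(zip(rerun, recorded_outputs))
        if abs(new - old) > tolerance
    ]


if __name__ == "__main__":
    inputs = [1.0, 2.0, 3.0]
    recorded = run(inputs, seed=42)
    print(reperform(inputs, recorded, seed=42))
```

When no seed is available, the workpaper should note the stochastic behavior as an evidence limitation and compare outputs within a justified tolerance instead.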

Common Traps

TrapThinking that using AI in the audit process is the same as auditing an AI system

RealityThese are opposite directions. Domain 2 covers auditing AI systems (the AI is the subject of the audit). Domain 3 covers using AI as an audit tool (AI assists the auditor in performing the audit). The exam distinguishes these sharply. An answer about 'using AI analytics for full-population testing' is a Domain 3 answer. An answer about 'testing bias controls in an AI hiring model' is a Domain 2 answer.

TrapConcluding that AI-generated full-population testing results are findings that can be directly reported

RealityAI-generated anomaly flags are inputs to audit investigation, not confirmed findings. Each flagged item requires professional judgment to determine whether it represents a control deficiency, a false positive, or an acceptable exception. Reporting AI flags directly without evaluation is a quality and independence failure.

Trap: Treating the auditee's AI monitoring dashboard as sufficient independent audit evidence

Reality: The auditee's own monitoring tools are management-produced evidence. They lack independence. An auditor who relies solely on the auditee's dashboards without obtaining independent corroborating evidence has not maintained auditor independence. Additional evidence, such as independently running test cases or obtaining third-party validation reports, is required.

Trap: Assuming that evidence from AI systems is always reliable because it is system-generated

Reality: System-generated AI evidence can still be unreliable if the AI system itself has integrity issues (compromised training logs, altered model registry entries, tampered monitoring data). The auditor must assess the integrity of the logging and evidence system before relying on it.
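
One way to reason about log integrity is a hash chain, where each entry's hash covers the previous entry's hash, so an altered or deleted record breaks every later link. The sketch below assumes, hypothetically, that the logging system stores such a chain; many systems do not, and the entry layout and function names here are invented for illustration. Only `hashlib.sha256` is a real library call.

```python
import hashlib

GENESIS = "0" * 64  # assumed starting value for the first entry's prev_hash


def entry_hash(prev_hash: str, payload: str) -> str:
    """Hash of one log entry, bound to the hash of the entry before it."""
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Build a hash-chained log from raw payload strings."""
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = entry_hash(prev, payload)
        chain.append({"payload": payload, "prev_hash": prev, "hash": digest})
        prev = digest
    return chain


def first_broken_index(chain):
    """Return the index of the first entry that fails verification, else None."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev or entry_hash(prev, entry["payload"]) != entry["hash"]:
            return i
        prev = entry["hash"]
    return None


# Hypothetical training-log payloads.
log = build_chain([
    "run=41 dataset=v7 status=started",
    "run=41 epoch=10 loss=0.31",
    "run=41 status=completed",
])
print("Intact log, first break:", first_broken_index(log))

tampered = [dict(e) for e in log]
tampered[1]["payload"] = "run=41 epoch=10 loss=0.12"  # altered after the fact
print("Tampered log, first break:", first_broken_index(tampered))

with_deletion = log[:1] + log[2:]  # middle entry removed
print("Log with deletion, first break:", first_broken_index(with_deletion))
```

A passing check shows only that the chain is internally consistent. If the auditee controls the whole chain, it could be rebuilt after tampering, so the auditor would still want an independently retained anchor, such as a previously captured final hash.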

Trap: Selecting random sampling as the preferred approach for AI audit sampling

Reality: Risk-based sampling is preferred for AI audits. AI models have non-uniform output distributions: high-impact or high-uncertainty decisions (those near decision boundaries) carry disproportionate risk. A purely random sample can under-represent these high-risk segments. The examiner expects the auditor to apply risk judgment in sample design.
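
A risk-based selection can be sketched as follows: take every decision that sits near the model's decision boundary or exceeds an impact threshold, then add a small random sample of the remainder for coverage. The threshold of 0.5, the band of 0.1, the 50,000 impact cut-off, and the record fields are all hypothetical parameters that an auditor would set from the risk assessment.

```python
import random


def risk_based_sample(decisions, threshold=0.5, band=0.1,
                      high_impact_amount=50_000, random_n=2, seed=42):
    """Select all high-risk decisions plus a small random sample of the rest.

    A decision is treated as high risk when its score lies within `band`
    of the decision threshold, or when its amount is at or above
    `high_impact_amount`.
    """
    high_risk, remainder = [], []
    for d in decisions:
        near_boundary = abs(d["score"] - threshold) <= band
        high_impact = d["amount"] >= high_impact_amount
        if near_boundary or high_impact:
            high_risk.append(d)
        else:
            remainder.append(d)
    rng = random.Random(seed)  # fixed seed so the selection can be re-performed
    extra = rng.sample(remainder, min(random_n, len(remainder)))
    return high_risk, extra


# Hypothetical model decisions with confidence scores and monetary impact.
decisions = [
    {"id": "D1", "score": 0.95, "amount": 1_000},
    {"id": "D2", "score": 0.52, "amount": 2_000},   # near the boundary
    {"id": "D3", "score": 0.10, "amount": 75_000},  # high impact
    {"id": "D4", "score": 0.05, "amount": 500},
    {"id": "D5", "score": 0.47, "amount": 3_000},   # near the boundary
    {"id": "D6", "score": 0.88, "amount": 4_000},
]

high_risk, extra = risk_based_sample(decisions)
print("High-risk items (all tested):", [d["id"] for d in high_risk])
print("Random coverage sample:", [d["id"] for d in extra])
```

Recording the seed and the selection criteria in the workpapers lets another auditor re-perform the selection and obtain the same sample.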

Confusing Pairs

Auditing AI Systems (Domain 2) vs. Using AI in Auditing (Domain 3)

Auditing AI Systems = the AI is the audit subject; the auditor is evaluating AI governance, controls, bias, drift, and lifecycle management of an AI solution. Using AI in Auditing = AI is the audit tool; the auditor uses AI analytics to improve efficiency, coverage, or detection. If a question says 'the auditor is reviewing the AI model's performance' — Domain 2. If it says 'the auditor is using an AI tool to analyze 100% of transactions' — Domain 3.

Risk-Based Sampling vs. Full-Population Testing

Risk-Based Sampling = selects the highest-risk items for testing; the sample is smaller and the analysis deeper. It is appropriate when the AI system has identifiable high-risk outputs (edge cases, low-confidence decisions, minority group outcomes). Full-Population Testing = tests all transactions using AI analytics for broader coverage. It is appropriate when anomaly patterns are diffuse and cannot be predicted. Full-population testing still requires manual evaluation of flagged items.

Sufficient Evidence vs. Reproducible Evidence

Sufficient = there is ENOUGH evidence to support the conclusion (quantity and breadth of testing). Reproducible = another auditor with the same inputs would reach the same conclusion (consistency). Both must be present. AI system evidence fails reproducibility when stochastic model outputs differ across runs — auditors must document when this limitation applies and account for it in the evidence assessment.
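
A reproducibility limitation can be measured rather than merely asserted. The sketch below re-runs a stand-in model on the same input under different random seeds and reports how often the outputs agree. `stochastic_model` is a toy placeholder, not a real model, and the run count and seed range are arbitrary assumptions.

```python
import random
from collections import Counter


def stochastic_model(features, rng):
    """Toy stand-in for a model whose output includes random noise."""
    base = 0.5 + 0.3 * features["signal"]
    noisy = base + rng.gauss(0, 0.05)
    return "approve" if noisy >= 0.5 else "decline"


def reproducibility_check(features, runs=50, base_seed=1000):
    """Re-run the same input under different seeds and summarize agreement."""
    outcomes = Counter()
    for i in range(runs):
        rng = random.Random(base_seed + i)  # one recorded seed per run
        outcomes[stochastic_model(features, rng)] += 1
    label, count = outcomes.most_common(1)[0]
    agreement = count / runs
    return {
        "runs": runs,
        "seeds": f"{base_seed}..{base_seed + runs - 1}",
        "majority_output": label,
        "agreement_rate": agreement,
        "limitation_noted": agreement < 1.0,  # document in workpapers when True
        "outcomes": dict(outcomes),
    }


# A clear-cut case and a borderline case (hypothetical inputs).
print(reproducibility_check({"signal": 1.0}))
print(reproducibility_check({"signal": 0.0}))
```

Recording the seeds alongside the agreement rate gives a second auditor what is needed to repeat the exact runs, which partly restores reproducibility even when individual outputs vary.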

Control Design Adequacy vs. Control Operating Effectiveness

Design Adequacy = the control, if operating as designed, would prevent or detect the risk. This is assessed by reviewing control documentation and walkthroughs. Operating Effectiveness = the control is actually functioning as designed in practice over a period of time. This requires testing actual evidence of control operation. For AI audits: a model monitoring policy documents design; reviewing actual drift alerts and responses tests operating effectiveness.
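
Testing operating effectiveness means examining what actually happened to each alert over the period. A minimal sketch, assuming a hypothetical policy that every drift alert must receive a documented response within 48 hours; the alert records, field names, and the 48-hour figure are illustrative only.

```python
from datetime import datetime, timedelta


def evaluate_alert_responses(alerts, sla_hours=48):
    """Check each drift alert for a timely response and list exceptions."""
    sla = timedelta(hours=sla_hours)
    exceptions = []
    for alert in alerts:
        if alert["responded_at"] is None:
            exceptions.append((alert["id"], "no response recorded"))
        elif alert["responded_at"] - alert["raised_at"] > sla:
            exceptions.append((alert["id"], "response exceeded SLA"))
    return {
        "alerts_tested": len(alerts),
        "exceptions": exceptions,
        "exception_rate": len(exceptions) / len(alerts) if alerts else None,
    }


# Hypothetical drift alerts raised during the audit period.
alerts = [
    {"id": "A1", "raised_at": datetime(2024, 1, 5, 9, 0),
     "responded_at": datetime(2024, 1, 5, 15, 0)},   # 6 hours: within SLA
    {"id": "A2", "raised_at": datetime(2024, 2, 10, 8, 0),
     "responded_at": datetime(2024, 2, 14, 8, 0)},   # 96 hours: late
    {"id": "A3", "raised_at": datetime(2024, 3, 1, 12, 0),
     "responded_at": None},                          # never actioned
]

result = evaluate_alert_responses(alerts)
print(result["alerts_tested"], "alerts tested")
for alert_id, reason in result["exceptions"]:
    print(alert_id, "->", reason)
```

The monitoring policy itself would be the design evidence; the exception list produced here is the kind of evidence that supports a conclusion on operating effectiveness.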

Scenario Tips

If the question asks about:

When a question asks what an auditor who is using AI analytics tools to review financial transactions must maintain, or what the most significant risk is...

Answer:

Independence and professional skepticism — specifically avoiding over-reliance on AI-generated findings. The auditor must evaluate AI-flagged anomalies with professional judgment, not accept them as confirmed deficiencies.

Distractor to avoid:

Cost and time savings are true benefits of AI analytics but are not risks. The risk is over-reliance, not efficiency.

If the question asks about:

When a question asks how an auditor should test whether an AI model is making equitable decisions across demographic groups...

Answer:

Analyze model outputs stratified by demographic groups and apply fairness metrics (demographic parity, equal opportunity, disparate impact analysis). This directly tests the outcome — whether decisions differ by group — rather than testing inputs or intentions.

Distractor to avoid:

Reviewing source code is a wrong answer — bias emerges from data patterns, not typically from explicit code logic. Interviewing the data science team tests intent, not outcomes. Only outcome analysis directly tests for discriminatory impact.
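
The outcome analysis described in this scenario can be sketched in a few lines. The code below stratifies decisions by group and computes three common metrics: demographic parity difference (gap in selection rates), disparate impact ratio (ratio of selection rates), and equal opportunity difference (gap in true positive rates). The record layout, group labels, and data are hypothetical; a ratio below 0.8 is a commonly used screening threshold (the four-fifths rule), not a definitive test.

```python
def group_rates(records, group_key="group"):
    """Return the selection rate and true positive rate for each group."""
    stats = {}
    for r in records:
        s = stats.setdefault(
            r[group_key],
            {"n": 0, "selected": 0, "qualified": 0, "qualified_selected": 0},
        )
        s["n"] += 1
        s["selected"] += r["predicted"]
        s["qualified"] += r["actual"]
        s["qualified_selected"] += r["predicted"] * r["actual"]
    return {
        g: {
            "selection_rate": s["selected"] / s["n"],
            "tpr": s["qualified_selected"] / s["qualified"] if s["qualified"] else None,
        }
        for g, s in stats.items()
    }


def fairness_summary(records, reference, comparison):
    """Compare the comparison group against the reference group."""
    rates = group_rates(records)
    ref, cmp_ = rates[reference], rates[comparison]
    ratio = (cmp_["selection_rate"] / ref["selection_rate"]
             if ref["selection_rate"] else None)
    tpr_gap = (cmp_["tpr"] - ref["tpr"]
               if ref["tpr"] is not None and cmp_["tpr"] is not None else None)
    return {
        "demographic_parity_difference": cmp_["selection_rate"] - ref["selection_rate"],
        "disparate_impact_ratio": ratio,
        "equal_opportunity_difference": tpr_gap,
    }


def make(group, predicted, actual, count):
    """Helper that builds `count` identical hypothetical records."""
    return [{"group": group, "predicted": predicted, "actual": actual}
            for _ in range(count)]


# Hypothetical hiring-model outputs: predicted = model decision, actual = qualified.
records = (
    make("A", 1, 1, 4) + make("A", 1, 0, 2) + make("A", 0, 1, 1) + make("A", 0, 0, 3)
    + make("B", 1, 1, 2) + make("B", 1, 0, 1) + make("B", 0, 1, 3) + make("B", 0, 0, 4)
)

summary = fairness_summary(records, reference="A", comparison="B")
for metric, value in summary.items():
    print(f"{metric}: {value:.2f}")
```

In this made-up data, group B is selected at half the rate of group A and qualified members of group B are approved less often, which is the kind of outcome disparity that code review or interviews alone would not reveal.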

If the question asks about:

When a question describes an auditor discovering that training logs for an AI model were deleted, and asks how to classify this finding...

Answer:

This is a control deficiency in audit trail and evidence management — specifically, failure to maintain records necessary for reproducibility and independent verification. The severity is significant because it prevents the auditor from verifying training procedure compliance.

Distractor to avoid:

Classifying this as 'a minor documentation issue' is wrong. It eliminates the evidentiary basis for assessing training process controls. Never minimize the deletion of audit trail records.

If the question asks about:

When a question asks how an auditor should treat output from an AI monitoring dashboard provided by the auditee...

Answer:

The dashboard is management evidence and lacks independence. The auditor must obtain additional independent corroborating evidence — running independent test cases, obtaining third-party attestation, or reviewing unmodifiable system logs — rather than relying solely on the auditee's own monitoring outputs.

Distractor to avoid:

Accepting the dashboard as sufficient evidence is the trap answer. It appears practical, but it violates auditor independence principles.

Last-Minute Facts

1. Four evidence quality standards: Sufficiency, Reliability, Relevance, Reproducibility. Reproducibility is the most AI-specific challenge.

2. Stochastic AI outputs: when a model produces different outputs for the same inputs across runs (due to randomness), this must be documented as a limitation in audit workpapers.

3. Full-population testing with AI analytics does not replace professional judgment: AI flags are investigation leads, not conclusions.

4. Control design vs. operating effectiveness: walkthroughs test design; evidence of operation over time tests effectiveness.

5. Auditor independence when using AI tools: never rely on the auditee's own AI tools as your only evidence source.

6. Risk-based sampling = preferred default for AI audits. Random sampling misses the non-uniform risk distribution of AI model outputs.

7. Audit reporting to non-technical stakeholders: translate AI findings into business impact terms by describing which decisions were affected and what the business consequence is.

8. AI audit workpapers must document: model version tested, inputs used, expected vs. actual outputs, fairness methodology applied, and any stochastic limitations noted.
