CertPrepNow
CompTIADS0-001117 concepts

DS0-001 Cheat Sheet

Quick reference for the CompTIA DataSys+ exam.

SQL DDL — Structure Commands

CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(100), dept_id INT);
Creates a new table with defined columns, data types, and a primary key constraint.
ALTER TABLE employees ADD COLUMN salary DECIMAL(10,2);
Adds a new column to an existing table without dropping or recreating it.
DROP TABLE employees;
Permanently removes a table and all its data from the database — cannot be rolled back in most engines.
TRUNCATE TABLE employees;
Removes all rows from a table and resets auto-increment counters — DDL (not DML), so it cannot be rolled back in most databases.
CREATE INDEX idx_dept ON employees(dept_id);
Creates a non-clustered index on dept_id to speed up queries that filter or join on that column.
CREATE VIEW active_employees AS SELECT * FROM employees WHERE status = 'active';
Creates a virtual table based on a query — does not store data, always reflects current state of the underlying table.
TRUNCATE is DDL; DELETE is DML
TRUNCATE resets table structure (no WHERE clause, no row-by-row logging). DELETE removes specific rows and can be rolled back — this is a frequent exam trap.

SQL DML — Data Manipulation

SELECT e.name, d.dept_name FROM employees e INNER JOIN departments d ON e.dept_id = d.id WHERE e.salary > 50000 ORDER BY e.name;
Retrieves employees earning over $50,000 joined to their department, sorted by name — combines INNER JOIN, WHERE, and ORDER BY.
SELECT dept_id, COUNT(*) AS headcount, AVG(salary) AS avg_sal FROM employees GROUP BY dept_id HAVING COUNT(*) > 5;
Aggregates employees by department, filters groups with more than 5 members using HAVING (not WHERE, which filters rows before grouping).
INSERT INTO employees (name, dept_id, salary) VALUES ('Alice', 3, 75000);
Inserts a single row into the employees table with explicit column names and values.
UPDATE employees SET salary = salary * 1.10 WHERE dept_id = 3;
Applies a 10% salary increase to all employees in department 3 — always use WHERE to avoid updating every row.
DELETE FROM employees WHERE status = 'inactive';
Removes rows matching the WHERE condition — can be rolled back within a transaction, unlike TRUNCATE.
SELECT * FROM employees WHERE dept_id IN (SELECT id FROM departments WHERE region = 'West');
Subquery example — inner SELECT returns department IDs, outer SELECT retrieves matching employees.
LEFT JOIN vs INNER JOIN
INNER JOIN returns only matching rows from both tables. LEFT JOIN returns all rows from the left table plus matched rows from the right (NULLs for non-matches).
RIGHT JOIN / FULL OUTER JOIN
RIGHT JOIN returns all rows from the right table plus matched left rows. FULL OUTER JOIN returns all rows from both tables with NULLs where there is no match — less common but tested.

SQL TCL — Transaction Control

BEGIN; UPDATE accounts SET balance = balance - 500 WHERE id = 1; UPDATE accounts SET balance = balance + 500 WHERE id = 2; COMMIT;
Wraps a fund transfer in a transaction — COMMIT makes both updates permanent only when both succeed (Atomicity).
ROLLBACK;
Undoes all changes made since the last COMMIT or BEGIN, restoring the database to its prior state.
SAVEPOINT sp1; -- work -- ROLLBACK TO sp1;
Creates a named restore point within a transaction, allowing partial rollback without undoing the entire transaction.
ACID: Atomicity
All operations in a transaction succeed or all are rolled back — no partial commits.
ACID: Consistency
A transaction transitions the database from one valid state to another, never violating defined rules and constraints.
ACID: Isolation
Concurrent transactions execute independently — intermediate states are not visible to other transactions.
ACID: Durability
Once a transaction is committed, it persists even if the system crashes immediately afterward.

Normalization

1NF: Atomic values, no repeating groups
Each column holds a single, indivisible value — no comma-separated lists or arrays in a single cell.
2NF: 1NF + no partial dependencies
Every non-key attribute must depend on the entire composite primary key, not just part of it.
3NF: 2NF + no transitive dependencies
Non-key attributes must depend only on the primary key, not on other non-key attributes.
BCNF: Every determinant is a candidate key
A stricter form of 3NF where any attribute that functionally determines another must itself be a candidate key.
Normalization vs Denormalization trade-off
Normalization reduces redundancy and update anomalies but requires more JOINs (slower reads). Denormalization improves read performance but increases redundancy.
1NF violation: phone_numbers = '555-1111, 555-2222'
Storing multiple values in one column violates 1NF — the fix is to create a separate phone_numbers table with one row per number.

Database Types — Relational vs NoSQL

Relational (RDBMS)
Stores data in tables with rows and columns, enforces schemas and ACID properties, uses SQL. Best for transactional systems requiring data integrity — examples: PostgreSQL, MySQL, SQL Server.
Document database
Stores data as JSON/BSON documents with flexible schemas. Best for content management, catalogs, and user profiles — examples: MongoDB, Cosmos DB.
Key-Value store
Stores data as simple key-value pairs for ultra-fast lookups. Best for caching, session data, and leaderboards — examples: Redis, DynamoDB.
Column-oriented (Wide-column) database
Stores data in columns grouped into column families — optimized for large-scale distributed writes and analytics — examples: Apache Cassandra, HBase.
Graph database
Stores nodes and edges representing entities and relationships — optimized for traversing deeply connected data like social networks or fraud rings — examples: Neo4j.
NoSQL does NOT mean No SQL
Many NoSQL databases support SQL-like query languages. The distinction is about flexible data models and schema design, not query language — a frequent exam misconception.

Database Programming Constructs

CREATE PROCEDURE get_employees(@dept_id INT) AS BEGIN SELECT * FROM employees WHERE dept_id = @dept_id; END;
Stored procedure: pre-compiled, reusable SQL code block explicitly called by an application or user — not automatic.
CREATE TRIGGER audit_update AFTER UPDATE ON employees FOR EACH ROW INSERT INTO audit_log VALUES (OLD.id, NOW());
Trigger: SQL code that fires automatically on a data event (INSERT/UPDATE/DELETE) — cannot be explicitly called like a stored procedure.
CREATE FUNCTION calc_tax(salary DECIMAL) RETURNS DECIMAL AS BEGIN RETURN salary * 0.20; END;
Function: returns a calculated scalar value or result set — can be called inside SELECT statements unlike stored procedures.
Stored Procedure vs Trigger
Stored procedures must be explicitly invoked. Triggers fire automatically on data events — if the question says 'automatically when data changes,' the answer is a trigger.
View vs Materialized View
A standard view executes its query at runtime (always current, no storage). A materialized view caches results physically and must be refreshed to reflect changes.
ORM tools (Hibernate, Entity Framework, Ebean)
Object-Relational Mapping tools translate between application objects and database tables, abstracting SQL — they can generate inefficient queries that require DBA review.
python3 -c "import psycopg2; conn = psycopg2.connect(dsn); cur = conn.cursor(); cur.execute('SELECT ...')"
Python client-side scripting for database automation — runs on the client, not inside the database engine (unlike stored procedures which run server-side).
Invoke-Sqlcmd -ServerInstance 'srv' -Database 'db' -Query 'SELECT ...'
PowerShell cmdlet for SQL Server automation — used for scheduled maintenance scripts, backups, and administrative tasks from the command line.

Database Deployment — Architecture and Schema

IaaS (e.g., PostgreSQL on EC2)
You manage the database, OS, patching, and backups — maximum control but maximum operational responsibility.
PaaS (e.g., AWS RDS, Azure SQL Database)
Vendor manages OS, patching, and backups — you control schema, queries, and configuration. Most common for managed database deployments.
SaaS (e.g., fully managed database service)
Vendor manages everything including the database layer — minimal customization but zero infrastructure management.
Logical schema
Describes WHAT data is stored — entities, attributes, and relationships expressed at the business level independent of physical implementation.
Physical schema
Describes HOW data is physically stored — actual tables, columns, indexes, partitions, and storage structures.
Data dictionary
Centralized metadata repository documenting table definitions, column descriptions, data types, constraints, and relationships — used for governance and documentation.
Stress testing vs Regression testing
Stress testing pushes the system to capacity limits. Regression testing ensures new changes did not break existing functionality — these are distinct testing types.
Negative testing
Intentionally submitting invalid inputs (nulls, wrong data types, boundary violations) to verify the database handles errors gracefully.

Database Deployment — Design and Testing

ERD (Entity Relationship Diagram)
Visually maps entities, their attributes, and relationships with cardinality notation (one-to-one, one-to-many, many-to-many) — used for database design and documentation.
Cardinality: 1:1, 1:N, M:N
1:1 = one entity maps to exactly one other. 1:N = one entity maps to many. M:N = many entities map to many — M:N requires a junction (bridge) table to implement in a relational database.
Referential integrity
Ensures every foreign key value matches an existing primary key — prevents orphan records. Enforced via FOREIGN KEY constraints with ON DELETE CASCADE or ON DELETE RESTRICT.
FOREIGN KEY (dept_id) REFERENCES departments(id) ON DELETE CASCADE;
Foreign key constraint that automatically deletes child rows when the parent row is deleted — referential integrity enforcement built into the schema definition.
Connectivity: database port defaults
MySQL: 3306. PostgreSQL: 5432. SQL Server: 1433. Oracle: 1521 — firewall rules must permit these ports between application and database tiers.
Quality check vs regression test vs stress test
Quality check verifies data accuracy. Regression test confirms changes did not break existing functions. Stress test pushes the system to capacity limits — each serves a distinct purpose in deployment testing.

Database Management — Monitoring and Performance

EXPLAIN SELECT * FROM employees WHERE dept_id = 3;
Shows the query execution plan — use this to identify full table scans, missing indexes, and expensive operations before optimizing.
REBUILD INDEX
Rebuilds fragmented indexes to restore performance — schedule during maintenance windows because it temporarily degrades write performance.
Clustered index
Determines the physical sort order of rows in the table — only ONE per table. Best for range queries (BETWEEN, ORDER BY) on the indexed column.
Non-clustered index
A separate B-tree structure with pointers back to table rows — a table can have MANY non-clustered indexes. Best for columns frequently in WHERE and JOIN clauses.
Too many indexes slow writes
Every INSERT, UPDATE, and DELETE must update all indexes on the table — index every column that needs indexing, but no more.
Baseline configuration
Establish normal performance metrics (CPU, memory, disk I/O, throughput) first — deviations from baseline, not absolute values, indicate problems.
Monitoring targets
Track CPU utilization, memory usage, disk I/O, query throughput, concurrent connections, replication lag, backup success/failure, and failed login attempts.
Composite index: CREATE INDEX idx_dept_sal ON employees(dept_id, salary);
A composite index covers multiple columns — most effective when queries filter on the leading column first. Column order matters for performance.

Database Management — Change Management and Maintenance

Change management workflow
Document proposed change → Test in non-production → Get approval → Schedule maintenance window → Apply change → Verify → Document results → Maintain rollback plan.
Patch management rule
Always test security patches in a non-production environment first, even critical patches — applying untested patches directly to production is a change management violation.
Materialized view refresh
Run REFRESH MATERIALIZED VIEW report_summary to update cached results — schedule during off-peak hours so reports do not compete with transactional workloads.
Data dictionary maintenance
Keep data dictionaries current when schema changes occur — they document metadata that supports governance, auditing, and onboarding.
Audit log review
Periodically review transaction and system logs to identify performance issues, failed jobs, unauthorized access, and operational anomalies — not only a security task.
Load balancing
Distributes database workload across multiple servers to improve performance and prevent any single server from becoming a bottleneck.

Data and Database Security — Encryption and Data Protection

Encryption at rest
Encrypts data stored on disk using AES-256 or similar — protects against physical theft of storage media and unauthorized access to files.
Encryption in transit (TLS/SSL)
Encrypts data moving between client and database server — mitigates on-path (man-in-the-middle) interception attacks.
Data masking
Replaces sensitive values with realistic but fictional data while preserving format (a masked SSN still looks like an SSN) — used for non-production/development environments.
Data masking vs Encryption
Masking preserves format and is used for non-production data. Encryption makes data unreadable and is used for production data protection — do not confuse them.
Data discovery
Identifies where sensitive data (PII, PHI, payment card data) exists across databases and systems — required before you can classify and protect it.
Secure data destruction
Cryptographic erasure, degaussing, or physical destruction of storage media when decommissioning — required for compliance and preventing data recovery.

Data and Database Security — Access Controls

GRANT SELECT, INSERT ON employees TO app_user;
Grants specific DML permissions to a database user — always use least privilege by granting only what is required for the role.
REVOKE INSERT ON employees FROM app_user;
Removes a previously granted permission from a user or role.
CREATE ROLE read_only; GRANT SELECT ON ALL TABLES TO read_only;
Creates a role and grants permissions to it — assign users to roles rather than granting permissions directly to individual users.
Least privilege principle
Grant only the minimum permissions needed for a task — service accounts and application users should never have DBA-level access.
Authentication vs Authorization
Authentication verifies identity (who are you?). Authorization determines what an authenticated identity is permitted to do — authentication happens first.
Service accounts
Dedicated credentials for automated processes — never use personal or shared human accounts for service connections. Rotate passwords regularly.
Role-Based Access Control (RBAC)
Assign permissions to roles, then assign roles to users — easier to manage than per-user grants and enforces consistent access policies.

Data and Database Security — Attacks and Prevention

SQL Injection example: ' OR 1=1 --
Injecting SQL into an input field makes the WHERE clause always TRUE, returning all rows — the most heavily tested attack in Domain 4.
Parameterized query (prevents SQL injection)
Use prepared statements with bind parameters instead of string concatenation — the database treats user input as data, never as SQL code.
SELECT * FROM users WHERE id = ? -- bind parameter
Placeholder ? binds user-supplied value as a literal, preventing injection regardless of what the user enters.
Denial of Service (DoS)
Overwhelming database resources with excessive requests — mitigated by connection limits, query timeouts, and rate limiting.
On-path (man-in-the-middle) attack
Attacker intercepts network traffic between client and database — mitigated by encryption in transit (TLS), not encryption at rest.
Brute-force attack
Systematically trying many passwords to gain database access — mitigated by account lockout policies, strong passwords, and multi-factor authentication.
Infrastructure: DMZ placement
Databases should never be in the DMZ — place them on the internal network behind firewalls, with only the application tier in the DMZ.

Compliance and Governance

GDPR
Protects EU personal data — applies regardless of where the organization or servers are located. If you process EU citizen data, GDPR applies to you.
PCI DSS
Secures payment card data — applies to any organization that processes, stores, or transmits cardholder information.
GDPR vs PCI DSS scope
GDPR covers all personal data of EU subjects. PCI DSS covers only payment card data. Both can apply simultaneously to the same system.
PII (Personally Identifiable Information)
Data that can identify a specific individual — names, SSNs, addresses, email. Regulated by GDPR and other privacy laws.
PHI (Protected Health Information)
Health data linked to an individual — regulated by HIPAA in the US. Must be encrypted and access-controlled.
Data classification
Categorizing data by sensitivity (public, internal, confidential, restricted) to determine appropriate handling, access controls, and encryption requirements.
Data Loss Prevention (DLP)
Tools and policies that detect and prevent unauthorized transfer of sensitive data out of the organization — enforces data classification policies.

Business Continuity — Backup Strategies

Full backup
Copies ALL data every run — simplest restore (single backup set needed) but slowest to run and uses most storage.
Incremental backup
Copies only data changed since the LAST BACKUP OF ANY TYPE — fastest to run, uses least storage, but slowest to restore (need full + every incremental in sequence).
Differential backup
Copies all data changed since the LAST FULL BACKUP — restore needs only last full + last differential (faster than incremental restore).
Incremental vs Differential
Incremental: changes since last backup of any type. Differential: changes since last full backup only — this distinction is the most tested in Domain 5.
Restore order for incrementals: Sunday full + Mon + Tue + Wed
With incremental backups, restore the full backup first, then apply each incremental in chronological order — every incremental is needed.
Backup validation
Always test restores and verify backup integrity using hash verification — backups are worthless if they cannot be restored.
Off-site vs on-site backups
On-site backups enable faster restores. Off-site backups protect against site-level disasters. A complete strategy uses both.

Business Continuity — High Availability and DR

RPO (Recovery Point Objective)
Maximum acceptable DATA LOSS measured in time — an RPO of 1 hour means you can tolerate losing up to 1 hour of transactions.
RTO (Recovery Time Objective)
Maximum acceptable DOWNTIME — an RTO of 4 hours means the database must be back online within 4 hours of a failure.
RPO drives backup frequency; RTO drives DR architecture
Near-zero RPO requires real-time replication. Near-zero RTO requires hot standby with automatic failover — both are measured in time but measure different things.
Synchronous replication
Transaction is confirmed on both primary and standby before returning success — zero data loss (RPO=0) but adds latency to every write.
Asynchronous replication
Transactions commit on the primary before being sent to the standby — lower latency but risks some data loss if primary fails before replication completes.
Log shipping
Periodically sends transaction log backups to a standby server on a schedule — has built-in delay, typically requires manual failover, data loss equals shipping interval.
Database mirroring
Maintains a synchronized copy on a mirror server — can be synchronous (zero data loss) with automatic failover when a witness server is configured.
Failover vs Failback
Failover switches operations to the standby when the primary fails. Failback returns operations to the primary after it is repaired — both must be planned and tested.

Business Continuity — DR Documentation and Physical Security

Disaster Recovery (DR) Manual
Step-by-step procedures for recovering the database after a failure event — must be tested, updated after every schema or infrastructure change, and stored off-site.
Continuity of Operations Plan (COOP)
Documents how essential database services continue during and after a disruption — broader than a DR manual, covering staffing, communication, and alternate sites.
Build documentation
Records the exact steps, configurations, and scripts needed to rebuild the database environment from scratch — critical when the primary and standby are both lost.
Physical security controls
Badge access, biometric locks, surveillance cameras, fire suppression, and environmental controls (cooling, humidity) protect the physical infrastructure hosting the database — a tested Domain 4 topic.
Data retention policy
Defines how long backup copies and database records are retained before deletion or archival — driven by regulatory requirements (GDPR, PCI DSS) and business needs.

Ready to test yourself?

Start a timed DS0-001 mock exam or review practice questions by domain.