CompTIADS0-001117 concepts

DS0-001 Cheat Sheet

Quick reference for the CompTIA DataSys+ exam.

Quick Navigation

SQL DDL — Structure Commands SQL DML — Data Manipulation SQL TCL — Transaction Control Normalization Database Types — Relational vs NoSQL Database Programming Constructs Database Deployment — Architecture and Schema Database Deployment — Design and Testing Database Management — Monitoring and Performance Database Management — Change Management and Maintenance Data and Database Security — Encryption and Data Protection Data and Database Security — Access Controls Data and Database Security — Attacks and Prevention Compliance and Governance Business Continuity — Backup Strategies Business Continuity — High Availability and DR Business Continuity — DR Documentation and Physical Security

SQL DDL — Structure Commands

CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(100), dept_id INT);: Creates a new table with defined columns, data types, and a primary key constraint.
ALTER TABLE employees ADD COLUMN salary DECIMAL(10,2);: Adds a new column to an existing table without dropping or recreating it.
DROP TABLE employees;: Permanently removes a table and all its data from the database — cannot be rolled back in most engines.
TRUNCATE TABLE employees;: Removes all rows from a table and resets auto-increment counters — DDL (not DML), so it cannot be rolled back in most databases.
CREATE INDEX idx_dept ON employees(dept_id);: Creates a non-clustered index on dept_id to speed up queries that filter or join on that column.
CREATE VIEW active_employees AS SELECT * FROM employees WHERE status = 'active';: Creates a virtual table based on a query — does not store data, always reflects current state of the underlying table.
TRUNCATE is DDL; DELETE is DML: TRUNCATE resets table structure (no WHERE clause, no row-by-row logging). DELETE removes specific rows and can be rolled back — this is a frequent exam trap.

SQL DML — Data Manipulation

SELECT e.name, d.dept_name FROM employees e INNER JOIN departments d ON e.dept_id = d.id WHERE e.salary > 50000 ORDER BY e.name;: Retrieves employees earning over $50,000 joined to their department, sorted by name — combines INNER JOIN, WHERE, and ORDER BY.
SELECT dept_id, COUNT(*) AS headcount, AVG(salary) AS avg_sal FROM employees GROUP BY dept_id HAVING COUNT(*) > 5;: Aggregates employees by department, filters groups with more than 5 members using HAVING (not WHERE, which filters rows before grouping).
INSERT INTO employees (name, dept_id, salary) VALUES ('Alice', 3, 75000);: Inserts a single row into the employees table with explicit column names and values.
UPDATE employees SET salary = salary * 1.10 WHERE dept_id = 3;: Applies a 10% salary increase to all employees in department 3 — always use WHERE to avoid updating every row.
DELETE FROM employees WHERE status = 'inactive';: Removes rows matching the WHERE condition — can be rolled back within a transaction, unlike TRUNCATE.
SELECT * FROM employees WHERE dept_id IN (SELECT id FROM departments WHERE region = 'West');: Subquery example — inner SELECT returns department IDs, outer SELECT retrieves matching employees.
LEFT JOIN vs INNER JOIN: INNER JOIN returns only matching rows from both tables. LEFT JOIN returns all rows from the left table plus matched rows from the right (NULLs for non-matches).
RIGHT JOIN / FULL OUTER JOIN: RIGHT JOIN returns all rows from the right table plus matched left rows. FULL OUTER JOIN returns all rows from both tables with NULLs where there is no match — less common but tested.

SQL TCL — Transaction Control

BEGIN; UPDATE accounts SET balance = balance - 500 WHERE id = 1; UPDATE accounts SET balance = balance + 500 WHERE id = 2; COMMIT;: Wraps a fund transfer in a transaction — COMMIT makes both updates permanent only when both succeed (Atomicity).
ROLLBACK;: Undoes all changes made since the last COMMIT or BEGIN, restoring the database to its prior state.
SAVEPOINT sp1; -- work -- ROLLBACK TO sp1;: Creates a named restore point within a transaction, allowing partial rollback without undoing the entire transaction.
ACID: Atomicity: All operations in a transaction succeed or all are rolled back — no partial commits.
ACID: Consistency: A transaction transitions the database from one valid state to another, never violating defined rules and constraints.
ACID: Isolation: Concurrent transactions execute independently — intermediate states are not visible to other transactions.
ACID: Durability: Once a transaction is committed, it persists even if the system crashes immediately afterward.

Normalization

1NF: Atomic values, no repeating groups: Each column holds a single, indivisible value — no comma-separated lists or arrays in a single cell.
2NF: 1NF + no partial dependencies: Every non-key attribute must depend on the entire composite primary key, not just part of it.
3NF: 2NF + no transitive dependencies: Non-key attributes must depend only on the primary key, not on other non-key attributes.
BCNF: Every determinant is a candidate key: A stricter form of 3NF where any attribute that functionally determines another must itself be a candidate key.
Normalization vs Denormalization trade-off: Normalization reduces redundancy and update anomalies but requires more JOINs (slower reads). Denormalization improves read performance but increases redundancy.
1NF violation: phone_numbers = '555-1111, 555-2222': Storing multiple values in one column violates 1NF — the fix is to create a separate phone_numbers table with one row per number.

Database Types — Relational vs NoSQL

Relational (RDBMS): Stores data in tables with rows and columns, enforces schemas and ACID properties, uses SQL. Best for transactional systems requiring data integrity — examples: PostgreSQL, MySQL, SQL Server.
Document database: Stores data as JSON/BSON documents with flexible schemas. Best for content management, catalogs, and user profiles — examples: MongoDB, Cosmos DB.
Key-Value store: Stores data as simple key-value pairs for ultra-fast lookups. Best for caching, session data, and leaderboards — examples: Redis, DynamoDB.
Column-oriented (Wide-column) database: Stores data in columns grouped into column families — optimized for large-scale distributed writes and analytics — examples: Apache Cassandra, HBase.
Graph database: Stores nodes and edges representing entities and relationships — optimized for traversing deeply connected data like social networks or fraud rings — examples: Neo4j.
NoSQL does NOT mean No SQL: Many NoSQL databases support SQL-like query languages. The distinction is about flexible data models and schema design, not query language — a frequent exam misconception.

Database Programming Constructs

CREATE PROCEDURE get_employees(@dept_id INT) AS BEGIN SELECT * FROM employees WHERE dept_id = @dept_id; END;: Stored procedure: pre-compiled, reusable SQL code block explicitly called by an application or user — not automatic.
CREATE TRIGGER audit_update AFTER UPDATE ON employees FOR EACH ROW INSERT INTO audit_log VALUES (OLD.id, NOW());: Trigger: SQL code that fires automatically on a data event (INSERT/UPDATE/DELETE) — cannot be explicitly called like a stored procedure.
CREATE FUNCTION calc_tax(salary DECIMAL) RETURNS DECIMAL AS BEGIN RETURN salary * 0.20; END;: Function: returns a calculated scalar value or result set — can be called inside SELECT statements unlike stored procedures.
Stored Procedure vs Trigger: Stored procedures must be explicitly invoked. Triggers fire automatically on data events — if the question says 'automatically when data changes,' the answer is a trigger.
View vs Materialized View: A standard view executes its query at runtime (always current, no storage). A materialized view caches results physically and must be refreshed to reflect changes.
ORM tools (Hibernate, Entity Framework, Ebean): Object-Relational Mapping tools translate between application objects and database tables, abstracting SQL — they can generate inefficient queries that require DBA review.
python3 -c "import psycopg2; conn = psycopg2.connect(dsn); cur = conn.cursor(); cur.execute('SELECT ...')": Python client-side scripting for database automation — runs on the client, not inside the database engine (unlike stored procedures which run server-side).
Invoke-Sqlcmd -ServerInstance 'srv' -Database 'db' -Query 'SELECT ...': PowerShell cmdlet for SQL Server automation — used for scheduled maintenance scripts, backups, and administrative tasks from the command line.

Database Deployment — Architecture and Schema

IaaS (e.g., PostgreSQL on EC2): You manage the database, OS, patching, and backups — maximum control but maximum operational responsibility.
PaaS (e.g., AWS RDS, Azure SQL Database): Vendor manages OS, patching, and backups — you control schema, queries, and configuration. Most common for managed database deployments.
SaaS (e.g., fully managed database service): Vendor manages everything including the database layer — minimal customization but zero infrastructure management.
Logical schema: Describes WHAT data is stored — entities, attributes, and relationships expressed at the business level independent of physical implementation.
Physical schema: Describes HOW data is physically stored — actual tables, columns, indexes, partitions, and storage structures.
Data dictionary: Centralized metadata repository documenting table definitions, column descriptions, data types, constraints, and relationships — used for governance and documentation.
Stress testing vs Regression testing: Stress testing pushes the system to capacity limits. Regression testing ensures new changes did not break existing functionality — these are distinct testing types.
Negative testing: Intentionally submitting invalid inputs (nulls, wrong data types, boundary violations) to verify the database handles errors gracefully.

Database Deployment — Design and Testing

ERD (Entity Relationship Diagram): Visually maps entities, their attributes, and relationships with cardinality notation (one-to-one, one-to-many, many-to-many) — used for database design and documentation.
Cardinality: 1:1, 1:N, M:N: 1:1 = one entity maps to exactly one other. 1:N = one entity maps to many. M:N = many entities map to many — M:N requires a junction (bridge) table to implement in a relational database.
Referential integrity: Ensures every foreign key value matches an existing primary key — prevents orphan records. Enforced via FOREIGN KEY constraints with ON DELETE CASCADE or ON DELETE RESTRICT.
FOREIGN KEY (dept_id) REFERENCES departments(id) ON DELETE CASCADE;: Foreign key constraint that automatically deletes child rows when the parent row is deleted — referential integrity enforcement built into the schema definition.
Connectivity: database port defaults: MySQL: 3306. PostgreSQL: 5432. SQL Server: 1433. Oracle: 1521 — firewall rules must permit these ports between application and database tiers.
Quality check vs regression test vs stress test: Quality check verifies data accuracy. Regression test confirms changes did not break existing functions. Stress test pushes the system to capacity limits — each serves a distinct purpose in deployment testing.

Database Management — Monitoring and Performance

EXPLAIN SELECT * FROM employees WHERE dept_id = 3;: Shows the query execution plan — use this to identify full table scans, missing indexes, and expensive operations before optimizing.
REBUILD INDEX: Rebuilds fragmented indexes to restore performance — schedule during maintenance windows because it temporarily degrades write performance.
Clustered index: Determines the physical sort order of rows in the table — only ONE per table. Best for range queries (BETWEEN, ORDER BY) on the indexed column.
Non-clustered index: A separate B-tree structure with pointers back to table rows — a table can have MANY non-clustered indexes. Best for columns frequently in WHERE and JOIN clauses.
Too many indexes slow writes: Every INSERT, UPDATE, and DELETE must update all indexes on the table — index every column that needs indexing, but no more.
Baseline configuration: Establish normal performance metrics (CPU, memory, disk I/O, throughput) first — deviations from baseline, not absolute values, indicate problems.
Monitoring targets: Track CPU utilization, memory usage, disk I/O, query throughput, concurrent connections, replication lag, backup success/failure, and failed login attempts.
Composite index: CREATE INDEX idx_dept_sal ON employees(dept_id, salary);: A composite index covers multiple columns — most effective when queries filter on the leading column first. Column order matters for performance.

Database Management — Change Management and Maintenance

Change management workflow: Document proposed change → Test in non-production → Get approval → Schedule maintenance window → Apply change → Verify → Document results → Maintain rollback plan.
Patch management rule: Always test security patches in a non-production environment first, even critical patches — applying untested patches directly to production is a change management violation.
Materialized view refresh: Run REFRESH MATERIALIZED VIEW report_summary to update cached results — schedule during off-peak hours so reports do not compete with transactional workloads.
Data dictionary maintenance: Keep data dictionaries current when schema changes occur — they document metadata that supports governance, auditing, and onboarding.
Audit log review: Periodically review transaction and system logs to identify performance issues, failed jobs, unauthorized access, and operational anomalies — not only a security task.
Load balancing: Distributes database workload across multiple servers to improve performance and prevent any single server from becoming a bottleneck.

Data and Database Security — Encryption and Data Protection

Encryption at rest: Encrypts data stored on disk using AES-256 or similar — protects against physical theft of storage media and unauthorized access to files.
Encryption in transit (TLS/SSL): Encrypts data moving between client and database server — mitigates on-path (man-in-the-middle) interception attacks.
Data masking: Replaces sensitive values with realistic but fictional data while preserving format (a masked SSN still looks like an SSN) — used for non-production/development environments.
Data masking vs Encryption: Masking preserves format and is used for non-production data. Encryption makes data unreadable and is used for production data protection — do not confuse them.
Data discovery: Identifies where sensitive data (PII, PHI, payment card data) exists across databases and systems — required before you can classify and protect it.
Secure data destruction: Cryptographic erasure, degaussing, or physical destruction of storage media when decommissioning — required for compliance and preventing data recovery.

Data and Database Security — Access Controls

GRANT SELECT, INSERT ON employees TO app_user;: Grants specific DML permissions to a database user — always use least privilege by granting only what is required for the role.
REVOKE INSERT ON employees FROM app_user;: Removes a previously granted permission from a user or role.
CREATE ROLE read_only; GRANT SELECT ON ALL TABLES TO read_only;: Creates a role and grants permissions to it — assign users to roles rather than granting permissions directly to individual users.
Least privilege principle: Grant only the minimum permissions needed for a task — service accounts and application users should never have DBA-level access.
Authentication vs Authorization: Authentication verifies identity (who are you?). Authorization determines what an authenticated identity is permitted to do — authentication happens first.
Service accounts: Dedicated credentials for automated processes — never use personal or shared human accounts for service connections. Rotate passwords regularly.
Role-Based Access Control (RBAC): Assign permissions to roles, then assign roles to users — easier to manage than per-user grants and enforces consistent access policies.

Data and Database Security — Attacks and Prevention

SQL Injection example: ' OR 1=1 --: Injecting SQL into an input field makes the WHERE clause always TRUE, returning all rows — the most heavily tested attack in Domain 4.
Parameterized query (prevents SQL injection): Use prepared statements with bind parameters instead of string concatenation — the database treats user input as data, never as SQL code.
SELECT * FROM users WHERE id = ? -- bind parameter: Placeholder ? binds user-supplied value as a literal, preventing injection regardless of what the user enters.
Denial of Service (DoS): Overwhelming database resources with excessive requests — mitigated by connection limits, query timeouts, and rate limiting.
On-path (man-in-the-middle) attack: Attacker intercepts network traffic between client and database — mitigated by encryption in transit (TLS), not encryption at rest.
Brute-force attack: Systematically trying many passwords to gain database access — mitigated by account lockout policies, strong passwords, and multi-factor authentication.
Infrastructure: DMZ placement: Databases should never be in the DMZ — place them on the internal network behind firewalls, with only the application tier in the DMZ.

Compliance and Governance

GDPR: Protects EU personal data — applies regardless of where the organization or servers are located. If you process EU citizen data, GDPR applies to you.
PCI DSS: Secures payment card data — applies to any organization that processes, stores, or transmits cardholder information.
GDPR vs PCI DSS scope: GDPR covers all personal data of EU subjects. PCI DSS covers only payment card data. Both can apply simultaneously to the same system.
PII (Personally Identifiable Information): Data that can identify a specific individual — names, SSNs, addresses, email. Regulated by GDPR and other privacy laws.
PHI (Protected Health Information): Health data linked to an individual — regulated by HIPAA in the US. Must be encrypted and access-controlled.
Data classification: Categorizing data by sensitivity (public, internal, confidential, restricted) to determine appropriate handling, access controls, and encryption requirements.
Data Loss Prevention (DLP): Tools and policies that detect and prevent unauthorized transfer of sensitive data out of the organization — enforces data classification policies.

Business Continuity — Backup Strategies

Full backup: Copies ALL data every run — simplest restore (single backup set needed) but slowest to run and uses most storage.
Incremental backup: Copies only data changed since the LAST BACKUP OF ANY TYPE — fastest to run, uses least storage, but slowest to restore (need full + every incremental in sequence).
Differential backup: Copies all data changed since the LAST FULL BACKUP — restore needs only last full + last differential (faster than incremental restore).
Incremental vs Differential: Incremental: changes since last backup of any type. Differential: changes since last full backup only — this distinction is the most tested in Domain 5.
Restore order for incrementals: Sunday full + Mon + Tue + Wed: With incremental backups, restore the full backup first, then apply each incremental in chronological order — every incremental is needed.
Backup validation: Always test restores and verify backup integrity using hash verification — backups are worthless if they cannot be restored.
Off-site vs on-site backups: On-site backups enable faster restores. Off-site backups protect against site-level disasters. A complete strategy uses both.

Business Continuity — High Availability and DR

RPO (Recovery Point Objective): Maximum acceptable DATA LOSS measured in time — an RPO of 1 hour means you can tolerate losing up to 1 hour of transactions.
RTO (Recovery Time Objective): Maximum acceptable DOWNTIME — an RTO of 4 hours means the database must be back online within 4 hours of a failure.
RPO drives backup frequency; RTO drives DR architecture: Near-zero RPO requires real-time replication. Near-zero RTO requires hot standby with automatic failover — both are measured in time but measure different things.
Synchronous replication: Transaction is confirmed on both primary and standby before returning success — zero data loss (RPO=0) but adds latency to every write.
Asynchronous replication: Transactions commit on the primary before being sent to the standby — lower latency but risks some data loss if primary fails before replication completes.
Log shipping: Periodically sends transaction log backups to a standby server on a schedule — has built-in delay, typically requires manual failover, data loss equals shipping interval.
Database mirroring: Maintains a synchronized copy on a mirror server — can be synchronous (zero data loss) with automatic failover when a witness server is configured.
Failover vs Failback: Failover switches operations to the standby when the primary fails. Failback returns operations to the primary after it is repaired — both must be planned and tested.

Business Continuity — DR Documentation and Physical Security

Disaster Recovery (DR) Manual: Step-by-step procedures for recovering the database after a failure event — must be tested, updated after every schema or infrastructure change, and stored off-site.
Continuity of Operations Plan (COOP): Documents how essential database services continue during and after a disruption — broader than a DR manual, covering staffing, communication, and alternate sites.
Build documentation: Records the exact steps, configurations, and scripts needed to rebuild the database environment from scratch — critical when the primary and standby are both lost.
Physical security controls: Badge access, biometric locks, surveillance cameras, fire suppression, and environmental controls (cooling, humidity) protect the physical infrastructure hosting the database — a tested Domain 4 topic.
Data retention policy: Defines how long backup copies and database records are retained before deletion or archival — driven by regulatory requirements (GDPR, PCI DSS) and business needs.

Ready to test yourself?

Start a timed DS0-001 mock exam or review practice questions by domain.