Quick Navigation
SQL DDL — Structure CommandsSQL DML — Data ManipulationSQL TCL — Transaction ControlNormalizationDatabase Types — Relational vs NoSQLDatabase Programming ConstructsDatabase Deployment — Architecture and SchemaDatabase Deployment — Design and TestingDatabase Management — Monitoring and PerformanceDatabase Management — Change Management and MaintenanceData and Database Security — Encryption and Data ProtectionData and Database Security — Access ControlsData and Database Security — Attacks and PreventionCompliance and GovernanceBusiness Continuity — Backup StrategiesBusiness Continuity — High Availability and DRBusiness Continuity — DR Documentation and Physical Security
SQL DDL — Structure Commands
- CREATE TABLE employees (id INT PRIMARY KEY, name VARCHAR(100), dept_id INT);
- Creates a new table with defined columns, data types, and a primary key constraint.
- ALTER TABLE employees ADD COLUMN salary DECIMAL(10,2);
- Adds a new column to an existing table without dropping or recreating it.
- DROP TABLE employees;
- Permanently removes a table and all its data from the database — cannot be rolled back in most engines.
- TRUNCATE TABLE employees;
- Removes all rows from a table and resets auto-increment counters — DDL (not DML), so it cannot be rolled back in most databases.
- CREATE INDEX idx_dept ON employees(dept_id);
- Creates a non-clustered index on dept_id to speed up queries that filter or join on that column.
- CREATE VIEW active_employees AS SELECT * FROM employees WHERE status = 'active';
- Creates a virtual table based on a query — does not store data, always reflects current state of the underlying table.
- TRUNCATE is DDL; DELETE is DML
- TRUNCATE resets table structure (no WHERE clause, no row-by-row logging). DELETE removes specific rows and can be rolled back — this is a frequent exam trap.
SQL DML — Data Manipulation
- SELECT e.name, d.dept_name FROM employees e INNER JOIN departments d ON e.dept_id = d.id WHERE e.salary > 50000 ORDER BY e.name;
- Retrieves employees earning over $50,000 joined to their department, sorted by name — combines INNER JOIN, WHERE, and ORDER BY.
- SELECT dept_id, COUNT(*) AS headcount, AVG(salary) AS avg_sal FROM employees GROUP BY dept_id HAVING COUNT(*) > 5;
- Aggregates employees by department, filters groups with more than 5 members using HAVING (not WHERE, which filters rows before grouping).
- INSERT INTO employees (name, dept_id, salary) VALUES ('Alice', 3, 75000);
- Inserts a single row into the employees table with explicit column names and values.
- UPDATE employees SET salary = salary * 1.10 WHERE dept_id = 3;
- Applies a 10% salary increase to all employees in department 3 — always use WHERE to avoid updating every row.
- DELETE FROM employees WHERE status = 'inactive';
- Removes rows matching the WHERE condition — can be rolled back within a transaction, unlike TRUNCATE.
- SELECT * FROM employees WHERE dept_id IN (SELECT id FROM departments WHERE region = 'West');
- Subquery example — inner SELECT returns department IDs, outer SELECT retrieves matching employees.
- LEFT JOIN vs INNER JOIN
- INNER JOIN returns only matching rows from both tables. LEFT JOIN returns all rows from the left table plus matched rows from the right (NULLs for non-matches).
- RIGHT JOIN / FULL OUTER JOIN
- RIGHT JOIN returns all rows from the right table plus matched left rows. FULL OUTER JOIN returns all rows from both tables with NULLs where there is no match — less common but tested.
SQL TCL — Transaction Control
- BEGIN; UPDATE accounts SET balance = balance - 500 WHERE id = 1; UPDATE accounts SET balance = balance + 500 WHERE id = 2; COMMIT;
- Wraps a fund transfer in a transaction — COMMIT makes both updates permanent only when both succeed (Atomicity).
- ROLLBACK;
- Undoes all changes made since the last COMMIT or BEGIN, restoring the database to its prior state.
- SAVEPOINT sp1; -- work -- ROLLBACK TO sp1;
- Creates a named restore point within a transaction, allowing partial rollback without undoing the entire transaction.
- ACID: Atomicity
- All operations in a transaction succeed or all are rolled back — no partial commits.
- ACID: Consistency
- A transaction transitions the database from one valid state to another, never violating defined rules and constraints.
- ACID: Isolation
- Concurrent transactions execute independently — intermediate states are not visible to other transactions.
- ACID: Durability
- Once a transaction is committed, it persists even if the system crashes immediately afterward.
Normalization
- 1NF: Atomic values, no repeating groups
- Each column holds a single, indivisible value — no comma-separated lists or arrays in a single cell.
- 2NF: 1NF + no partial dependencies
- Every non-key attribute must depend on the entire composite primary key, not just part of it.
- 3NF: 2NF + no transitive dependencies
- Non-key attributes must depend only on the primary key, not on other non-key attributes.
- BCNF: Every determinant is a candidate key
- A stricter form of 3NF where any attribute that functionally determines another must itself be a candidate key.
- Normalization vs Denormalization trade-off
- Normalization reduces redundancy and update anomalies but requires more JOINs (slower reads). Denormalization improves read performance but increases redundancy.
- 1NF violation: phone_numbers = '555-1111, 555-2222'
- Storing multiple values in one column violates 1NF — the fix is to create a separate phone_numbers table with one row per number.
Database Types — Relational vs NoSQL
- Relational (RDBMS)
- Stores data in tables with rows and columns, enforces schemas and ACID properties, uses SQL. Best for transactional systems requiring data integrity — examples: PostgreSQL, MySQL, SQL Server.
- Document database
- Stores data as JSON/BSON documents with flexible schemas. Best for content management, catalogs, and user profiles — examples: MongoDB, Cosmos DB.
- Key-Value store
- Stores data as simple key-value pairs for ultra-fast lookups. Best for caching, session data, and leaderboards — examples: Redis, DynamoDB.
- Column-oriented (Wide-column) database
- Stores data in columns grouped into column families — optimized for large-scale distributed writes and analytics — examples: Apache Cassandra, HBase.
- Graph database
- Stores nodes and edges representing entities and relationships — optimized for traversing deeply connected data like social networks or fraud rings — examples: Neo4j.
- NoSQL does NOT mean No SQL
- Many NoSQL databases support SQL-like query languages. The distinction is about flexible data models and schema design, not query language — a frequent exam misconception.
Database Programming Constructs
- CREATE PROCEDURE get_employees(@dept_id INT) AS BEGIN SELECT * FROM employees WHERE dept_id = @dept_id; END;
- Stored procedure: pre-compiled, reusable SQL code block explicitly called by an application or user — not automatic.
- CREATE TRIGGER audit_update AFTER UPDATE ON employees FOR EACH ROW INSERT INTO audit_log VALUES (OLD.id, NOW());
- Trigger: SQL code that fires automatically on a data event (INSERT/UPDATE/DELETE) — cannot be explicitly called like a stored procedure.
- CREATE FUNCTION calc_tax(salary DECIMAL) RETURNS DECIMAL AS BEGIN RETURN salary * 0.20; END;
- Function: returns a calculated scalar value or result set — can be called inside SELECT statements unlike stored procedures.
- Stored Procedure vs Trigger
- Stored procedures must be explicitly invoked. Triggers fire automatically on data events — if the question says 'automatically when data changes,' the answer is a trigger.
- View vs Materialized View
- A standard view executes its query at runtime (always current, no storage). A materialized view caches results physically and must be refreshed to reflect changes.
- ORM tools (Hibernate, Entity Framework, Ebean)
- Object-Relational Mapping tools translate between application objects and database tables, abstracting SQL — they can generate inefficient queries that require DBA review.
- python3 -c "import psycopg2; conn = psycopg2.connect(dsn); cur = conn.cursor(); cur.execute('SELECT ...')"
- Python client-side scripting for database automation — runs on the client, not inside the database engine (unlike stored procedures which run server-side).
- Invoke-Sqlcmd -ServerInstance 'srv' -Database 'db' -Query 'SELECT ...'
- PowerShell cmdlet for SQL Server automation — used for scheduled maintenance scripts, backups, and administrative tasks from the command line.
Database Deployment — Architecture and Schema
- IaaS (e.g., PostgreSQL on EC2)
- You manage the database, OS, patching, and backups — maximum control but maximum operational responsibility.
- PaaS (e.g., AWS RDS, Azure SQL Database)
- Vendor manages OS, patching, and backups — you control schema, queries, and configuration. Most common for managed database deployments.
- SaaS (e.g., fully managed database service)
- Vendor manages everything including the database layer — minimal customization but zero infrastructure management.
- Logical schema
- Describes WHAT data is stored — entities, attributes, and relationships expressed at the business level independent of physical implementation.
- Physical schema
- Describes HOW data is physically stored — actual tables, columns, indexes, partitions, and storage structures.
- Data dictionary
- Centralized metadata repository documenting table definitions, column descriptions, data types, constraints, and relationships — used for governance and documentation.
- Stress testing vs Regression testing
- Stress testing pushes the system to capacity limits. Regression testing ensures new changes did not break existing functionality — these are distinct testing types.
- Negative testing
- Intentionally submitting invalid inputs (nulls, wrong data types, boundary violations) to verify the database handles errors gracefully.
Database Deployment — Design and Testing
- ERD (Entity Relationship Diagram)
- Visually maps entities, their attributes, and relationships with cardinality notation (one-to-one, one-to-many, many-to-many) — used for database design and documentation.
- Cardinality: 1:1, 1:N, M:N
- 1:1 = one entity maps to exactly one other. 1:N = one entity maps to many. M:N = many entities map to many — M:N requires a junction (bridge) table to implement in a relational database.
- Referential integrity
- Ensures every foreign key value matches an existing primary key — prevents orphan records. Enforced via FOREIGN KEY constraints with ON DELETE CASCADE or ON DELETE RESTRICT.
- FOREIGN KEY (dept_id) REFERENCES departments(id) ON DELETE CASCADE;
- Foreign key constraint that automatically deletes child rows when the parent row is deleted — referential integrity enforcement built into the schema definition.
- Connectivity: database port defaults
- MySQL: 3306. PostgreSQL: 5432. SQL Server: 1433. Oracle: 1521 — firewall rules must permit these ports between application and database tiers.
- Quality check vs regression test vs stress test
- Quality check verifies data accuracy. Regression test confirms changes did not break existing functions. Stress test pushes the system to capacity limits — each serves a distinct purpose in deployment testing.
Database Management — Monitoring and Performance
- EXPLAIN SELECT * FROM employees WHERE dept_id = 3;
- Shows the query execution plan — use this to identify full table scans, missing indexes, and expensive operations before optimizing.
- REBUILD INDEX
- Rebuilds fragmented indexes to restore performance — schedule during maintenance windows because it temporarily degrades write performance.
- Clustered index
- Determines the physical sort order of rows in the table — only ONE per table. Best for range queries (BETWEEN, ORDER BY) on the indexed column.
- Non-clustered index
- A separate B-tree structure with pointers back to table rows — a table can have MANY non-clustered indexes. Best for columns frequently in WHERE and JOIN clauses.
- Too many indexes slow writes
- Every INSERT, UPDATE, and DELETE must update all indexes on the table — index every column that needs indexing, but no more.
- Baseline configuration
- Establish normal performance metrics (CPU, memory, disk I/O, throughput) first — deviations from baseline, not absolute values, indicate problems.
- Monitoring targets
- Track CPU utilization, memory usage, disk I/O, query throughput, concurrent connections, replication lag, backup success/failure, and failed login attempts.
- Composite index: CREATE INDEX idx_dept_sal ON employees(dept_id, salary);
- A composite index covers multiple columns — most effective when queries filter on the leading column first. Column order matters for performance.
Database Management — Change Management and Maintenance
- Change management workflow
- Document proposed change → Test in non-production → Get approval → Schedule maintenance window → Apply change → Verify → Document results → Maintain rollback plan.
- Patch management rule
- Always test security patches in a non-production environment first, even critical patches — applying untested patches directly to production is a change management violation.
- Materialized view refresh
- Run REFRESH MATERIALIZED VIEW report_summary to update cached results — schedule during off-peak hours so reports do not compete with transactional workloads.
- Data dictionary maintenance
- Keep data dictionaries current when schema changes occur — they document metadata that supports governance, auditing, and onboarding.
- Audit log review
- Periodically review transaction and system logs to identify performance issues, failed jobs, unauthorized access, and operational anomalies — not only a security task.
- Load balancing
- Distributes database workload across multiple servers to improve performance and prevent any single server from becoming a bottleneck.
Data and Database Security — Encryption and Data Protection
- Encryption at rest
- Encrypts data stored on disk using AES-256 or similar — protects against physical theft of storage media and unauthorized access to files.
- Encryption in transit (TLS/SSL)
- Encrypts data moving between client and database server — mitigates on-path (man-in-the-middle) interception attacks.
- Data masking
- Replaces sensitive values with realistic but fictional data while preserving format (a masked SSN still looks like an SSN) — used for non-production/development environments.
- Data masking vs Encryption
- Masking preserves format and is used for non-production data. Encryption makes data unreadable and is used for production data protection — do not confuse them.
- Data discovery
- Identifies where sensitive data (PII, PHI, payment card data) exists across databases and systems — required before you can classify and protect it.
- Secure data destruction
- Cryptographic erasure, degaussing, or physical destruction of storage media when decommissioning — required for compliance and preventing data recovery.
Data and Database Security — Access Controls
- GRANT SELECT, INSERT ON employees TO app_user;
- Grants specific DML permissions to a database user — always use least privilege by granting only what is required for the role.
- REVOKE INSERT ON employees FROM app_user;
- Removes a previously granted permission from a user or role.
- CREATE ROLE read_only; GRANT SELECT ON ALL TABLES TO read_only;
- Creates a role and grants permissions to it — assign users to roles rather than granting permissions directly to individual users.
- Least privilege principle
- Grant only the minimum permissions needed for a task — service accounts and application users should never have DBA-level access.
- Authentication vs Authorization
- Authentication verifies identity (who are you?). Authorization determines what an authenticated identity is permitted to do — authentication happens first.
- Service accounts
- Dedicated credentials for automated processes — never use personal or shared human accounts for service connections. Rotate passwords regularly.
- Role-Based Access Control (RBAC)
- Assign permissions to roles, then assign roles to users — easier to manage than per-user grants and enforces consistent access policies.
Data and Database Security — Attacks and Prevention
- SQL Injection example: ' OR 1=1 --
- Injecting SQL into an input field makes the WHERE clause always TRUE, returning all rows — the most heavily tested attack in Domain 4.
- Parameterized query (prevents SQL injection)
- Use prepared statements with bind parameters instead of string concatenation — the database treats user input as data, never as SQL code.
- SELECT * FROM users WHERE id = ? -- bind parameter
- Placeholder ? binds user-supplied value as a literal, preventing injection regardless of what the user enters.
- Denial of Service (DoS)
- Overwhelming database resources with excessive requests — mitigated by connection limits, query timeouts, and rate limiting.
- On-path (man-in-the-middle) attack
- Attacker intercepts network traffic between client and database — mitigated by encryption in transit (TLS), not encryption at rest.
- Brute-force attack
- Systematically trying many passwords to gain database access — mitigated by account lockout policies, strong passwords, and multi-factor authentication.
- Infrastructure: DMZ placement
- Databases should never be in the DMZ — place them on the internal network behind firewalls, with only the application tier in the DMZ.
Compliance and Governance
- GDPR
- Protects EU personal data — applies regardless of where the organization or servers are located. If you process EU citizen data, GDPR applies to you.
- PCI DSS
- Secures payment card data — applies to any organization that processes, stores, or transmits cardholder information.
- GDPR vs PCI DSS scope
- GDPR covers all personal data of EU subjects. PCI DSS covers only payment card data. Both can apply simultaneously to the same system.
- PII (Personally Identifiable Information)
- Data that can identify a specific individual — names, SSNs, addresses, email. Regulated by GDPR and other privacy laws.
- PHI (Protected Health Information)
- Health data linked to an individual — regulated by HIPAA in the US. Must be encrypted and access-controlled.
- Data classification
- Categorizing data by sensitivity (public, internal, confidential, restricted) to determine appropriate handling, access controls, and encryption requirements.
- Data Loss Prevention (DLP)
- Tools and policies that detect and prevent unauthorized transfer of sensitive data out of the organization — enforces data classification policies.
Business Continuity — Backup Strategies
- Full backup
- Copies ALL data every run — simplest restore (single backup set needed) but slowest to run and uses most storage.
- Incremental backup
- Copies only data changed since the LAST BACKUP OF ANY TYPE — fastest to run, uses least storage, but slowest to restore (need full + every incremental in sequence).
- Differential backup
- Copies all data changed since the LAST FULL BACKUP — restore needs only last full + last differential (faster than incremental restore).
- Incremental vs Differential
- Incremental: changes since last backup of any type. Differential: changes since last full backup only — this distinction is the most tested in Domain 5.
- Restore order for incrementals: Sunday full + Mon + Tue + Wed
- With incremental backups, restore the full backup first, then apply each incremental in chronological order — every incremental is needed.
- Backup validation
- Always test restores and verify backup integrity using hash verification — backups are worthless if they cannot be restored.
- Off-site vs on-site backups
- On-site backups enable faster restores. Off-site backups protect against site-level disasters. A complete strategy uses both.
Business Continuity — High Availability and DR
- RPO (Recovery Point Objective)
- Maximum acceptable DATA LOSS measured in time — an RPO of 1 hour means you can tolerate losing up to 1 hour of transactions.
- RTO (Recovery Time Objective)
- Maximum acceptable DOWNTIME — an RTO of 4 hours means the database must be back online within 4 hours of a failure.
- RPO drives backup frequency; RTO drives DR architecture
- Near-zero RPO requires real-time replication. Near-zero RTO requires hot standby with automatic failover — both are measured in time but measure different things.
- Synchronous replication
- Transaction is confirmed on both primary and standby before returning success — zero data loss (RPO=0) but adds latency to every write.
- Asynchronous replication
- Transactions commit on the primary before being sent to the standby — lower latency but risks some data loss if primary fails before replication completes.
- Log shipping
- Periodically sends transaction log backups to a standby server on a schedule — has built-in delay, typically requires manual failover, data loss equals shipping interval.
- Database mirroring
- Maintains a synchronized copy on a mirror server — can be synchronous (zero data loss) with automatic failover when a witness server is configured.
- Failover vs Failback
- Failover switches operations to the standby when the primary fails. Failback returns operations to the primary after it is repaired — both must be planned and tested.
Business Continuity — DR Documentation and Physical Security
- Disaster Recovery (DR) Manual
- Step-by-step procedures for recovering the database after a failure event — must be tested, updated after every schema or infrastructure change, and stored off-site.
- Continuity of Operations Plan (COOP)
- Documents how essential database services continue during and after a disruption — broader than a DR manual, covering staffing, communication, and alternate sites.
- Build documentation
- Records the exact steps, configurations, and scripts needed to rebuild the database environment from scratch — critical when the primary and standby are both lost.
- Physical security controls
- Badge access, biometric locks, surveillance cameras, fire suppression, and environmental controls (cooling, humidity) protect the physical infrastructure hosting the database — a tested Domain 4 topic.
- Data retention policy
- Defines how long backup copies and database records are retained before deletion or archival — driven by regulatory requirements (GDPR, PCI DSS) and business needs.