What is Unity Catalog?
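Unity Catalog is the unified governance solution for data and AI assets in Databricks. It provides:
- A three-level namespace: catalog.schema.table
- Centralized access control across all workspaces attached to the same metastore
- Data lineage tracking
- Data discovery and search
In a Unity Catalog-enabled workspace, access to any object registered in Unity Catalog is checked against its permissions. Objects left in the legacy Hive metastore are governed separately.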
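The three-level namespace can be illustrated with a short sketch. The catalog, schema, and table names (`main`, `sales`, `orders`) are hypothetical placeholders, not objects the source defines:

```sql
-- Fully qualified reference: catalog.schema.table
SELECT * FROM main.sales.orders LIMIT 10;

-- Set a default catalog and schema so shorter names resolve against them
USE CATALOG main;
USE SCHEMA sales;

-- Now resolves to main.sales.orders
SELECT * FROM orders LIMIT 10;
```

Fully qualified names are usually the safer choice in shared notebooks and jobs, since they do not depend on whatever default catalog the session happens to have.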
Managed vs External Tables
Managed tables: Databricks manages both the metadata and the data files. Data is stored in the managed storage location configured for the metastore, catalog, or schema.
External tables: Databricks manages only the metadata. Data files are stored in a customer-managed cloud storage location.
Critical difference: Dropping a managed table also deletes its data files (after a short retention window, during which UNDROP TABLE can restore the table). Dropping an external table only removes the metadata; the data files remain.
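To see which kind of table you are dealing with before dropping it, the table metadata can be inspected. This is a minimal sketch that assumes the two example tables created in this section already exist; the exact output layout may differ between runtime versions:

```sql
-- The extended description includes a Type row (MANAGED or EXTERNAL)
-- and a Location row pointing at the underlying storage path
DESCRIBE TABLE EXTENDED catalog.schema.managed_table;
DESCRIBE TABLE EXTENDED catalog.schema.external_table;

-- Dropping the managed table removes its metadata and schedules its data for deletion
DROP TABLE catalog.schema.managed_table;

-- Assumption: the drop is recent enough to still be inside the retention window
UNDROP TABLE catalog.schema.managed_table;
```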
-- Managed table (default)
CREATE TABLE catalog.schema.managed_table (
id INT, name STRING
);
-- External table
CREATE TABLE catalog.schema.external_table (
id INT, name STRING
)
LOCATION 's3://my-bucket/external/data';
-- Convert external to managed (SET MANAGED needs a recent Databricks Runtime;
-- setting delta.columnMapping.mode only changes column mapping, not the table type)
ALTER TABLE catalog.schema.external_table SET MANAGED;
Access Controls
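Unity Catalog uses SQL-based access controls:
- GRANT: Give privileges to users, groups, or service principals
- REVOKE: Remove privileges
- DENY: Not supported in Unity Catalog. It belongs to legacy Hive metastore table access control; in Unity Catalog, access is restricted by not granting a privilege or by revoking it.
Common privileges: SELECT, MODIFY, CREATE TABLE, USE CATALOG, USE SCHEMA, ALL PRIVILEGES. (USAGE is the older name for the USE CATALOG and USE SCHEMA privileges.)
Hierarchy: To access a table, you need USE CATALOG on the catalog, USE SCHEMA on the schema, and the specific privilege (e.g., SELECT) on the table.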
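Because reading a table needs privileges at all three levels, it can help to check what a principal actually holds. A minimal sketch using SHOW GRANTS, reusing the `data_analysts` group and the `catalog.schema.my_table` placeholder names from this section:

```sql
-- List every privilege granted on the table, for all principals
SHOW GRANTS ON TABLE catalog.schema.my_table;

-- Narrow the output to one principal at each level of the hierarchy
SHOW GRANTS `data_analysts` ON CATALOG catalog;
SHOW GRANTS `data_analysts` ON SCHEMA catalog.schema;
SHOW GRANTS `data_analysts` ON TABLE catalog.schema.my_table;
```

If a query fails with a permission error even though SELECT is granted on the table, a missing privilege on the catalog or schema is a likely cause.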
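-- Grant catalog and schema access (USE CATALOG / USE SCHEMA replace the older USAGE privilege)
GRANT USE CATALOG ON CATALOG catalog TO `data_analysts`;
GRANT USE SCHEMA ON SCHEMA catalog.schema TO `data_analysts`;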
-- Grant table read access
GRANT SELECT ON TABLE catalog.schema.my_table TO `data_analysts`;
-- Grant all privileges
GRANT ALL PRIVILEGES ON TABLE catalog.schema.my_table TO `data_engineers`;
-- Revoke access
REVOKE SELECT ON TABLE catalog.schema.my_table FROM `intern_group`;
Row-Level Security and Column Masking
Unity Catalog supports fine-grained security:
- Row-level security: Filter rows based on the querying user's identity or group membership
- Column masking: Replace sensitive column values with masked data for unauthorized users
- ABAC policies: Attribute-based access control for centralized row and column filtering
These features allow different users to see different subsets of the same table without creating separate views.
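Row filters and column masks are typically written as SQL functions and then attached to a table. The sketch below is illustrative only: the tables (`sales`, `employees`), columns (`region`, `ssn`), function names, and groups (`admins`, `hr`) are hypothetical and do not come from the source:

```sql
-- Row filter: members of `admins` see every row, everyone else sees only US rows
CREATE FUNCTION catalog.schema.us_only_filter(region STRING)
RETURN IF(is_account_group_member('admins'), true, region = 'US');

-- Attach the filter; the table's `region` column is passed to the function
ALTER TABLE catalog.schema.sales
SET ROW FILTER catalog.schema.us_only_filter ON (region);

-- Column mask: members of `hr` see the real value, everyone else sees a placeholder
CREATE FUNCTION catalog.schema.ssn_mask(ssn STRING)
RETURN CASE WHEN is_account_group_member('hr') THEN ssn ELSE '***-**-****' END;

-- Attach the mask to the sensitive column
ALTER TABLE catalog.schema.employees
ALTER COLUMN ssn SET MASK catalog.schema.ssn_mask;
```

Keeping the logic in functions means the same filter or mask can be reused across several tables. Both can be detached later with `ALTER TABLE ... DROP ROW FILTER` and `ALTER TABLE ... ALTER COLUMN ... DROP MASK`.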