Apache Polaris is an open-source catalog for Apache Iceberg that provides metadata management and access control across multiple engines. Its security model is built around two layers: namespace isolation that separates tenants at the catalog level, and access policies that enforce fine-grained permissions on tables and views. This architecture allows organizations to run a single Polaris instance for multiple departments while maintaining strict data boundaries. Understanding how these layers interact is critical for any team deploying Polaris in production.
The Catalog-First Design: Why Polaris Owns the Security Boundary
Traditional metastores like Hive or AWS Glue store metadata but delegate security to external systems. Hive uses Sentry or Ranger for authorization, and Glue relies on IAM policies. Polaris takes a different approach: it owns the catalog, the metadata, and the access policies in a single system. This eliminates the synchronization problem where a table exists in the catalog but the access policy has not been updated, or vice versa.
The catalog in Polaris is the root security boundary. Every table, view, and namespace exists within a catalog, and every access decision is evaluated against the catalog's policy set. When a query engine like Spark or Trino connects to Polaris, it authenticates at the catalog level first. Only after catalog authentication succeeds does Polaris evaluate the specific table-level permissions.
Why This Matters for Multi-Tenant Deployments
In a multi-tenant setup, each department runs its own catalog within a shared Polaris instance. The catalogs are isolated at the namespace level: a table in the marketing catalog cannot be accessed from a query that targets the finance catalog unless the principal has explicit cross-catalog permissions. This is fundamentally different from a single shared metastore where all tables exist in one namespace and security is enforced by external rules.
The catalog-first model also simplifies disaster recovery. Because each catalog is a self-contained unit of metadata and policy, an entire catalog can be exported, backed up, or restored independently. If the marketing catalog needs to be rolled back to a previous state, the operation does not affect finance or engineering.
Authentication: How Polaris Verifies Identity
Polaris supports multiple authentication mechanisms, and the choice of mechanism affects how principals are represented in the access policy system. The supported methods are:
- OAuth 2.0 token-based authentication - Clients present a Bearer token obtained from an external identity provider. Polaris validates the token signature and extracts the principal identity from the JWT claims.
- Service principal authentication - Clients authenticate using a client ID and secret registered in Polaris. This is the standard approach for automated workflows and service-to-service communication.
- Trusted authentication - For deployments where Polaris sits behind a trusted proxy (e.g., a Kubernetes ingress controller or an API gateway), the proxy can forward authenticated identity via headers. Polaris trusts the header value and skips its own token validation.
The authentication mechanism determines the principal format used in access policies. For OAuth tokens, the principal is typically a user ID or email address from the identity provider. For service principals, it is the registered client ID. For trusted authentication, it is whatever the proxy sends in the header.
Token Validation and Claims Extraction
When using OAuth 2.0, Polaris validates the token by checking its signature against the identity provider's public key. The JWKS endpoint is configured at catalog creation time, and Polaris caches the keys with a TTL to avoid repeated requests. The extracted claims are mapped to a Polaris principal using a configurable claim mapping. By default, Polaris uses the sub claim, but this can be changed to email, preferred_username, or a custom claim.
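The claim mapping step can be pictured with a short Python sketch. This is not Polaris code: the function name extract_principal and its claim_name parameter are hypothetical, and the claims are assumed to come from a token whose signature has already been verified.

```python
from typing import Any, Mapping


def extract_principal(claims: Mapping[str, Any], claim_name: str = "sub") -> str:
    """Pick the principal identity out of already-validated JWT claims.

    `claim_name` stands in for the configurable claim mapping; "sub" is
    the default described in the text.
    """
    value = claims.get(claim_name)
    if not isinstance(value, str) or not value:
        # A token without the mapped claim cannot be tied to a principal.
        raise ValueError(f"token is missing the '{claim_name}' claim")
    return value


# Hypothetical claims, as they might look after signature validation.
claims = {"sub": "user-123", "email": "ana@example.com"}
print(extract_principal(claims))           # default mapping: sub
print(extract_principal(claims, "email"))  # mapping switched to email
```

Whatever string this step produces is the principal that access policies must reference, so changing the mapping later means rewriting every policy that names the old identity format.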
The token must also pass scope validation. Polaris expects certain scopes to be present depending on the operation. For read-only operations, catalog:read is sufficient. For table creation or modification, catalog:write is required. For administrative operations like catalog creation or policy changes, catalog:admin is required. If the token lacks the required scope, the request is rejected before reaching the access policy evaluation layer.
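As a rough illustration, the scope gate might look like the following Python sketch. The names REQUIRED_SCOPE and check_scope are hypothetical, the scope claim is assumed to be the usual space-delimited OAuth string, and the check is an exact match. Whether a broader scope such as catalog:admin implies the narrower ones is not stated here, so the sketch does not assume it.

```python
# Operation kinds and the scope each one requires, as listed in the text.
REQUIRED_SCOPE = {
    "read": "catalog:read",
    "write": "catalog:write",
    "admin": "catalog:admin",
}


def check_scope(scope_claim: str, operation_kind: str) -> bool:
    """Return True if the token's scopes include the one this operation needs."""
    granted = set(scope_claim.split())  # "a b c" -> {"a", "b", "c"}
    return REQUIRED_SCOPE[operation_kind] in granted


print(check_scope("catalog:read", "read"))                 # True
print(check_scope("catalog:read", "write"))                # False
print(check_scope("catalog:read catalog:write", "write"))  # True
```

A request that fails this check never reaches policy evaluation, so a missing scope and a missing policy produce rejections at two different layers.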
Authorization: Role-Based Access Control with Catalog Policies
After authentication, Polaris evaluates access policies using a role-based model. Policies are defined at the catalog level and can be inherited or overridden at lower levels. The policy hierarchy is:
- Catalog-level policies - Apply to all resources within the catalog unless overridden.
- Namespace-level policies - Apply to all resources within the namespace.
- Table-level policies - Apply to a specific table or view.
- Column-level policies - Apply to specific columns within a table (for column-level access control).
Policy Structure: Principals, Actions, and Resources
Each policy is a tuple of (principal, action, resource, effect). The principal is the authenticated identity. The action is the operation being requested: CREATE_TABLE, DROP_TABLE, READ_DATA, WRITE_DATA, ALTER_TABLE, LIST_TABLES, or DESCRIBE_TABLE. The resource is the target object: a catalog, a namespace, a table, or a column. The effect is either ALLOW or DENY.
Policy evaluation follows a deny-first rule: if any matching policy has DENY as its effect, the request is denied. If no matching policy exists, the default is also deny. This means Polaris uses an explicit allow model: access is granted only if a policy explicitly allows it. This is a safer default than an implicit allow model where missing policies grant access.
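The deny-first rule is small enough to sketch in a few lines of Python. The Policy class and evaluate function below are illustrative names, not Polaris's API, and matching is simplified to exact equality on principal, action, and resource.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Policy:
    principal: str
    action: str
    resource: str
    effect: str  # "ALLOW" or "DENY"


def evaluate(policies: Iterable[Policy], principal: str, action: str, resource: str) -> str:
    """Deny-first evaluation with a default of deny."""
    matching = [
        p for p in policies
        if (p.principal, p.action, p.resource) == (principal, action, resource)
    ]
    if any(p.effect == "DENY" for p in matching):
        return "DENY"   # any explicit denial wins
    if any(p.effect == "ALLOW" for p in matching):
        return "ALLOW"  # allowed only when explicitly granted
    return "DENY"       # no matching policy at all


policies = [
    Policy("ana", "READ_DATA", "marketing.campaigns.q4", "ALLOW"),
    Policy("ana", "WRITE_DATA", "marketing.campaigns.q4", "ALLOW"),
    Policy("ana", "WRITE_DATA", "marketing.campaigns.q4", "DENY"),
]
print(evaluate(policies, "ana", "READ_DATA", "marketing.campaigns.q4"))   # ALLOW
print(evaluate(policies, "ana", "WRITE_DATA", "marketing.campaigns.q4"))  # DENY
print(evaluate(policies, "ana", "DROP_TABLE", "marketing.campaigns.q4"))  # DENY
```

The second and third calls both return DENY, but for different reasons: one is an explicit denial, the other is the absence of any grant.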
Role Definitions and Principal Assignment
Roles in Polaris are named collections of policies. Instead of assigning policies directly to principals, Polaris assigns roles to principals. This simplifies management: a data_analyst role might grant READ_DATA on all tables in the marketing catalog, and all analysts in the department receive this role. When an analyst changes teams or leaves, only the role assignment changes, not the policies themselves.
Roles can be scoped to a catalog or global. A catalog-scoped role only applies within that catalog. A global role applies across all catalogs unless explicitly restricted. Global roles are useful for principals that need cross-department access, such as platform administrators or compliance auditors. However, global roles should be used sparingly because they bypass the catalog isolation boundary.
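To make the scoping rule concrete, here is a minimal Python sketch. Role, effective_actions, and the role and principal names are all hypothetical, and a role is reduced to a set of actions that apply throughout its scope.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class Role:
    name: str
    actions: FrozenSet[str]
    catalog: Optional[str] = None  # None means the role is global


def effective_actions(
    roles: Dict[str, Role],
    assignments: Dict[str, List[str]],
    principal: str,
    catalog: str,
) -> Set[str]:
    """Collect the actions a principal's roles grant inside one catalog."""
    granted: Set[str] = set()
    for role_name in assignments.get(principal, []):
        role = roles[role_name]
        # A catalog-scoped role counts only in its own catalog;
        # a global role counts everywhere.
        if role.catalog is None or role.catalog == catalog:
            granted |= role.actions
    return granted


roles = {
    "data_analyst": Role("data_analyst", frozenset({"READ_DATA", "LIST_TABLES"}), "marketing"),
    "auditor": Role("auditor", frozenset({"LIST_TABLES", "DESCRIBE_TABLE"})),
}
assignments = {"ana": ["data_analyst"], "audit-bot": ["auditor"]}

print(sorted(effective_actions(roles, assignments, "ana", "marketing")))
print(sorted(effective_actions(roles, assignments, "ana", "finance")))
print(sorted(effective_actions(roles, assignments, "audit-bot", "finance")))
```

Moving an analyst to another team only touches the assignments mapping, while the global auditor role shows why such roles deserve scrutiny: it applies in finance even though nothing in the finance catalog mentions it.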
Namespace Isolation: The Self-Contained Unit of Metadata
Namespaces in Polaris are analogous to databases in a relational system. They group related tables and provide a naming scope. But unlike traditional databases, namespaces in Polaris are also security boundaries. A principal with access to a namespace can list its tables, but cannot see tables in sibling namespaces unless explicitly granted.
Namespace Creation and Ownership
When a catalog is created, it contains a default namespace. Additional namespaces are created by principals with the CREATE_NAMESPACE action on the parent namespace or catalog. The creator becomes the namespace owner and receives implicit full access to the namespace. Ownership can be transferred by updating the namespace's owner policy.
Namespaces support a parent-child hierarchy. A namespace marketing.campaigns is a child of marketing. The child namespace inherits the parent's policies unless explicitly overridden. This inheritance model allows administrators to set department-level policies at the marketing namespace and have them automatically apply to all campaign-specific namespaces below it. If a specific campaign needs tighter access, the policy is overridden at the child namespace without affecting the parent or siblings.
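One way to picture inheritance with override is a lookup that walks from the child namespace up to its ancestors and stops at the first level that has a policy. The sketch below is written in Python under that nearest-level-wins assumption. The function name, the dictionary layout, and the marketing.events namespace are illustrative.

```python
from typing import Dict, Tuple

# Key: (namespace path, principal, action) -> effect
PolicyTable = Dict[Tuple[str, str, str], str]


def resolve_effect(policies: PolicyTable, namespace: str, principal: str, action: str) -> str:
    """Return the effect set at the nearest namespace level, else deny."""
    parts = namespace.split(".")
    while parts:
        effect = policies.get((".".join(parts), principal, action))
        if effect is not None:
            return effect  # an explicit policy at this level overrides ancestors
        parts.pop()        # nothing here: move up to the parent namespace
    return "DENY"          # no policy anywhere on the path


policies = {
    ("marketing", "analysts", "READ_DATA"): "ALLOW",            # department-wide
    ("marketing.campaigns", "analysts", "READ_DATA"): "DENY",   # tighter child
}

print(resolve_effect(policies, "marketing.events", "analysts", "READ_DATA"))        # ALLOW
print(resolve_effect(policies, "marketing.campaigns", "analysts", "READ_DATA"))     # DENY
print(resolve_effect(policies, "marketing.campaigns.q4", "analysts", "READ_DATA"))  # DENY
print(resolve_effect(policies, "finance", "analysts", "READ_DATA"))                 # DENY
```

The sibling marketing.events keeps the inherited grant, while everything under marketing.campaigns picks up the tighter policy, which matches the behavior described above.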
Table Registration and the Namespace Lock
When a table is registered in Polaris, it is bound to a namespace. The table's metadata location (the S3 path or HDFS path) is recorded in the catalog, and the namespace provides the naming scope. If two tables in the same namespace have the same name, the registration fails. But two tables in different namespaces can have the same name without conflict, because the fully qualified name includes the namespace.
The namespace also provides a soft lock on table operations. If a principal has access to the namespace but is not permitted to read a specific table, the table is still visible in the namespace listing. However, the principal cannot read the table's data. This is different from a hard lock where the table is completely hidden. Polaris uses a soft lock model by default, which allows users to discover tables even if they cannot access them. This is useful for data discovery workflows where analysts need to know what exists before requesting access.
The Access Evaluation Flow: A Step-by-Step Walkthrough
When a query engine like Spark requests a table from Polaris, the access evaluation follows a precise sequence:
- Authentication - Spark presents its credentials (OAuth token, service principal, or trusted header). Polaris validates the credentials and extracts the principal identity.
- Catalog lookup - Spark requests a table in a specific catalog. Polaris verifies the catalog exists and the principal has access to the catalog (at minimum, LIST_TABLES or READ_DATA on the catalog).
- Namespace lookup - Polaris resolves the namespace path from the table name. It checks the principal has access to the namespace. If the namespace does not exist, the request fails.
- Table lookup - Polaris checks the table exists in the namespace. If the table is not registered, the request fails.
- Policy evaluation - Polaris evaluates all policies that match the (principal, action, resource) tuple. The action is determined by the query type: a SELECT query requires READ_DATA, an INSERT requires WRITE_DATA, an ALTER TABLE requires ALTER_TABLE.
- Effect determination - If any matching policy has DENY, the request is denied. If no matching policy has ALLOW, the request is denied. If at least one matching policy has ALLOW and no matching policy has DENY, the request is allowed.
- Metadata return - If allowed, Polaris returns the table metadata (schema, partition spec, current snapshot) to Spark. Spark then uses this metadata to read or write data directly from the storage layer (S3, HDFS).
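The sequence above can be condensed into a Python sketch. Everything in it is illustrative: the in-memory catalogs dictionary, the policy tuples, and the load_table function are stand-ins rather than Polaris internals, authentication is assumed to have already produced the principal, and namespace access is assumed to mean LIST_TABLES or READ_DATA, mirroring the catalog check.

```python
class AccessDenied(Exception):
    pass


QUERY_ACTION = {"SELECT": "READ_DATA", "INSERT": "WRITE_DATA", "ALTER TABLE": "ALTER_TABLE"}


def decide(policies, principal, action, resource):
    """Deny-first: True only with at least one ALLOW and no DENY."""
    effects = {e for (p, a, r, e) in policies if (p, a, r) == (principal, action, resource)}
    return "ALLOW" in effects and "DENY" not in effects


def has_access(policies, principal, resource):
    return any(decide(policies, principal, a, resource) for a in ("LIST_TABLES", "READ_DATA"))


def load_table(catalogs, policies, principal, catalog, namespace, table, query_type):
    # Catalog lookup: the catalog must exist and be accessible.
    if catalog not in catalogs:
        raise LookupError(f"catalog not found: {catalog}")
    if not has_access(policies, principal, catalog):
        raise AccessDenied(f"no access to catalog {catalog}")
    # Namespace lookup.
    if namespace not in catalogs[catalog]:
        raise LookupError(f"namespace not found: {namespace}")
    if not has_access(policies, principal, f"{catalog}.{namespace}"):
        raise AccessDenied(f"no access to namespace {namespace}")
    # Table lookup.
    tables = catalogs[catalog][namespace]
    if table not in tables:
        raise LookupError(f"table not registered: {table}")
    # Policy evaluation and effect determination.
    action = QUERY_ACTION[query_type]
    if not decide(policies, principal, action, f"{catalog}.{namespace}.{table}"):
        raise AccessDenied(f"{action} denied on {table}")
    # Metadata return: only metadata, never the data files themselves.
    return tables[table]


catalogs = {"marketing": {"campaigns": {"q4_results": {"current_snapshot_id": 1}}}}
policies = {
    ("ana", "LIST_TABLES", "marketing", "ALLOW"),
    ("ana", "LIST_TABLES", "marketing.campaigns", "ALLOW"),
    ("ana", "READ_DATA", "marketing.campaigns.q4_results", "ALLOW"),
}
print(load_table(catalogs, policies, "ana", "marketing", "campaigns", "q4_results", "SELECT"))
```

With the same policies, an INSERT from ana raises AccessDenied because no WRITE_DATA grant exists, and a request for an unregistered table fails with LookupError before any table-level policy is consulted.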
The critical observation is that Polaris does not read or write the actual table data. It only evaluates access and returns metadata. The query engine is responsible for reading Parquet files from S3 using the metadata Polaris provided. This means the storage layer must enforce access on its own, because a client that goes around Polaris and reads the files directly is never subject to Polaris policies. Polaris and the storage layer should be configured with matching policies, or the storage layer should trust only the query engine (which is typical for cloud deployments where the engine runs with a service role that has storage access).
Integration with External Policy Engines
For organizations that already use a centralized policy engine like Apache Ranger or AWS IAM, Polaris provides a plugin interface for external authorization. Instead of evaluating policies internally, Polaris can delegate the access decision to an external service. The plugin receives the same (principal, action, resource) tuple and returns ALLOW, DENY, or ABSTAIN.
The Delegation Model
When an external policy engine is configured, Polaris evaluates its internal policies first. If the internal policy is DENY, the request is denied immediately. If the internal policy is ALLOW, Polaris still delegates to the external engine for confirmation. If the external engine returns DENY, the request is denied. If the external engine returns ABSTAIN (no policy found), Polaris falls back to the internal policy. This layered model allows Polaris to act as a cache of common policies while delegating complex or dynamic policies to the external engine.
The external engine must respond within a configurable timeout (default 5 seconds). If the engine times out, Polaris treats the response as ABSTAIN and falls back to internal policies. This prevents Polaris from hanging on a slow external engine while still maintaining security through the internal policy layer.
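The layered decision and the timeout fallback can be sketched in Python as follows. The function combined_decision and the callables passed to it are hypothetical. The external engine is modeled as a plain function returning ALLOW, DENY, or ABSTAIN, and the internal decision is assumed to be already resolved to ALLOW or DENY.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable


def combined_decision(internal: str, ask_external: Callable[[], str], timeout_s: float = 5.0) -> str:
    """Combine an internal ALLOW/DENY with an external ALLOW/DENY/ABSTAIN."""
    if internal == "DENY":
        return "DENY"  # internal denial is final; the external engine is not asked

    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(ask_external)
    try:
        external = future.result(timeout=timeout_s)
    except FutureTimeout:
        external = "ABSTAIN"  # a slow engine is treated as having no opinion
    finally:
        pool.shutdown(wait=False)  # do not block on a call that is still running

    if external == "ABSTAIN":
        return internal  # fall back to the internal policy
    return external      # ALLOW confirms, DENY overrides the internal ALLOW


def slow_engine() -> str:
    time.sleep(0.2)
    return "DENY"


print(combined_decision("ALLOW", lambda: "DENY"))                  # DENY
print(combined_decision("ALLOW", lambda: "ABSTAIN"))               # ALLOW
print(combined_decision("DENY", lambda: "ALLOW"))                  # DENY
print(combined_decision("ALLOW", slow_engine, timeout_s=0.05))     # ALLOW
```

The last call is worth a second look: the external engine would have denied the request, but the timeout turns its answer into ABSTAIN and the internal ALLOW stands. Under this model, a slow or unreachable external engine relaxes rather than tightens access, so the internal policies need to be safe on their own.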
Common Pitfalls and Security Edge Cases
The Storage Layer Bypass Problem
Polaris controls metadata access but does not control data access. A user who obtains a table's metadata (which includes the S3 path) can read the data directly from S3 if they have IAM permissions. This is a common misunderstanding: Polaris is a catalog, not a storage security layer. To prevent bypass, the storage layer should be configured with IAM policies that restrict access to the query engine's service role, and users should not have direct S3 access. Alternatively, Polaris can be configured with signed URLs that expire after a short duration, giving the query engine temporary access without exposing long-lived credentials.
The Role Inheritance Trap
Because roles can be scoped at the catalog, namespace, or table level, it is possible to create a situation where a principal has a role at the catalog level that grants broad access, and a second role at the table level that denies specific access. The deny-first rule means the table-level denial wins. But operators sometimes forget that catalog-level and namespace-level roles also apply to every table beneath them. If a principal has READ_DATA at the catalog level and no policy at the table level, the catalog-level role still grants access to that table: the absence of a table-level grant restricts nothing. To truly restrict a table, the operator must add a DENY policy at the table level or remove the catalog-level role.
The Default Namespace Visibility
The default namespace in a new catalog is visible to all authenticated principals. This is a convenience for getting started, but it can be a security issue if sensitive tables are accidentally created in the default namespace. The best practice is to create a dedicated namespace for each workload or team and avoid using the default namespace for anything except catalog-level metadata or templates.
Token Expiration and Session Management
Polaris does not maintain session state for OAuth tokens. Each request is validated independently by checking the token signature and claims. This means Polaris does not have a session table that can be invalidated if a user is deactivated. If an identity provider deactivates a user but the user's token is still valid, Polaris will continue to accept the token until it expires. The maximum token lifetime should be set short enough (e.g., 1 hour) that deactivation takes effect within a reasonable window. For service principals, the client secret should be rotated regularly, and Polaris should be configured to reject old secrets.
Conclusion
Apache Polaris provides a unified catalog and security layer for Iceberg tables. Its catalog-first design isolates tenants at the namespace level, while its role-based access control provides fine-grained permissions on tables and views. The authentication layer supports OAuth, service principals, and trusted proxies, making it adaptable to different deployment environments.
Production deployments should pay attention to the storage layer bypass problem: Polaris controls metadata, not data. The access policy hierarchy is deny-first, meaning explicit denials override broad allows. Namespace inheritance simplifies policy management but can create unexpected access if operators are not careful about the scoping of roles. Token lifecycle management is the operator's responsibility: Polaris validates tokens but does not manage their issuance or revocation.
For organizations adopting Iceberg, Polaris offers a path to unified metadata governance without vendor lock-in. The security model is explicit, hierarchical, and auditable. But it requires operators to understand the interaction between metadata access and storage access, and to configure both layers consistently for a complete security posture.
About the author: I'm Prithvi S, Staff Software Engineer at Cloudera and open-source enthusiast. I contribute to Apache Lucene, OpenSearch, and related projects. Follow my work on GitHub.








