Apache Polaris is powered by Quarkus
Clément Escoffier clement@redhat.com - JB Onofré jbonofre@apache.org
Apache Polaris, an open-source catalog for Apache Iceberg, has officially graduated to become a Top-Level Project (TLP), capping its incubation with a pivotal technical decision: adopting Quarkus as its cloud-native foundation. The combination of an Iceberg catalog with Supersonic Subatomic Java delivers not only cloud-native efficiency and extensibility but also active community collaboration, positioning Polaris as a leading solution for multi-engine data lakehouse architectures.
The Journey to Top-Level Project
Apache Polaris has achieved a major milestone: graduating from the Apache Incubator to become a new Top-Level Project (TLP) within the Apache Software Foundation (ASF). This graduation signifies not only the maturity and community adoption of the project but also its successful integration into the Apache Way of governance and operation.
Understanding the Apache Incubator
The Apache Incubator is the entry point for projects, known as "podlings," seeking to join the ASF. It guides them through the adoption of meritocratic, consensus-based governance, with experienced mentors acting as liaisons between the podling and the foundation’s teams. Polaris successfully completed this mentored incubation process, demonstrating both community maturity and operational readiness.
Technical Evolution During Incubation
For Polaris, the incubation period was not merely an exercise in governance; it also spurred significant technical development and strategic pivots. The project demonstrated its adaptability and commitment to robust architecture by implementing several important changes, including:
- A New JDBC Persistence Layer: Enhancing the reliability and flexibility of catalog metadata storage and configuration management.
- Advanced Security Features: Integrating new security capabilities, notably through the implementation of the Open Policy Agent (OPA) for policy-based catalog access control, and a sophisticated authorizer framework.
- Robust Policy Management: Developing comprehensive frameworks for defining and enforcing catalog access policies and permissions, a core capability for multi-tenant catalog deployments.
Crucially, the team recognized that a successful TLP required a future-proof technical core, which led to one of its most pivotal architectural decisions: adopting Quarkus.
Quarkus Adoption
Among these technical advancements, one of the most pivotal and forward-looking decisions made during the incubation phase was the adoption of Quarkus to power Apache Polaris. This strategic shift recognized that a modern Iceberg catalog serving multiple query engines requires more than traditional Java frameworks could efficiently deliver. The team needed a technology stack that could meet the demanding requirements of cloud-native deployments while maintaining the developer productivity essential for rapid innovation.
Quarkus represents a fundamental shift in how Java applications are built for modern cloud environments. For Apache Polaris, an Iceberg catalog designed to serve multiple query engines, Quarkus provides the ideal combination of performance, developer productivity, and architectural flexibility.
Container-First Architecture
Quarkus was designed from the ground up with a container-first philosophy. Unlike traditional frameworks that were retrofitted for containers, Quarkus optimizes applications specifically for containerized environments by shifting heavy computational work, such as classpath scanning, configuration loading, and dependency injection, from runtime to build time.
For Polaris, this means:
- Faster Horizontal Scaling: When deploying additional catalog instances to handle traffic spikes, containers start in milliseconds rather than seconds.
- Higher Density: A lower memory footprint allows more Polaris instances per node, reducing infrastructure costs.
- Efficient Resource Use: Minimal CPU and memory consumption directly lowers operational cloud costs and brings sustainability benefits.
Performance Characteristics
Quarkus delivers performance through four key dimensions:
- Fast Startup Times: By performing application initialization work at build time, Quarkus enables Polaris to start rapidly, which is critical for auto-scaling scenarios where catalog instances must spin up quickly to handle bursts of metadata requests from Spark, Trino, Flink, or other query engines. For reference, canonical Quarkus applications start in about 10 ms in native mode, around 100 ms with AOT (Project Leyden), and 1 to 3 s in plain JVM mode.
- Reduced Memory Footprint: The reactive core uses a small number of event loops instead of maintaining large thread pools, dramatically reducing memory usage and enabling higher deployment density.
- High Throughput: The reactive, non-blocking engine based on Netty and Eclipse Vert.x enables Polaris to efficiently process numerous concurrent catalog operations, minimizing blocking during I/O to metadata stores or object storage. This architecture allows Polaris to handle thousands of concurrent client connections, such as metadata requests from Spark or Trino, without dedicating a thread per request, thus avoiding thread-contention bottlenecks.
- Optimized Resource Consumption: The combination of build-time optimization and reactive architecture results in lower CPU requirements, directly reducing cloud costs for organizations running Polaris at scale.
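To make the non-blocking idea above concrete, here is a minimal, self-contained sketch in plain Java (illustrative only, not Polaris or Quarkus code): each "metadata lookup" is modeled as an asynchronous task, so many requests can be in flight at once without parking one thread per request.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative sketch of the non-blocking pattern described above.
public class NonBlockingSketch {

    // Hypothetical async metadata lookup: returns immediately with a future
    // instead of blocking the caller's thread until the result is ready.
    static CompletableFuture<String> loadTableMetadata(String table) {
        return CompletableFuture.supplyAsync(() -> "metadata-for-" + table);
    }

    public static void main(String[] args) {
        List<String> tables = List.of("orders", "customers", "lineitem");

        // Fan out all lookups concurrently, then collect the results.
        List<CompletableFuture<String>> futures =
            tables.stream().map(NonBlockingSketch::loadTableMetadata).toList();

        futures.forEach(f -> System.out.println(f.join()));
    }
}
```

In Polaris itself this role is played by the Vert.x/Netty event loops and Mutiny pipelines; the sketch only shows the principle of decoupling request acceptance from I/O completion.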
Developer Productivity and Versatility
Quarkus provides a cohesive, full-stack framework with a rich ecosystem of extensions, enabling the Polaris community to focus on catalog features rather than assembling infrastructure plumbing.
Frictionless Development Experience: The live coding feature provides instant feedback during development, dramatically accelerating the inner development loop and enabling faster iteration on catalog capabilities and security enhancements.
Programming Model Flexibility: Quarkus supports three execution approaches: plain imperative code, reactive programming using the Mutiny API, and virtual threads (JDK 21+), allowing developers to choose the right model for each component. This versatility means Polaris developers can use familiar imperative patterns for simple management endpoints (such as health checks) while fully leveraging the reactive/non-blocking approach (with Mutiny or virtual threads) for high-concurrency, I/O-intensive catalog operations (such as metadata lookups and commits).
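The virtual-thread model mentioned above can be sketched with plain JDK 21 APIs (this is an illustration of the execution model, not Polaris code; the request handler is a hypothetical stand-in): blocking-style imperative code stays readable while thousands of concurrent tasks run cheaply.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: many concurrent, I/O-style tasks on virtual threads,
// one of the three execution models Quarkus supports on JDK 21+.
public class VirtualThreadSketch {

    static final AtomicInteger completed = new AtomicInteger();

    // Hypothetical blocking "catalog operation": written imperatively,
    // but cheap to run by the thousands when scheduled on virtual threads.
    static void handleRequest(int id) {
        try {
            Thread.sleep(10); // stands in for a metadata-store round trip
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        completed.incrementAndGet();
    }

    public static void main(String[] args) throws Exception {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                final int id = i;
                executor.submit(() -> handleRequest(id));
            }
        } // close() waits for all submitted tasks to finish
        System.out.println("completed=" + completed.get());
    }
}
```

In Quarkus, the same effect is achieved declaratively (for example with the @RunOnVirtualThread annotation on an endpoint), without managing executors by hand.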
Protocol Diversity: Native support for REST, gRPC, GraphQL, and message brokers (Kafka, RabbitMQ, Pulsar, Apache ActiveMQ) gives Polaris flexibility in how it exposes catalog APIs and integrates with the broader data ecosystem.
The Power of the Quarkiverse Ecosystem
Beyond the core framework, Quarkus offers access to the Quarkiverse, an ecosystem of hundreds of community-driven and vendor-supported extensions. Rather than building custom integrations, the Polaris team leverages battle-tested extensions for persistence (PostgreSQL, MongoDB), security (OIDC, OPA), and observability (OpenTelemetry, Micrometer), as the next section illustrates with concrete examples from the codebase.
How Polaris Leverages Quarkus
The true power of the Quarkus foundation becomes evident when examining how Apache Polaris leverages specific Quarkus features in its implementation. The catalog service integrates over 15 specialized extensions to deliver a production-ready, cloud-native Iceberg catalog. Here are a few representative examples of this integration.
Reactive Request Processing with Quarkus REST
Polaris implements the Iceberg REST API using Quarkus REST, leveraging the Quarkus Reactive Core and SmallRye Mutiny to process thousands of concurrent metadata requests asynchronously without blocking threads.
Here’s the actual realm context resolver from the Polaris codebase:
@ServerRequestFilter(preMatching = true)
public Uni<Void> resolveRealmContext(ContainerRequestContext rc) {
  return realmContextResolver
      .resolveRealmContext(
          rc.getUriInfo().getRequestUri().toString(),
          rc.getMethod(),
          rc.getUriInfo().getPath(),
          rc.getHeaders()::getFirst)
      // Make the resolved realm available to the rest of the request pipeline.
      .invoke(realmContext -> realmContextHolder.set(realmContext))
      .invoke(realmContext -> ContextLocals.put(REALM_CONTEXT_KEY, realmContext))
      // Complete with Void so request processing continues normally.
      .replaceWithVoid();
}
With just a few lines of reactive code, Polaris achieves multi-tenant request routing that would typically require significantly more boilerplate in traditional frameworks.
Resilience with SmallRye Fault Tolerance
For critical operations such as persisting catalog events, Polaris uses Quarkus Fault Tolerance (which relies on SmallRye Fault Tolerance) to ensure reliability. Here is an example of how Polaris implements a resilient flush operation to the metadata store:
@Retry(maxRetries = 5, delay = 1000, jitter = 100)
@Fallback(fallbackMethod = "onFlushError")
protected void flush(String realmId, List<PolarisEvent> events) {
  var metaStoreManager =
      metaStoreManagerFactory.getOrCreateMetaStoreManager(realmContext);
  metaStoreManager.writeEvents(callContext, events);
}
This declarative approach to resilience means Polaris automatically retries failed writes to the metadata store, with configurable backoff and jitter to prevent thundering herd problems. Note that all the configured values can be overridden in the application configuration.
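As a hedged illustration of such an override, MicroProfile Fault Tolerance lets the annotation values be changed per class and method in application.properties. The class and package names below are hypothetical placeholders, not the actual Polaris class:

```properties
# Override the @Retry values declared in code at deployment time.
# Format: <fully-qualified-class>/<method>/<annotation>/<member>=<value>
org.example.polaris.EventFlusher/flush/Retry/maxRetries=10
org.example.polaris.EventFlusher/flush/Retry/delay=2000
```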
Using CDI beans as extension points: selecting the persistence layer
Polaris supports multiple persistence backends (PostgreSQL via the Quarkus JDBC PostgreSQL extension, MongoDB via the Quarkus MongoDB Client extension, and in-memory for testing) using Quarkus’s CDI qualifier mechanism:
@Produces
@ApplicationScoped
Backend backend(BackendConfiguration config, @Any Instance<BackendBuilder> builders) {
  var backendName = config.backend()
      .orElseThrow(() -> new IllegalStateException(
          "Configuration polaris.persistence.nosql.backend is missing!"));
  var builder = builders.select(BackendType.Literal.of(backendName));
  return builder.get().buildBackend();
}
This architecture allows operators to deploy Polaris with the persistence layer that matches their infrastructure requirements, simply by changing the polaris.persistence.nosql.backend configuration property.
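For illustration, selecting a backend could then look like this in application.properties. The property name comes from the code above; the value shown is an assumption about a registered backend identifier, not a confirmed Polaris value:

```properties
# Choose the persistence backend at deployment time
# (the identifier below is illustrative):
polaris.persistence.nosql.backend=MongoDb
```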
Comprehensive Observability
Polaris leverages Quarkus Micrometer with Prometheus registry and Quarkus OpenTelemetry for complete observability. The catalog automatically tracks:
- HTTP metrics for all Iceberg REST API requests, tagged by realm, operation, and status code
- Custom gauges and counters for catalog-specific metrics like table count, namespace count, and metadata operation latency
- Distributed traces that follow requests across catalog operations, from initial metadata request through to storage access
The metrics endpoint is exposed on a separate management port (8182), isolated from the main API port (8181), following Kubernetes best practices:
quarkus.http.port=8181
quarkus.management.enabled=true
quarkus.management.port=8182
quarkus.micrometer.enabled=true
quarkus.micrometer.export.prometheus.enabled=true
Collaboration between Quarkus and Polaris Communities
The adoption of Quarkus has created a productive feedback loop between the two communities. Engineers from both projects have worked closely together to ensure Quarkus is used effectively within Polaris, reporting bugs and identifying possible enhancements along the way. This hands-on collaboration benefits both sides: Polaris gets a solid, well-understood foundation, while Quarkus gains real-world feedback from a demanding, high-throughput Apache TLP deployment, driving stability and performance improvements that benefit the entire Quarkus ecosystem.
Looking Ahead
The successful graduation of Apache Polaris as a Top-Level Project, combined with the power and efficiency gained from its foundation on Quarkus, is merely the launchpad for a significant, forward-looking technical agenda. Polaris is positioned not just for growth, but to actively shape the future of cloud-native data lakehouse architectures.
The community is already aligning its development with the cutting edge of the Quarkus roadmap, including the forthcoming Quarkus 4 major release. This alignment ensures that Polaris will continually benefit from performance gains, new cloud-native APIs, and the most advanced capabilities for its reactive core.
Our vision extends to embracing new networking and security standards to meet the demands of the most critical deployments:
- Next-Generation Networking: We are exploring the adoption of HTTP/3, leveraging its multiplexing and reduced head-of-line blocking to further minimize latency and boost the throughput of metadata requests, which is crucial for query engines managing thousands of concurrent operations.
- Future-Proof Security: Looking beyond today’s standards, the project is investigating quantum-safe TLS and encryption. By integrating modern cryptographic practices, Polaris aims to provide a catalog that is secure not just for the present, but against the computational challenges of the future.
This commitment to continuous innovation, driven by an active community, ensures that Apache Polaris is ready to become the industry standard for high-performance, secure, and multi-engine Iceberg catalog interoperability. We welcome contributors, users, and organizations to join us in defining this next era of the data ecosystem.