Quarkus Insights #252: Performance Engineering Lessons from the Trenches

This summary was generated using AI, reviewed by humans - watch the video for the full story.

Episode 252 of Quarkus Insights featured Francesco Nigro, a performance engineer on the Red Hat App Services performance team, who shared a concentrated set of lessons drawn from years of low-level profiling and benchmarking work — most recently in the context of the Quarkus vs Spring Boot performance benchmark.

Performance as a Currency

Francesco opened with a framing device borrowed from economics: performance as currency. The idea, attributed to an MIT professor’s lecture, is that performance has no intrinsic value in the way a feature does, but it has high exchange value: you spend performance to buy features, user experience, and other desirable properties.

The analogy holds because scarcity matters. If you spend more performance than your budget allows, the features you have already paid for stop working. The system becomes unusable.

This matters more now than it did decades ago. Moore’s Law delivered free performance gains for a long time, but the era of doubling single-core performance every year is over. At the same time, modern deployments add layers of virtualization, containerization, and cloud abstraction that erode the available budget before an application even starts. Francesco argued that this is precisely when caring about performance stops being optional.

Why Practice Matters More Than Theory

Francesco recommended learning from practitioners rather than textbooks, attributing the advice to async-profiler author Andrei Pangin: theoretical knowledge fades unless it is applied immediately to real tasks. He pointed to the surge of practical investigation that followed publication of the Quarkus benchmark as a concrete example. Problems are the curriculum.

Lesson 1: Good Data Can Come from Anywhere

A recurring theme was openness to external data. Francesco admitted that his instinct is to dismiss "on my machine I got different numbers" reports, but that he has learned not to. When contributors from outside the Quarkus community, including people from competing projects, shared benchmark results from different hardware, those results revealed real bottlenecks and led to genuine improvements.

The takeaway: reproducible benchmarks invite scrutiny from all directions, and that scrutiny is valuable.

Lesson 2: Local Benchmarking Is Hard, and AI Makes It Harder

Running benchmarks on a local machine is unreliable. Background processes compete for resources. Laptop power management distorts results. These problems are well known.

Francesco noted that AI tools add a new dimension of difficulty. He opened a bug report against Anthropic after observing that Claude Code consumed around 30% of CPU while idle, acting as a noisy neighbor that polluted measurements. When he asked Claude to isolate itself from the benchmark, it did — but also isolated the benchmark into the same CPU affinity group, causing everything to run on a single core. He only noticed three hours into the run.

The lesson is not that AI tools are useless for performance work. It is that they must be used carefully, and any result they influence must be independently validated.
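
One way to catch the single-core surprise early is to check the process's CPU affinity before the run starts. The following is a minimal sketch, not something shown in the episode: the function names and the `expected_min` threshold are hypothetical, and `os.sched_getaffinity` is only available on Linux, so the code falls back to the total CPU count elsewhere.

```python
import os


def affinity_problem(allowed_cpus, expected_min):
    """Return a warning string if too few CPUs are usable, else None."""
    if len(allowed_cpus) < expected_min:
        return (
            f"only {len(allowed_cpus)} CPU(s) available "
            f"({sorted(allowed_cpus)}), expected at least {expected_min}"
        )
    return None


def current_affinity():
    """CPUs this process may run on; falls back to all CPUs off Linux."""
    if hasattr(os, "sched_getaffinity"):
        return set(os.sched_getaffinity(0))
    return set(range(os.cpu_count() or 1))


if __name__ == "__main__":
    # expected_min=2 is an assumed threshold; pick what the benchmark needs.
    problem = affinity_problem(current_affinity(), expected_min=2)
    print(problem or "affinity looks sane")
```

Running a check like this at the start of a benchmark script, and failing fast when it reports a problem, costs seconds rather than hours.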

Lesson 3: Methodology — Active Benchmarking and the USE Method

Francesco described two methodologies from Brendan Gregg:

Active benchmarking means measuring resource consumption while the benchmark runs, not just reading the final score. Every run is potentially unrepeatable, so capturing the full resource picture provides context for interpreting numbers and detecting interference.
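
As a rough illustration of the idea, the sketch below runs a workload while a background thread samples how much CPU time the process is consuming. It is an assumption-laden toy: `run_with_sampling` and `busy_loop` are hypothetical names, and it only observes the benchmark process itself, whereas a real setup would also capture system-wide CPU, memory, disk, and network activity with dedicated tools.

```python
import threading
import time


def run_with_sampling(workload, interval=0.05):
    """Run workload() while a background thread samples process CPU usage.

    Returns (result, elapsed_seconds, samples). Each sample is the process
    CPU time consumed per wall-clock second over one sampling interval.
    """
    samples = []
    stop = threading.Event()

    def sampler():
        last_wall, last_cpu = time.perf_counter(), time.process_time()
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            wall, cpu = time.perf_counter(), time.process_time()
            if wall > last_wall:
                samples.append((cpu - last_cpu) / (wall - last_wall))
            last_wall, last_cpu = wall, cpu

    thread = threading.Thread(target=sampler, daemon=True)
    start = time.perf_counter()
    thread.start()
    try:
        result = workload()
    finally:
        stop.set()
        thread.join()
    return result, time.perf_counter() - start, samples


def busy_loop():
    # Hypothetical stand-in for a real benchmark body.
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total


if __name__ == "__main__":
    result, elapsed, samples = run_with_sampling(busy_loop)
    print(f"elapsed={elapsed:.3f}s, cpu samples={len(samples)}")
```

Keeping the samples next to the score makes it possible to spot a run where the process was starved of CPU, which a bare final number would hide.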

The USE Method — Utilization, Saturation, Errors — provides a structured checklist for each resource:

  • Which resource is most utilized?

  • Is any resource saturated, causing queuing?

  • Are there errors?

The errors check is the simplest and the most ignored. Before reading any benchmark result, the first question should be whether the framework under test, the load generator, or the database was throwing errors during the run. An error-laden benchmark measures error handling, not the thing you intended to measure.
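
The errors-first rule can be enforced mechanically by refusing to report a score for a run that had errors. This is a minimal sketch with hypothetical names (`RunSummary`, `throughput_or_reject`); the default threshold of zero tolerated errors is an assumption, not a recommendation from the episode.

```python
from dataclasses import dataclass


@dataclass
class RunSummary:
    requests: int      # total requests issued by the load generator
    errors: int        # requests that failed (non-2xx, timeouts, ...)
    duration_s: float  # wall-clock duration of the run in seconds


def throughput_or_reject(run, max_error_rate=0.0):
    """Return successful requests/second, or raise if the run is tainted."""
    if run.requests <= 0 or run.duration_s <= 0:
        raise ValueError("empty run: nothing was measured")
    error_rate = run.errors / run.requests
    if error_rate > max_error_rate:
        raise ValueError(
            f"error rate {error_rate:.2%} exceeds {max_error_rate:.2%}; "
            "result is not comparable"
        )
    return (run.requests - run.errors) / run.duration_s


if __name__ == "__main__":
    print(throughput_or_reject(RunSummary(requests=1000, errors=0, duration_s=10.0)))
```

The same gate would apply to errors logged by the database or the load generator; only the application side is modeled here.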

Lesson 4: Data Quality and Precise Language

Three related points on how to handle data honestly:

Use precise language. Imprecise notes about experiments create false impressions of causation. "A implies B" is a claim; "A correlates with B under these conditions" is a much more careful one. This matters doubly when working with LLMs, which are especially prone to drawing causation from correlation and presenting it confidently.

Distinguish data from information. Raw benchmark output is data. Information is the conclusion extracted from that data through a documented process. When reading someone else’s benchmark, ask whether they are sharing data or the already-processed conclusion — and look for the process in between.

Avoid anchoring bias. When only a few samples fit your hypothesis perfectly, do not select only those samples to validate it. Outliers that contradict a hypothesis deserve investigation, not dismissal.

On statistics: Francesco argued for reducing noise in experiments first, so that statistics become less necessary rather than more. This is especially practical for software engineers who need results quickly. When the environment is noisy by nature, some statistics are unavoidable — but controlling the environment is always worth attempting first.
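
A lightweight way to act on this is to measure how noisy a set of repeated runs is before drawing any conclusion from them. The sketch below uses the coefficient of variation (standard deviation divided by mean) as a noise indicator; the 5% threshold and the function names are assumptions for illustration. It deliberately keeps every sample, so an outlier shows up as noise to investigate rather than being silently dropped.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation relative to the mean (needs >= 2 samples)."""
    mean = statistics.mean(samples)
    if mean == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return statistics.stdev(samples) / mean


def assess_runs(samples, max_cv=0.05):
    """Summarize repeated runs and flag the set as too noisy to interpret."""
    cv = coefficient_of_variation(samples)
    return {
        "median": statistics.median(samples),
        "min": min(samples),
        "max": max(samples),
        "cv": cv,
        "too_noisy": cv > max_cv,
    }


if __name__ == "__main__":
    print(assess_runs([100, 101, 99, 100, 100]))
    print(assess_runs([100, 100, 100, 60, 100]))
```

When a set is flagged, the useful next step is to find and remove the source of interference and rerun, rather than to average the noise away.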

Lesson 5: Communicate with Humans, Be Wary of LLMs

The mindset section addressed the risk of retreating into an "ivory tower" where an LLM becomes the only sounding board. LLMs agree readily, even when prompted with incorrect positions. They produce confident-sounding explanations linking unrelated data points.

The value of talking with other people — especially non-experts in the specific area — is the friction. Having to explain a finding to someone who does not already share your assumptions forces clarity and surfaces gaps. Francesco illustrated this with a story about sending a draft blog post to a colleague and being told that an explanation did not make sense. It turned out the section really did not make sense.

LLMs can still be useful for getting started in unfamiliar territory, but the ownership of the understanding must remain with the engineer. The responsibility for conclusions cannot be delegated.

Lesson 6: Bugs, Luck, and Doing Your Homework

The final section addressed mindset around root cause analysis:

Avoid blame without proof. Early in his career, Francesco’s instinct was to blame the operating system, the JVM, or the hardware. He learned to treat that as a last resort requiring evidence. JVM bugs and Linux kernel bugs do exist, but they are uncommon. Claiming one without a reproduction is not useful.

Luck is real but not the whole story. Some discoveries happen because the right conditions align unexpectedly. Francesco acknowledged that finding three significant HotSpot bugs in a short time involved luck — but also deep preparation that made it possible to recognize what he was looking at when the opportunity arose.

Do your homework. The investigation chain leading to a root cause can be very long. Stopping early because you have run out of familiar territory is tempting. Going deeper, and reaching out to specialists in adjacent areas — JVM teams, OS kernel developers — is how compound discoveries happen.

Simplicity vs. Performance

A question from the audience raised the classic trade-off: is it ever right to sacrifice simplicity for performance? Francesco’s answer was architectural: good performance should be designed in from the start, not patched in later. He pointed to insects, which are both simple and highly effective, as an illustration that simplicity and efficiency are not inherently opposed.

When performance is designed in, the result can be both clean and fast. When it is patched in later, the ugliness is real, but the fault lies in the initial design, not in the act of optimizing. The Quarkus build-time principle was cited as an example where internal complexity — the "horrendous code" on the Quarkus side — produces a clean, fast surface for developers.

A Note on Tracking Performance Over Time

The episode closed with a discussion about the practicality of ongoing performance tracking. Both Eric Deandrea and Holly Cummins noted that the team has internal historical performance data and plans to publish it. The goal is to give the community visibility into trends and the ability to flag regressions, including the "boiling frog" problem where gradual decline goes unnoticed.

A longer-term ambition is to make it easier for individual library maintainers to track simple metrics — binary size, memory usage under a test — in CI pipelines without needing full benchmark infrastructure.
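
A check of this kind could be as small as the sketch below, which compares a new measurement of a lower-is-better metric (binary size, memory usage) against both the previous value and a fixed older baseline. Everything here is hypothetical: the function name, the tolerances, and the idea of storing history as a plain list are assumptions, not a description of what the Quarkus team plans to ship. Comparing against the oldest recorded value is what catches slow drift that never trips a step-by-step check.

```python
def check_metric(history, current, step_tolerance=0.05, drift_tolerance=0.10):
    """Flag regressions in a lower-is-better metric.

    history is an oldest-first list of past values. Returns a list that may
    contain "step" (worse than the previous value by more than
    step_tolerance) and "drift" (worse than the oldest value by more than
    drift_tolerance).
    """
    findings = []
    if not history:
        return findings
    previous, baseline = history[-1], history[0]
    if current > previous * (1 + step_tolerance):
        findings.append("step")
    if current > baseline * (1 + drift_tolerance):
        findings.append("drift")
    return findings


if __name__ == "__main__":
    # Each step is under 5%, but the total growth since the baseline is 12%.
    print(check_metric([100, 103, 106, 109], 112))
```

In the example, no single step exceeds the 5% tolerance, yet the metric has grown 12% since the first recorded value, so only the drift check fires.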

Key Takeaways

  1. Performance is a budget that gets spent buying features; when the budget runs out, features stop working.

  2. The free lunch from Moore’s Law is over — caring about performance is more important now than it was thirty years ago.

  3. Good data can come from competitors — be open to external benchmarks even when the source is uncomfortable.

  4. AI tools are noisy neighbors — measure around them carefully and validate everything they influence.

  5. Active benchmarking means measuring resources, not just scores — a score without context is meaningless.

  6. Check the error logs before interpreting any benchmark result.

  7. Distinguish data from information — document the process that converts one into the other.

  8. Reduce noise before reaching for statistics — controlling the environment is more practical than computing p-values for most engineers.

  9. Human communication has a friction that LLMs do not — that friction is a feature, not a bug.

  10. Real discoveries require doing the homework — luck matters, but preparation determines whether you recognize what you found.

Conclusion

This episode was less a product update and more a philosophy session from someone who has spent years finding the kinds of bugs that affect everyone, working deep in the stack where most engineers never venture. Francesco’s lessons are practical and hard-won: be skeptical of easy explanations, instrument everything, talk to people who will push back, and never stop before you have actually understood the root cause.

For developers using Quarkus, the practical encouragement is to download the benchmark, run it, and start building intuition for what real performance investigation looks like.

Watch the full episode on the Quarkus YouTube channel. The benchmark is available at github.com/quarkusio/spring-quarkus-perf-comparison.