kroxylicious.github.io/performance.markdown at main · kroxylicious/kroxylicious.github.io

layout	overview
title	Performance
permalink	/performance/
toc	true

This page summarises the measured performance overhead of Kroxylicious. Numbers come from benchmarks run on real hardware using OpenMessaging Benchmark (OMB), an industry-standard Kafka performance tool. No, we didn't run this on a laptop — it's a realistic deployment: an 8-node OpenShift cluster on Fyre (5 workers, 3 masters), IBM's internal cloud platform — a controlled environment.

Test environment

Component	Details
CPU	AMD EPYC-Rome, 2 GHz
Cluster	8-node OpenShift (5 workers, 3 masters), RHCOS 9.6
Kafka	3-broker Strimzi cluster, replication factor 3
Kroxylicious	0.20.0, single proxy pod, 1000m CPU limit
KMS	HashiCorp Vault (in-cluster)

All primary results used 1 KB messages on a single partition. Multi-topic workloads (10 and 100 topics) confirmed that overhead characteristics hold when load is distributed.

Passthrough proxy (no filters)

The proxy layer itself adds negligible overhead. At sub-saturation rates the additional latency is sub-millisecond on average, with no measurable throughput impact.

10 topics, 1 KB messages (5,000 msg/s per topic):

Metric	Baseline	Proxy	Delta
Publish latency avg	2.62 ms	2.79 ms	+0.17 ms (+7%)
Publish latency p99	14.09 ms	15.17 ms	+1.08 ms (+8%)
E2E latency avg	94.87 ms	95.34 ms	+0.47 ms (+0.5%)
Publish rate	5,002 msg/s	5,002 msg/s	no change

100 topics, 1 KB messages (500 msg/s per topic):

Metric	Baseline	Proxy	Delta
Publish latency avg	2.66 ms	2.82 ms	+0.16 ms (+6%)
Publish latency p99	5.54 ms	6.07 ms	+0.53 ms (+10%)
Publish rate	500 msg/s	500 msg/s	no change

Record encryption (AES-256-GCM)

Encryption adds measurable but predictable overhead. The cost scales with producer rate — well below saturation the overhead is small; approaching the saturation point, latency rises sharply.

Latency at sub-saturation rates

1 topic, 1 KB messages — baseline vs encryption:

Rate	Metric	Baseline	Encryption	Delta
34,000 msg/s	Publish avg	8.00 ms	8.19 ms	+0.19 ms (+2%)
34,000 msg/s	Publish p99	48.65 ms	64.01 ms	+15.35 ms (+32%)
36,000 msg/s	Publish avg	9.38 ms	10.46 ms	+1.08 ms (+12%)
36,000 msg/s	Publish p99	63.92 ms	88.98 ms	+25.06 ms (+39%)
37,200 msg/s	Publish avg	9.12 ms	12.19 ms	+3.07 ms (+34%)
37,200 msg/s	Publish p99	74.88 ms	113.15 ms	+38.27 ms (+51%)

Throughput ceiling

Scenario	Throughput ceiling (1 topic, 1 KB, 1 partition)
Baseline (direct Kafka)	~19,400 msg/s
Encryption (proxy + AES-256-GCM)	~14,600 msg/s
Cost	~25% fewer messages per second per partition

Sizing guidance

Numbers without guidance aren't very useful, so here's how to translate these results into pod specs.

Passthrough proxy: size your Kafka cluster as you normally would. The proxy will not be the bottleneck.

With record encryption:

Throughput: use CPU (mc) = 10 × total proxy throughput (MB/s) where total = produce MB/s + each consumer group's consume MB/s. For 1:1 produce:consume this simplifies to 20 × produce MB/s. Add ×1.3 headroom. Measured on AMD EPYC-Rome 2 GHz with AES-NI — calibrate on your hardware. Validated at 1000m, 2000m, and 4000m. Example: 100k msg/s at 1 KB, 1 consumer group = 200 MB/s total → 2000m + headroom → ~2600m.
Latency: expect 0.2–3 ms additional average publish latency and 15–40 ms additional p99, scaling with how close to saturation you operate
Scaling: set requests equal to limits in your pod spec to make the CPU budget — and therefore the throughput ceiling — deterministic. Increase the CPU limit to raise throughput; add proxy pods for redundancy.
KMS: DEK caching means the KMS is not on the hot path. In testing, each benchmark run triggered only 5–19 DEK generation calls — the KMS is not a bottleneck

Caveats

These numbers come from a single proxy pod, 1 KB messages, and single-pass measurements. A few things that matter when applying them to your workload:

Message size: the sizing coefficient is message-size-dependent — encryption overhead as a percentage is likely lower for larger messages
Replication factor: the 1-topic latency and ceiling results ran at RF=3; at that replication factor Kafka's ISR replication creates a per-partition ceiling that sits close to where proxy CPU saturates. The sizing coefficient was derived from RF=1 multi-topic workloads to isolate proxy CPU
Horizontal scaling: linear scaling has been validated across CPU allocations on a single pod; multi-pod scaling hasn't been measured but is expected to follow the same coefficient

The engineering post has the full methodology detail.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test environment

Passthrough proxy (no filters)

Record encryption (AES-256-GCM)

Latency at sub-saturation rates

Throughput ceiling

Sizing guidance

Caveats

Further reading

FilesExpand file tree

performance.markdown

Latest commit

History

performance.markdown

File metadata and controls

Test environment

Passthrough proxy (no filters)

Record encryption (AES-256-GCM)

Latency at sub-saturation rates

Throughput ceiling

Sizing guidance

Caveats

Further reading