perf(core): Lazily allocate AutoClosableReentrantLock (JAVA-588)#5643
perf(core): Lazily allocate AutoClosableReentrantLock (JAVA-588)#5643runningcode wants to merge 2 commits into
Conversation
📲 Install BuildsAndroid
|
Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 1edbdfa | 364.77 ms | 450.29 ms | 85.52 ms |
| a416a65 | 333.78 ms | 410.37 ms | 76.59 ms |
| d15471f | 302.62 ms | 353.84 ms | 51.22 ms |
| 22f4345 | 314.79 ms | 375.02 ms | 60.23 ms |
| 6b019b7 | 343.31 ms | 417.23 ms | 73.91 ms |
| 22f4345 | 312.78 ms | 347.40 ms | 34.62 ms |
| d217708 | 411.22 ms | 430.86 ms | 19.63 ms |
| 319f256 | 315.96 ms | 372.96 ms | 57.00 ms |
| e2dce0b | 308.96 ms | 360.10 ms | 51.14 ms |
| 8558cac | 306.16 ms | 355.24 ms | 49.09 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 1edbdfa | 1.58 MiB | 2.20 MiB | 635.34 KiB |
| a416a65 | 1.58 MiB | 2.12 MiB | 555.26 KiB |
| d15471f | 1.58 MiB | 2.13 MiB | 559.54 KiB |
| 22f4345 | 1.58 MiB | 2.29 MiB | 719.83 KiB |
| 6b019b7 | 0 B | 0 B | 0 B |
| 22f4345 | 1.58 MiB | 2.29 MiB | 719.83 KiB |
| d217708 | 1.58 MiB | 2.10 MiB | 532.97 KiB |
| 319f256 | 1.58 MiB | 2.19 MiB | 619.79 KiB |
| e2dce0b | 0 B | 0 B | 0 B |
| 8558cac | 0 B | 0 B | 0 B |
AutoClosableReentrantLock extended ReentrantLock, so every SDK object holding one allocated a ReentrantLock (and its AbstractQueuedSynchronizer) eagerly in its field initializer. A customer Perfetto trace showed ~81 such allocations on the main thread during SentryAndroid.init, many for locks that are never acquired during init. Hold the ReentrantLock internally and create it lazily on first acquire(), using an AtomicReferenceFieldUpdater CAS so creation stays atomic and Loom-friendly (no synchronized, preserving #3715). Every call site uses acquire() only, so dropping the ReentrantLock superclass touches no caller. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
6ee6062 to
04091dd
Compare
| * was pure GC and main-thread overhead. We keep a {@link ReentrantLock} rather than reverting to | ||
| * {@code synchronized} to stay friendly to virtual threads (Loom), see #3715. | ||
| */ | ||
| public final class AutoClosableReentrantLock { |
There was a problem hiding this comment.
Bug: AutoClosableReentrantLock is no longer Serializable, which will cause a NotSerializableException when serializing classes that use it, like SynchronizedCollection.
Severity: MEDIUM
Suggested Fix
Make AutoClosableReentrantLock implement Serializable. Since its internal state (lock) is a ReentrantLock which is already serializable, this should be a straightforward change. Alternatively, mark the lock field in SynchronizedCollection and SynchronizedQueue as transient and provide custom writeObject/readObject methods to handle serialization correctly, if the lock state is not intended to be serialized.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: sentry/src/main/java/io/sentry/util/AutoClosableReentrantLock.java#L20
Potential issue: The `AutoClosableReentrantLock` class no longer extends `ReentrantLock`
and does not implement `Serializable`. Consequently, classes like
`SynchronizedCollection` and `SynchronizedQueue`, which implement `Serializable` and
contain a non-transient `AutoClosableReentrantLock` field, will fail to serialize. Any
attempt by an SDK user to serialize an object containing one of these collections, such
as a `Scope` instance, will result in a `NotSerializableException` at runtime. This is a
binary compatibility break that was overlooked.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Is this an issue on the JVM? Definitely not needed for Android.
📜 Description
io.sentry.util.AutoClosableReentrantLockextendedjava.util.concurrent.locks.ReentrantLock, and ~57 SDK classes create one eagerly in a field initializer. Constructing the SDK object graph therefore allocated aReentrantLock(plus itsAbstractQueuedSynchronizerSync) per object — even for objects whose lock is never acquired.This change holds the
ReentrantLockinternally and creates it lazily on the firstacquire(), using anAtomicReferenceFieldUpdaterCAS so the lazy creation is atomic and stays Loom-friendly (nosynchronized, preserving the intent of #3715). The lifecycle token captures the resolved lock instance soclose()unlocks the correct one.💡 Motivation and Context
Part of the Reduce SDK init time [Android] effort (JAVA-588). A customer-provided Perfetto trace showed ~81
ReentrantLockallocations on the main thread underSentryAndroid.init, contributing GC pressure and main-thread CPU. With lazy allocation, only locks actually acquired during init allocate; the many never-contended locks no longer allocate at construction.Compatibility note: this is technically a binary-compatibility change —
AutoClosableReentrantLockno longer extendsReentrantLock, so the inheritedReentrantLockpublic methods leave its API surface (reflected insentry.api). A usage audit confirmed every call site uses onlyacquire()(try-with-resources); nothing callslock()/unlock()/tryLock()/isHeldByCurrentThread()directly or types a field/param asReentrantLock, so no caller is affected.An on-device A/B benchmark on a Pixel 3 (run locally, not included in this PR) quantified the allocation reduction and confirmed
acquire()hot-path throughput is unchanged — see results below.📈 Benchmark results (Pixel 3, Android 12)
Measured on a release build using ART instrumented method tracing, with slices counted in Perfetto
trace_processor(btrace app-tracing is broken on this device). The A/B was done by swapping onlyAutoClosableReentrantLockbetween the old (extends ReentrantLock) and new (lazy) implementations and rebuilding; everything else is identical.Real
SentryAndroid.init(cold) — lock allocationsReentrantLock(+AbstractQueuedSynchronizer) allocatedacquire()calls70 fewer
ReentrantLockallocations per init (140 fewer heap objects: 70 locks + 70Sync), a ~65% reduction. Counts were identical across 3 runs of each build. Both builds make the same 40acquire()calls, so the init code path is unchanged — only never-contended locks stop allocating (37 distinct locks are acquired during init vs 107 created). (This benchmark app allocates 107; the customer trace above showed ~81.)Construction cost (per lock): old ~73–78 ns/alloc, new ~60–66 ns/alloc → ~11.5 ns saved per allocation. Across the 70 avoided allocations that is ≈0.8 µs of direct allocation work per init — below the cold-start noise floor. The benefit is therefore reduced allocation/GC pressure, not a measurable init-latency win.
acquire()hot path: ~161 ns/op foracquire()+close()when the lock already exists — unchanged; the lazy path adds only a volatile read + null check after the first acquire.💚 How did you test it?
Unit tests in
AutoClosableReentrantLockTest(lazy creation, reentrancy, and an 8-thread × 1000-iteration mutual-exclusion stress test); full:sentry:testsuite green.📝 Checklist
sendDefaultPIIis enabled.🔮 Next steps
Biggest remaining main-thread init cost in the same trace is Manifest parsing (~147ms) — tracked separately (SDK-1322 / JAVA-531, compile-time injection).
Fixes JAVA-588