ref: split HangTracker into three-tier architecture#8179
Conversation
Introduce a layered design for hang detection: - RunLoopDelayTracker (renamed from DefaultHangTracker): raw ~25ms runloop polling via CFRunLoopObserver + semaphore - HangTracker (new middle layer): debounces delays into hangs with per-observer thresholds, notifies once on threshold crossing and once on hang end - WatchdogTerminationTrackingIntegration: now uses HangTracker.addObserver(threshold:) instead of manually filtering by duration
Gives the middle layer a distinct name so git sees the original HangTracker.swift → RunLoopDelayTracker.swift as a clean rename instead of a full rewrite.
📲 Install BuildsiOS
|
|
@sentry review |
Performance metrics 🚀
|
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 75eb224 | 1215.57 ms | 1244.74 ms | 29.17 ms |
| 4c437ea | 1241.63 ms | 1266.20 ms | 24.57 ms |
| c5c7a29 | 1223.14 ms | 1258.55 ms | 35.41 ms |
| ce1ec47 | 1212.60 ms | 1247.61 ms | 35.02 ms |
| 7b6ea60 | 1230.86 ms | 1250.91 ms | 20.05 ms |
| 1f2abc8 | 1209.18 ms | 1252.50 ms | 43.32 ms |
| 92fada5 | 1217.60 ms | 1252.89 ms | 35.30 ms |
| 5804f33 | 1218.24 ms | 1247.67 ms | 29.42 ms |
| 3efa7b5 | 1226.55 ms | 1260.66 ms | 34.11 ms |
| 72ea638 | 1224.78 ms | 1258.96 ms | 34.18 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 75eb224 | 24.14 KiB | 1.16 MiB | 1.13 MiB |
| 4c437ea | 24.14 KiB | 1.18 MiB | 1.15 MiB |
| c5c7a29 | 24.14 KiB | 1.15 MiB | 1.13 MiB |
| ce1ec47 | 24.14 KiB | 1.15 MiB | 1.13 MiB |
| 7b6ea60 | 24.14 KiB | 1.18 MiB | 1.15 MiB |
| 1f2abc8 | 24.14 KiB | 1.22 MiB | 1.20 MiB |
| 92fada5 | 24.14 KiB | 1.17 MiB | 1.15 MiB |
| 5804f33 | 24.14 KiB | 1.18 MiB | 1.16 MiB |
| 3efa7b5 | 24.14 KiB | 1.22 MiB | 1.19 MiB |
| 72ea638 | 24.14 KiB | 1.18 MiB | 1.15 MiB |
Previous results on branch: ref/hang-tracker-per-observer-threshold
Startup times
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 48cfd9f | 1222.32 ms | 1251.65 ms | 29.33 ms |
| 437a74e | 1221.90 ms | 1251.82 ms | 29.93 ms |
| 8a4624c | 1223.62 ms | 1247.70 ms | 24.08 ms |
| c0f787f | 1231.45 ms | 1266.51 ms | 35.06 ms |
| 843e9e7 | 1225.23 ms | 1267.65 ms | 42.42 ms |
App size
| Revision | Plain | With Sentry | Diff |
|---|---|---|---|
| 48cfd9f | 24.14 KiB | 1.22 MiB | 1.20 MiB |
| 437a74e | 24.14 KiB | 1.22 MiB | 1.20 MiB |
| 8a4624c | 24.14 KiB | 1.22 MiB | 1.20 MiB |
| c0f787f | 24.14 KiB | 1.21 MiB | 1.19 MiB |
| 843e9e7 | 24.14 KiB | 1.22 MiB | 1.20 MiB |
jamieQ
left a comment
There was a problem hiding this comment.
Some drive-by commentary for your consideration :)
Backport of Synchronization.Mutex API (iOS 18+) using os_unfair_lock for older deployment targets.
Document SentryMutex as preferred locking primitive for new Swift code. Expose isSessionPaused as computed property for test access after State struct migration.
Move sentrycrashbic callback registration back inside the mutex to prevent a race between start() and stop(), matching the original behavior.
This reverts commit 3bf7671.
Register callbacks outside the lock since sentrycrashbic_registerAddedCallback synchronously replays loaded images, each re-entering the lock. Unregistering stays inside the lock as it does not replay.
…observer-threshold
The threshold check used > which meant a delay exactly equal to the configured threshold would not trigger a notification.
AppHangTracker: mid-hang observer add/remove, re-add during hang, double-remove, stop/restart lifecycle, duration assertions. RunLoopDelayTracker: partial observer removal, observer added during active hang, double-remove same token.
Replace NSRecursiveLock with SentryMutex<T> for both trackers. The lock is non-recursive but safe here because handlers are called from the background queue while add/remove are main-queue-only, so re-entrance cannot occur.
The threshold check should use strict greater-than (>) to match the watchdog termination tracking behavior. A delay exactly equal to the threshold should not trigger a notification.
Replace MockAppHangTracker (which duplicated threshold logic) with a MockRunLoopDelayTracker injected into the real SentryDefaultAppHangTracker. This ensures tests exercise the actual production comparison logic instead of a mock copy.
…observer-threshold
Replace standalone SentryRunLoopDelayTrackerDependencies protocol with a typealias composing DateProviderProvider & ApplicationProvider. Update all tests to use MockDependencies instead of the removed dateProvider: init parameter.
Add getKeyWindow() to SentryApplication protocol and UIApplication extension. Rename RunLoopObserver to SentryRunLoopObserver. Update SentryDependencyContainer to pass self as dependencies. Fix whitespace and comment alignment in SentryApplication.swift.
Remove redundant Sentry prefix, now SentryDefaultRunLoopDelayTracker.
| if delay.duration > entry.threshold && !entry.hasBeenNotified { | ||
| entries[token]?.hasBeenNotified = true | ||
| entry.handler(.init(duration: delay.duration, state: .started)) | ||
| } | ||
| } else if entry.hasBeenNotified { | ||
| entries[token]?.hasBeenNotified = false | ||
| entry.handler(.init(duration: delay.duration, state: .ended)) | ||
| } | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Bug: The processDelay function calls observer handlers while holding a non-reentrant lock, creating a fragile design that is susceptible to deadlocks if handlers are modified in the future.
Severity: MEDIUM
Suggested Fix
To prevent potential deadlocks, avoid calling observer handlers while holding the lock. Consider collecting the handlers to be called, releasing the lock, and then executing the handlers. This removes the risk of a handler causing a re-entrant lock acquisition.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's
not valid.
Location: Sources/Swift/Integrations/AppHangTracking/SentryAppHangTracker.swift#L92-L106
Potential issue: In `SentryAppHangTracker`, the `processDelay` function calls observer
handlers while holding a non-reentrant `SentryMutex`. The underlying `os_unfair_lock`
will deadlock or panic if a handler attempts to re-acquire the lock by calling
`addObserver` or `removeObserver`. While the current handler implementation does not do
this, this design is fragile. A future change to an observer's handler, or the addition
of a new handler that calls back into the tracker, could easily introduce a deadlock.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 0126e8f. Configure here.
| let (removed, isEmpty) = observers.withLock { | ||
| ($0.removeValue(forKey: token), $0.isEmpty) | ||
| } | ||
| _ = removed |
There was a problem hiding this comment.
| let (removed, isEmpty) = observers.withLock { | |
| ($0.removeValue(forKey: token), $0.isEmpty) | |
| } | |
| _ = removed | |
| let (_, isEmpty) = observers.withLock { | |
| ($0.removeValue(forKey: token), $0.isEmpty) | |
| } |
| let (removed, isEmpty) = observers.withLock { | ||
| ($0.removeValue(forKey: token), $0.isEmpty) | ||
| } | ||
| _ = removed |
There was a problem hiding this comment.
| let (removed, isEmpty) = observers.withLock { | |
| ($0.removeValue(forKey: token), $0.isEmpty) | |
| } | |
| _ = removed | |
| let (_, isEmpty) = observers.withLock { | |
| ($0.removeValue(forKey: token), $0.isEmpty) | |
| } |
| observers.values.forEach { observer in | ||
| observer(.init(duration: duration, isOngoing: false)) | ||
| } |
There was a problem hiding this comment.
Should we be invoking these callbacks under the lock?
| var semaphoreSuccess = false | ||
| while !semaphoreSuccess { | ||
| let timeout = DispatchTime.now() + DispatchTimeInterval.milliseconds(Int(pollingInterval * 1_000)) | ||
| let result = semaphore.wait(timeout: timeout) |
There was a problem hiding this comment.
IIUC this logic didn't really change with this PR, but I wonder if we can come up with a better approach than blocking an entire thread with a semaphore. E.g. have we considered something in DispatchTimerSource API? I'm not personally super familiar with it, but essentially it seems like we just want something to notify us on a non-main thread after a certain amount of time has elapsed, which I think that API is designed to do. Seems a waste to burn a thread just blocking like this if there's some way to avoid it.

Should be merged after #8214 to adopt the new mutex type for observers.
Summary
Splits the existing
DefaultHangTrackerinto a three-tier architecture that separates runloop polling from hang classification. This prepares the foundation for upcoming app hang V3 work (stacktrace sampling, flamegraph construction) while cleaning up the integration layer. Also adds adevelop-docs/TERMINOLOGY.mdglossary defining Apple and SDK terms for hang detection.Motivation
The existing
DefaultHangTrackerdoes two things at once: it polls the main runloop at around 25 ms intervals via aCFRunLoopObserver+DispatchSemaphorepattern, and it broadcasts raw delay callbacks to all observers. Consumers likeWatchdogTerminationTrackingIntegrationthen have to manually filter by their own duration threshold (guard interval > timeoutInterval), which scatters hang-classification logic across consumers.The upcoming app hang V3 integration needs to sample stacktraces at the high-frequency 25 ms polling rate during a hang, then emit accumulated data only when the hang ends. This requires a middle layer that subscribes to the raw polling and handles debouncing, thresholding, and in the future stacktrace accumulation. None of that belongs in the low-level runloop observer or the integration itself.
Architecture
Three layers, each with a single responsibility:
RunLoopDelayTracker(renamed fromDefaultHangTracker). The low-level runloop observer. Polls at around 25 ms (1.5 times the expected frame duration). Notifies observers on every tick while the runloop is blocked (ongoing=true), then once when it unblocks (ongoing=false). No threshold logic, no hang classification.AppHangTracker(new middle layer). Wraps aRunLoopDelayTrackerand adds per-observer thresholds. Each observer registers withaddObserver(threshold:handler:). The tracker notifies an observer exactly once when the accumulated delay first exceeds that observer's threshold, then once when the hang ends. UseshasBeenNotifiedstate per observer to prevent duplicate notifications within a single hang. Designed to later support stacktrace sampling on every raw 25 ms tick (out of scope for this PR).Integrations (consumers). For example,
WatchdogTerminationTrackingIntegrationnow callsappHangTracker.addObserver(threshold: timeoutInterval)instead of manually guardinginterval > timeoutInterval. The threshold filtering is theAppHangTracker's job.Changes
Sources/Swift/RunLoopDelayTracker.swiftrenamed fromHangTracker.swift. Types:DefaultHangTrackertoDefaultRunLoopDelayTracker, protocolHangTrackertoRunLoopDelayTracker. Behavior unchanged, only names and doc comments updated.Sources/Swift/AppHangTracker.swiftis a new file.DefaultAppHangTrackerwrapsRunLoopDelayTracker, manages per-observerObserverEntrystructs with threshold andhasBeenNotifiedstate.AppHangTrackerProviderprotocol for dependency injection.Sources/Swift/SentryDependencyContainer.swiftnow has anappHangTrackerlazy var that creates aDefaultRunLoopDelayTrackerwrapped in aDefaultAppHangTracker.SentryWatchdogTerminationTrackingIntegration.swiftreplacedaddOngoingHangObserverand manual threshold guard withappHangTracker.addObserver(threshold:handler:).HangTrackerTests.swiftrenamed toRunLoopDelayTrackerTests.swiftwith renamed types. NewAppHangTrackerTests.swiftwith 9 tests for the middle layer covering threshold filtering, deduplication, consecutive hangs, and multiple observers with different thresholds.MockAppHangTrackerin watchdog tests updated to mirror threshold logic.develop-docs/TERMINOLOGY.mdnew glossary defining Apple terms (hang, hitch, watchdog termination) and SDK terms (runloop delay, polling interval, app hang, ANR) with references to Apple documentation.Test plan
RunLoopDelayTrackerTestspass (9 tests, renamed, behavior unchanged)AppHangTrackerTestspass (9 new tests for threshold edge cases, dedup, consecutive hangs)SentryWatchdogTerminationIntegrationTestspass (17 tests including 3 threshold tests)make build-iospassesmake formatclean#skip-changelog