Telegram sync performance pass + AC4 codec support#2482
Open
Amonoman wants to merge 15 commits into
Open
Conversation
Support detecting AC4 audio (used in iOS/MOV containers) in IsoBmffAudioCodecDetector, with corresponding unit tests.
Telegram sync was reading each song's file (file.exists() + ID3 tag parse via AudioMetadataReader) one at a time in a sequential forEach, inside syncTelegramData's chunk loop. For large channels (~100k songs) this serial per-song file IO was the dominant cost, taking on the order of 10 minutes. Extract that IO/CPU-bound step and run it concurrently per chunk using the same Semaphore + async/awaitAll pattern already used elsewhere in this file (see the MediaStore processing phase), bounded by a new TELEGRAM_METADATA_READ_CONCURRENCY = 8 to avoid saturating disk IO. The sequential artist/album dedup maps that follow remain untouched, since they mutate shared per-chunk state and aren't safe to parallelize as-is. Also drops a redundant resolveAlbumArtUri() call that was happening twice per song when a local file was found (value doesn't change between the two call sites). No behavior change to the resulting song/album/artist data — only the ordering/concurrency of how it's computed. Co-Authored-By: Claude <noreply@anthropic.com>
incrementalSyncMusicData() ran deleteOrphanedAlbums() and deleteOrphanedArtists() unconditionally at the end of every call. Both are full-table NOT EXISTS scans against songs / cross-refs. syncTelegramData() calls incrementalSyncMusicData() once per chunk (~200 times for a 100k-song channel at the current chunk size), so these scans ran ~200 times per sync even though pure-insert chunks can never produce an orphaned album or artist — only the dedicated deletion path can. Add an optional cleanupOrphans parameter (default true, preserving existing behavior for every other caller) and a new cleanupOrphanedMusicData() convenience method. syncTelegramData() now passes cleanupOrphans = false on every chunk flush and the batched deletion loop, then calls cleanupOrphanedMusicData() once after both finish. Other incrementalSyncMusicData call sites (MediaStore sync, Netease sync) are single-shot, not chunked loops, so they're unaffected and keep the default. Co-Authored-By: Claude <noreply@anthropic.com>
parseReplayGainDb() constructed a new Regex("(?i)[dD][bB]") on every
invocation. It's called via extractReplayGainDb() up to twice per
song read (track gain + album gain, each checking up to 3 property
key fallbacks), so a 100k-song sync could compile this same pattern
hundreds of thousands of times.
Hoist it to a private val compiled once at class load.
Investigated whether the two separate TagLib calls in read()
(getAudioProperties() + getMetadata(), each on a dup()'d fd) were
redundant and could be merged — confirmed against the upstream
Kyant0/taglib test suite that this is the library's required usage
pattern, not duplicated work, so left unchanged.
…or for SetTdlibParameters
TdApi.SetTdlibParameters was being constructed with a 15-argument
positional constructor, with a comment admitting the argument order
was guessed ("the order varies by version... I will try the most
common flat signature").
Verified against TDLib's official Java example and documentation:
SetTdlibParameters has no flat multi-arg constructor at all, only a
default constructor and one taking a single TdlibParameters object.
The previous code was very likely silently passing values into the
wrong fields on every app launch (e.g. apiId potentially landing in
systemLanguageCode's slot), since boolean/string parameters of the
wrong type would not fail at compile time given how the bindings are
laid out.
Replace with the documented pattern: construct TdApi.TdlibParameters()
and assign each field by name, which cannot misalign the way a wrong
guessed positional order can.
Also explicitly sets enableStorageOptimizer = true, which was not
reliably set by the old code (its effective value depended on exactly
how the broken positional mapping landed). This matches TDLib's
documented recommended default; flag for review if a different value
is preferred.
…solution ForumTopicInfo's thread ID is resolved via reflection (field name isn't confirmed for this tdlibx fork from available documentation). Two debug logs ran on every topic during every forum sync: one dumping all reflected field names, one logging the resolved field name and value. Both were one-time discovery aids per their own comments, not meant to run indefinitely. Removed both. Kept the Timber.w fallback-path warning and Timber.e failure log, since those flag situations actually worth knowing about. Did not change the reflection logic itself or attempt to replace it with direct property access: multiple TDLib Java bindings (official tdlib/td, the tdlight fork, and TDLib's own GetForumTopic/ SearchChatMessages fields) consistently use "messageThreadId", which is already the first name this code tries, but this exact tdlibx 1.8.56 fork's ForumTopicInfo source wasn't directly inspectable to confirm a direct property reference would compile. If a past Logcat capture from this debug log is available, it would let this be replaced with a confirmed direct field reference instead.
TELEGRAM_METADATA_READ_CONCURRENCY was a flat constant (8) regardless
of device. Each concurrent read does a real native TagLib JNI call,
which is genuine CPU-bound work, not just blocking IO wait. Kotlin
dispatchers schedule threads, they don't reserve CPU time, so heavy
parallel CPU-bound work here can still starve the Main thread and
cause visible UI jank on a device with few cores, while being
needlessly conservative on a high-core device.
Replace the flat constant with telegramMetadataReadConcurrency(),
which scales with Runtime.getRuntime().availableProcessors() / 2,
capped to [2, 4]. This is an engineering default reasoned from how
coroutine dispatchers and native JNI calls interact with real CPU
scheduling, not a profiled or measured-optimal value; it has not
been verified against real device traces.
Prompted by a field report of heavy UI lag during a large Telegram
sync on a device that did not OOM (separate from the earlier OOM
investigation). Lowering concurrency trades some of the previously
measured sync-time improvement for reduced peak CPU contention; on
the device used for the 50-second/100k-song benchmark, this will
likely land at concurrency 4 (down from 8) and increase sync time
accordingly. Also tightens the artist/album preload in
syncTelegramData to build both derived maps in a single pass with
pre-sized capacity, instead of two separate default-capacity
.associate{} calls; this does not change the preload's fundamental
scaling with existing-library size, which remains a known, separate
limitation.
Three small, repeated costs found during a broader pass over SyncWorker.kt, each scaling with library size: - fetchMusicFromMediaStore: cursor.getString(dataCol) and cursor.getInt(trackCol) were each read twice per row (once for the directory filter / disc-vs-track split, once for the RawSongData fields). Cursor getters cross into native CursorWindow code, so this was a real, avoidable JNI call per row. Read once, reuse. - fetchMediaStoreIds: allocated two File objects per row (File(data).parent, then File(parentPath).absolutePath) to arrive back at a string MediaStore's DATA column already provides as an absolute path. Replaced with the same string-slicing approach already used for the identical check in fetchMusicFromMediaStore. - syncNeteaseData: toUnifiedNeteaseArtistId(name), a lowercase() + hashCode() computation, ran up to three times for the same artist name within the same song iteration (primary artist, the cross-ref loop, and the JSON artist refs). Computed once per song into artistIds and reused across all three. No behavior change in any of the three; all are pure redundant-work elimination, verified by re-deriving the same values via the same inputs.
A previous commit replaced this call with TdApi.TdlibParameters() field assignment, based on TDLib's official docs/example. That approach does not compile against this exact tdlibx:td:1.8.56 build: the real compiler error confirms TdApi.SetTdlibParameters here has only a no-arg constructor and a flat 14-argument one, no constructor taking a TdlibParameters object exists in this build at all. Compared the original (pre-revert) flat-constructor call against the compiler-confirmed real signature, argument type and position both match exactly, so the original code was correct for this build. The "argument order was guessed" comment that motivated the previous change was based on uncertainty that, in retrospect, wasn't actually a problem here. Reverted to the flat constructor, now with the real, compiler- verified signature documented in a comment instead of assumed from docs for a different TDLib binding, and named local variables in place of bare positional literals for review clarity.
fetchMusicFromMediaStore's incremental path queried musicDao.getSongsByIdsListSimple() twice for every song that turned out to have changed: once in Phase 2 to fetch its existing entity and decide whether it changed, and once again in Phase 3 for the exact same IDs to get that same entity for the user-edit-preserving merge step. Phase 2 already has the existing entity in hand right before discarding it. Carry it forward as part of songsToProcess (now List<Pair<RawSongData, SongEntity?>> instead of List<RawSongData>) instead of re-fetching it in Phase 3. Removes one DB round-trip per batch of changed songs; impact scales with how many songs actually changed in a given sync (near-zero for a quiet incremental sync, real for a deep/force-metadata rescan or any sync touching many files at once). No behavior change: same merge logic, same data, just sourced from the pair instead of a second query.
…estart per emission getArtists() re-emits every time the songs/artists tables change — its own song-count column changes with every song attributed to an artist, so distinctUntilChanged() upstream does not dedupe these away during a sync. Each emission previously cancelled any in-flight prefetch and launched a brand new one from scratch over the whole missing-images list (see the removed prefetchJob field). Confirmed via a field performance report: during an active sync, artist_image_prefetch_start/end events fired repeatedly within seconds, each with a slightly different (shrinking) artist count and short duration, tightly interleaved with 600-1000ms+ main-thread frame stalls. A 100k-song sync with ~200 chunk commits could trigger on the order of 200 cancel-and-restart cycles, each one spinning up coroutines and partially initiating Deezer API calls that get abandoned almost immediately, real, repeated, wasted CPU/IO work contending with the rest of the app, including the Main thread. Decouple the UI-facing artist list (still emitted immediately, unchanged) from the prefetch trigger via a separate, debounced MutableSharedFlow. collectLatest() provides the same "cancel the previous one if a newer list arrives" semantics the manual prefetchJob bookkeeping was doing, just downstream of a debounce, so a burst of emissions during active syncing collapses into one prefetch attempt once the list settles, instead of restarting on every single one. ARTIST_PREFETCH_DEBOUNCE_MS (1500ms) is a judgment call, not a profiled-optimal value. getArtistsForSong's prefetch trigger (scoped to a single song's artists) is unchanged: it already has a partial defense against repeated cancellation and nothing in the evidence gathered points to it needing the same fix.
… batched lookups syncTelegramData loaded every artist and every album in the entire unified library into memory before processing a single chunk. This preload's cost scaled with the size of the user's *existing* library, not the channel being synced on a library with many thousands of distinct artists already present, this was a real and growing amount of memory held for the whole sync. Replace it with per-chunk batched lookups: each chunk now collects only its own distinct artist names and album titles, and queries the DB for just those via two new MusicDao methods (getArtistsByNormalizedNames, getAlbumsByNormalizedTitles), chunked in batches of 500 names to stay under SQLite parameter limits. Memory now scales with chunk size, not library size. Correctness is preserved by the existing deterministic synthetic-ID scheme: a genuinely new artist/album gets the same negative hash-derived id regardless of whether this lookup finds it, so it doesn't need to be "found" to behave correctly. The lookup only matters for reusing a real existing DB id instead of creating a duplicate row, and since each chunk commits before the next one starts (cleanupOrphans=false flush per chunk, established earlier), an artist/album created in an earlier chunk this same sync run is already persisted and gets found by a later chunk's query exactly as if it had existed before the sync started. Album lookup matches by title only (SQLite IN() can't match tuples directly); the full title+artist key is reconstructed and checked in Kotlin afterward, same matching precision as the old whole-table version, just scoped to a chunk's relevant titles instead of every album in the library.
…very sync commit
getAudioFiles() wraps musicDao.getAllSongs(), a Room Flow query that
re-emits on every songs-table commit. During an active MediaStore or
Telegram sync, that's many commits in quick succession, each
producing a genuinely different list, so distinctUntilChanged()
downstream doesn't dedupe them away. Each emission triggered a full
entities.map { it.toSong() } conversion (potentially 100k+
conversions) and, downstream in LibraryStateHolder's songs
collector, a full ImmutableList + Map rebuild assigned to StateFlows
that Compose observes.
Same root cause as the artist-image prefetch storm fixed earlier in
getArtists(): something reacting expensively to every single commit
of a sync instead of to the settled end state. Not independently
confirmed via telemetry for this specific path (a fresh field
performance report is pending) — this is reasoning by analogy to
that confirmed, structurally identical bug, prompted by a new OOM
report (ArrayList allocation failure inside an emit()/coroutine
resume chain on Dispatchers.Main.immediate, during MediaStore auto
sync) that matches the same signature.
Debounce placed before the toSong() conversion (not after), so
debounced-away emissions don't pay the conversion cost either.
AUDIO_FILES_DEBOUNCE_MS (600ms) is intentionally shorter than
ARTIST_PREFETCH_DEBOUNCE_MS (1500ms): that one delays a background
side effect the user doesn't directly perceive, this one delays the
songs list itself, so a normal one-off action should still feel
responsive. A judgment call, not a profiled-optimal value.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Improves Telegram sync performance and adds AC4 audio codec detection for cast.
Telegram sync performance
Telegram sync was taking about 10 minutes for a 100k-song channel. Three targeted fixes:
syncTelegramDataran sequentially, one song at a time. Now runs concurrently (bounded semaphore, concurrency = 8) using the same pattern already used elsewhere inSyncWorker. Also removes a redundant duplicateresolveAlbumArtUri()call per song.incrementalSyncMusicData()ran two full-tableNOT EXISTSscans on every chunk flush (about 200 times for 100k songs). Insert-only chunks can never produce an orphan, so cleanup is now skipped per-chunk (opt-out parameter, default unchanged for all other callers) and run once at the end instead.AudioMetadataReader.parseReplayGainDb()to a compiled-once constant, since it ran up to twice per song.No behavior or output changes. Same songs, albums, artists, and cross-refs end up in the DB, just computed faster.
Tested: fresh install, 100k-song Telegram channel (6kbps MP3s with metadata, release build). Sync completed in about 50 seconds, down from about 10 minutes before these changes.
Cast
Adds AC4 audio codec detection for ISO BMFF containers in
IsoBmffAudioCodecDetector, with unit tests.