Skip to content

fix: preserve crash report ID high bits#8216

Merged
supervacuus merged 1 commit into
mainfrom
supervacuus/fix/report_id_gen_cast
Jun 26, 2026
Merged

fix: preserve crash report ID high bits#8216
supervacuus merged 1 commit into
mainfrom
supervacuus/fix/report_id_gen_cast

Conversation

@supervacuus

@supervacuus supervacuus commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

📜 Description

This is a fun one. It essentially adds one cast to preserve the high bits in our report ID generation. This is required not only to avoid cycling IDs too frequently, but also to avoid accidentally creating 0-IDs, which we then discard. I pulled out the inner part that converts to ID from the time_t we get from gmtime_r() and added a regression test that shows the issue.

💡 Motivation and Context

This problem appeared recently when another PR highlighted flaky tests in an unrelated part of the code. However, I also recently investigated that part of the code in the context of the KSCrash investigation and thus had somewhat of a clue what was happening there.

Screenshot 2026-06-24 at 14 54 47

In particular, what popped into mind when looking at the logs was the fact that tests and retries all happened within the same second, which to me hinted at a connection to the report ID construction, which itself is based on a seconds counter, meaning tests running within the same second will all use the same ID base.

And it turned out I was right, but in a rather roundabout way. The three relevant parts in the code are here:

static void
initializeIDs(void)
{
time_t rawTime;
time(&rawTime);
struct tm time;
gmtime_r(&rawTime, &time);
int64_t baseID = (int64_t)time.tm_sec + (int64_t)time.tm_min * 61
+ (int64_t)time.tm_hour * 61 * 60 + (int64_t)time.tm_yday * 61 * 60 * 24
+ (int64_t)time.tm_year * 61 * 60 * 24 * 366;
baseID <<= 23;
g_nextUniqueIDHigh = baseID & ~0xffffffff;
uint32_t lowerBaseID = (uint32_t)(baseID & 0xffffffff);
g_nextUniqueIDLow = lowerBaseID;
}

while ((ent = readdir(dir)) != NULL && index < count) {
int64_t reportID = getReportIDFromFilename(ent->d_name);
if (reportID > 0) {
reportIDs[index++] = reportID;
}
}

while ((ent = readdir(dir)) != NULL) {
if (ent->d_type != DT_DIR && getReportIDFromFilename(ent->d_name) > 0) {
count++;
}
}

The first creates a report ID based on the current time. The last two are iterations from getReportIDs() and getReportCount(). The latter two establish the first important fact: the SDK ignores reports with ID == 0. Regarding the first one, we have to look at the timestamp of the failing flaky tests: 2026-06-23 22:01:27.

If we plug the timestamp into the formula, we get

baseID = 27
         + 1 * 61
         + 22 * 61 * 60
         + 173 * 61 * 60 * 24
         + 126 * 61 * 60 * 24 * 366

which produces 4066106368 or 0xf25bdc00. After left-shifting by 23 bits, we get 0x00792dee00000000. This means our entire lower 32 bits are 0.

Now we come to the bug:

     g_nextUniqueIDHigh = baseID & ~0xffffffff; 

The intention here is that it creates a negative masking for the lower bits, leaving us with the upper 32 bits (in their final 64-bit position). But the problem is that the mask 0xffffffff is a 32-bit value in C, and inverting the bits yields 0x00000000, which, in 64-bit, is also 0, i.e., a mask of 0x0000000000000000, not the intended 64-bit mask 0xffffffff00000000.

Since the lower 32 bits are 0 anyway, and we have now masked away the non-zero upper 32 bits, the resulting ID equals 0. Meaning our tests checking on counts will fail to count the first report generated. But also: any report generated in production during one such unlucky second will be ignored.

How often does this happen? Since we shift the seconds by 23 bits, we only have 9 bits left in our second-counter precision. Meaning, around every 512 seconds (ignoring the formula details), we cycle the IDs. Whenever the baseID modulo 512 == 0 before shifting, the resulting ID equals zero. Of course, this also means we cycle the ID generation every 512 seconds, but I guess this is less of a problem since we generate crash reports here.

The fix is to cast 0xffffffff to a 64-bit value, so that inverting the bits results in 0xffffffff00000000, i.e., the expected mask. After my dive into this, I also looked upstream (in the old version, develop changed report generation considerably), and as you can expect, the bug was fixed. What is rather surprising and painful is that it was fixed back in 2020.

💚 How did you test it?

Added a unit test that fails with the old implementation and correctly handles the new one.

📝 Checklist

You have to check all boxes before merging:

  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.
  • If I added a new public API, I also added it to the SentryObjC wrapper.

Closes #8217

@supervacuus supervacuus added the ready-to-merge Use this label to trigger all PR workflows label Jun 25, 2026
@sentry

sentry Bot commented Jun 25, 2026

Copy link
Copy Markdown

📲 Install Builds

iOS

🔗 App Name App ID Version Configuration
SDK-Size io.sentry.sample.SDK-Size 9.19.0 (1) Release

⚙️ sentry-cocoa Build Distribution Settings

@github-actions

Copy link
Copy Markdown
Contributor

Performance metrics 🚀

  Plain With Sentry Diff
Startup time 1228.35 ms 1270.57 ms 42.23 ms
Size 24.14 KiB 1.22 MiB 1.20 MiB

Baseline results on branch: main

Startup times

Revision Plain With Sentry Diff
ed33915 1231.46 ms 1259.48 ms 28.02 ms
e337271 1234.71 ms 1270.10 ms 35.40 ms
1936da3 1223.18 ms 1253.96 ms 30.78 ms
4014441 1232.16 ms 1266.41 ms 34.25 ms
dff20e3 1225.19 ms 1258.66 ms 33.47 ms
bd2b6d3 1213.78 ms 1253.83 ms 40.06 ms
ce33dcf 1236.42 ms 1267.76 ms 31.34 ms
8bb6aeb 1215.82 ms 1244.84 ms 29.02 ms
27850be 1222.65 ms 1254.23 ms 31.58 ms
1af7edf 1210.63 ms 1247.91 ms 37.29 ms

App size

Revision Plain With Sentry Diff
ed33915 24.14 KiB 1.20 MiB 1.17 MiB
e337271 24.14 KiB 1.16 MiB 1.13 MiB
1936da3 24.14 KiB 1.17 MiB 1.15 MiB
4014441 24.14 KiB 1.16 MiB 1.14 MiB
dff20e3 24.14 KiB 1.15 MiB 1.13 MiB
bd2b6d3 24.14 KiB 1.17 MiB 1.14 MiB
ce33dcf 24.14 KiB 1.19 MiB 1.16 MiB
8bb6aeb 24.14 KiB 1.17 MiB 1.15 MiB
27850be 24.14 KiB 1.18 MiB 1.16 MiB
1af7edf 24.14 KiB 1.19 MiB 1.16 MiB

@philprime philprime left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you for fixing this and the flaky tests!

@supervacuus supervacuus merged commit 3b55bda into main Jun 26, 2026
428 of 452 checks passed
@supervacuus supervacuus deleted the supervacuus/fix/report_id_gen_cast branch June 26, 2026 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved ready-to-merge Use this label to trigger all PR workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: preserve crash report ID high bits

3 participants