Skip to content

Fix stretched/misplaced logo on startup & race conditions#1286

Open
GeneralProtectionFault wants to merge 3 commits into
Redot-Engine:masterfrom
GeneralProtectionFault:splash_wayland
Open

Fix stretched/misplaced logo on startup & race conditions#1286
GeneralProtectionFault wants to merge 3 commits into
Redot-Engine:masterfrom
GeneralProtectionFault:splash_wayland

Conversation

@GeneralProtectionFault

@GeneralProtectionFault GeneralProtectionFault commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Fixes #1228

This refactors some of the startup/wayland/x11 code to fix the logo being scaled incorrectly with the startup window, or misplaced (Hyperland), both in compatibility mode and otherwise. In a nutshell, the cause is race conditions, one of the nastier ones being in show_window() (display_server_wayland.cpp), which calls window_create()--which stores the resolution in wayland_thread, but not in display_server_wayland (not until process_events() fires on the main thread).
So, the fix gets the window size right then and there.

I'm creating it as a draft because I think it warrants some review, and working on creating a diagram to illustrate the sequence of function calls. @JoltedJon has tested on Hyperland native, myself on a Hyperland VM (in that case, there were some unresolved issues with X11 as well), as well as Arch linux (Artix).

I will test on a Windows VM as well, but hopefully this only touches Linux given it's heavily on the Linux compositors/protocols.

Summary by CodeRabbit

  • New Features
    • Improved window/compositor synchronization during startup for better behavior on Wayland and X11.
  • Bug Fixes
    • Fixed boot image and logo sizing so they render correctly across different display scales.
    • Reduced startup and shutdown hangs by adding timeouts and safer thread cleanup.
    • Improved editor preview, tile thumbnail, and documentation task shutdown to avoid waiting indefinitely.

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

This PR adds timeout-bounded synchronization to numerous thread shutdown paths (WorkerThreadPool, EditorHelp doc caches, EditorFileSystem scanning, EditorResourcePreview, tile pattern preview thread) via a new ConditionVariable::wait_for_usec API, and introduces a cross-platform DisplayServer::compositor_sync() hook with Wayland/X11 implementations plus boot-image rectangle scaling fixes to resolve editor splash stretching under Wayland.

Changes

Timeout-bounded thread shutdown synchronization

Layer / File(s) Summary
Timed condition-variable wait API
core/os/condition_variable.h
Adds wait_for_usec overloads for both threaded (std::chrono-based) and non-threaded stub ConditionVariable implementations.
WorkerThreadPool shutdown timeout
core/object/worker_thread_pool.cpp
exit_languages_threads() uses wait_for_usec with a fixed per-phase timeout, warning and breaking instead of waiting indefinitely.
EditorHelp doc/script-cache thread cleanup
editor/doc/editor_help.cpp, editor/doc/editor_help.h
Adds _worker_thread_done, _loader_thread_done, _cleanup_in_progress flags; worker threads signal completion, generation/regen/load paths skip work during cleanup, and cleanup_doc() flushes the message queue while waiting on these flags.
EditorFileSystem scan thread timeout
editor/file_system/editor_file_system.cpp
Exit-tree shutdown waits at most 3 seconds for scanning, logging and skipping wait_to_finish() on timeout.
EditorResourcePreview stop() timeout
editor/inspector/editor_resource_preview.cpp
Adds a 3-second timeout to the shutdown polling loop, warning and breaking on expiry.
TilesEditorUtils worker thread shutdown
editor/scene/2d/tiles/tiles_editor_plugin.cpp, editor/scene/2d/tiles/tiles_editor_plugin.h
Render loop gains shutdown-aware cleanup around frame_pre_draw; destructor unblocks the frame wait and applies a bounded timed wait with periodic sync/flush.
Main::cleanup shutdown ordering
main/main.cpp
Exits language worker threads before deleting the main loop/SceneTree and reorders ScriptServer::finish_languages()/RenderingServer::sync() calls.

Estimated code review effort: 4 (Complex) | ~60 minutes

Compositor sync and boot-logo rectangle scaling

Layer / File(s) Summary
DisplayServer compositor_sync interface
servers/display_server.h, platform/linuxbsd/wayland/display_server_wayland.h, platform/linuxbsd/x11/display_server_x11.h
Adds a virtual compositor_sync() with a default no-op and override declarations for Wayland and X11.
Wayland compositor_sync implementation
platform/linuxbsd/wayland/display_server_wayland.cpp
show_window re-syncs WindowData::rect.size to the compositor-reported physical size before surface creation; compositor_sync() waits up to 200ms for a configure event and drains it.
WaylandThread roundtrip/window-rect wait helpers
platform/linuxbsd/wayland/wayland_thread.cpp, platform/linuxbsd/wayland/wayland_thread.h
Adds initial_configure_done state, roundtrip(), a bounded configure-wait loop in window_create, wait_window_rect_ms(), adjusted timeout bookkeeping in wait_frame_suspend_ms, and Doxygen-style comment reformatting throughout.
X11 compositor_sync and ConfigureNotify polling
platform/linuxbsd/x11/display_server_x11.cpp
show_window adds a fallback ConfigureNotify polling loop; new compositor_sync() polls the X11 fd up to 200ms for a size change on tiling window managers.
Main::setup2 boot logo re-sync
main/main.cpp
Calls compositor_sync() after the initial boot logo draw and redraws it if the window size changed.
Boot image rectangle scaling fix
drivers/gles3/rasterizer_gles3.cpp, servers/rendering/renderer_rd/renderer_compositor_rd.cpp
set_boot_image uses calculate_boot_screen_rect or scales by screen_get_scale() instead of assigning the raw image rectangle, preventing splash stretching.

Estimated code review effort: 4 (Complex) | ~70 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Main
  participant DisplayServer
  participant CompositorSync
  participant SetupBootLogo
  Main->>SetupBootLogo: setup_boot_logo() (initial)
  Main->>DisplayServer: compositor_sync()
  DisplayServer->>CompositorSync: wait up to 200ms for configure
  CompositorSync-->>DisplayServer: process_events() / poll result
  DisplayServer-->>Main: window size (possibly updated)
  Main->>Main: compare pre/post-sync size
  alt size changed
    Main->>SetupBootLogo: setup_boot_logo() (re-draw)
  end
Loading
sequenceDiagram
  participant Caller
  participant WorkerThread
  participant CompletionFlag
  participant MessageQueue
  Caller->>WorkerThread: start work, reset CompletionFlag
  WorkerThread->>WorkerThread: perform doc/scan/preview work
  WorkerThread->>CompletionFlag: set() on finish
  Caller->>MessageQueue: flush() while polling CompletionFlag
  CompletionFlag-->>Caller: done or 3s timeout
  Caller->>WorkerThread: wait_to_finish() (if not timed out)
Loading

Estimated code review effort: 4 (Complex) | ~90 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR includes several unrelated editor shutdown and timeout changes that do not appear necessary for the Wayland splash fix. Split out the editor cleanup, thread-timeout, and shutdown synchronization changes into a separate PR focused on the splash/logo fix.
Docstring Coverage ⚠️ Warning Docstring coverage is 11.54% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title matches the main change: fixing startup splash/logo placement and related race conditions.
Linked Issues check ✅ Passed The Wayland splash-stretch issue is addressed by boot-logo sizing and compositor/window-size synchronization changes.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@GeneralProtectionFault GeneralProtectionFault force-pushed the splash_wayland branch 2 times, most recently from 52acf13 to c85fba8 Compare June 28, 2026 00:26
@JoltedJon

Copy link
Copy Markdown
Contributor

@GeneralProtectionFault What is the status of this PR, can we open it for review now?

@GeneralProtectionFault

GeneralProtectionFault commented Jul 1, 2026

Copy link
Copy Markdown
Contributor Author

@GeneralProtectionFault What is the status of this PR, can we open it for review now?

Yeah I'll ready it up. We know it works, I just felt a little uneasy about knowing if this touches somewhere I'm not aware of, etc...

But, no harm I guess, let's see what the High Hare has to say about it.

Ah....lemme git it synced up too.

@GeneralProtectionFault GeneralProtectionFault marked this pull request as ready for review July 1, 2026 03:16

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
editor/doc/editor_help.cpp (1)

3162-3188: 🩺 Stability & Availability | 🔴 Critical | ⚡ Quick win

Signal _worker_thread_done on every thread exit path.

_worker_thread_done is reset before this thread starts, but the error/corrupt-cache returns exit before Line 3188. cleanup_doc() can then wait forever on Line 3421.

Proposed fix
 void EditorHelp::_load_script_doc_cache_thread(void *p_udata) {
-	ERR_FAIL_COND_MSG(!ProjectSettings::get_singleton()->is_project_loaded(), "Error: cannot load script doc cache without a project.");
-	ERR_FAIL_COND_MSG(!ResourceLoader::exists(get_script_doc_cache_full_path()), "Error: cannot load script doc cache from inexistent file.");
+	if (!ProjectSettings::get_singleton()->is_project_loaded()) {
+		_worker_thread_done.set();
+		ERR_FAIL_MSG("Error: cannot load script doc cache without a project.");
+	}
+	if (!ResourceLoader::exists(get_script_doc_cache_full_path())) {
+		_worker_thread_done.set();
+		ERR_FAIL_MSG("Error: cannot load script doc cache from inexistent file.");
+	}
 
 	Ref<Resource> script_doc_cache_res = ResourceLoader::load(get_script_doc_cache_full_path(), "", ResourceFormatLoader::CACHE_MODE_IGNORE);
 	if (script_doc_cache_res.is_null()) {
 		print_verbose("Script doc cache is corrupted. Regenerating it instead.");
 		_delete_script_doc_cache();
 		callable_mp_static(EditorHelp::regenerate_script_doc_cache).call_deferred();
+		_worker_thread_done.set();
 		return;
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@editor/doc/editor_help.cpp` around lines 3162 - 3188,
EditorHelp::_load_script_doc_cache_thread currently returns early on the failed
load/corrupt-cache path without signaling _worker_thread_done, which can leave
cleanup_doc() waiting indefinitely. Update this function so every exit path,
including the null ResourceLoader::load case and any other failure returns, sets
_worker_thread_done before returning; keep the normal success path signaling as
well.
editor/inspector/editor_resource_preview.cpp (1)

606-616: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make the timeout affect the join path too.

After breaking on timeout, Line 616 still blocks indefinitely in wait_to_finish(), so this doesn’t actually bound shutdown. If skipping the join is unsafe, keep waiting and treat this as a warning threshold instead.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@editor/inspector/editor_resource_preview.cpp` around lines 606 - 616, The
timeout in EditorResourcePreview’s preview-thread loop only stops the polling
loop, but thread.wait_to_finish() still blocks forever afterward. Update the
shutdown flow so the same TIMEOUT_USEC limit also applies to the join path:
either avoid calling wait_to_finish() after timeout, or keep waiting while
treating the timeout as a warning threshold rather than a hard exit. Use the
existing exited flag, TIMEOUT_USEC check, and thread.wait_to_finish() in
EditorResourcePreview to make the behavior consistent.
🧹 Nitpick comments (2)
drivers/gles3/rasterizer_gles3.cpp (1)

491-502: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Duplicate boot-rect scaling logic across renderers.

This same scale/center computation (screen_get_scale multiply + floor-centered offset) is duplicated verbatim in RendererCompositorRD::set_boot_image (servers/rendering/renderer_rd/renderer_compositor_rd.cpp, lines 227-234). Consider factoring this into a shared helper (e.g., alongside OS::calculate_boot_screen_rect) to avoid the two implementations drifting apart in future fixes.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@drivers/gles3/rasterizer_gles3.cpp` around lines 491 - 502, The boot-image
screen rect scaling and centering logic in rasterizer_gles3.cpp is duplicated in
RendererCompositorRD::set_boot_image, so update this path to use a shared helper
instead of maintaining two copies. Factor the screen_get_scale multiplication
and floor-centered offset computation into a common function near
OS::calculate_boot_screen_rect, then have both RasterizerGLES3 and
RendererCompositorRD call it so the behavior stays consistent.
platform/linuxbsd/x11/display_server_x11.cpp (1)

1942-1955: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Sleeping up to 100ms while holding events_mutex.

The fallback polling loop calls OS::get_singleton()->delay_usec(5000) up to 20 times inside the MutexLock mutex_lock(events_mutex); scope opened at line 1928. This can block the dedicated _poll_events_thread from touching polled_events for up to 100ms during startup. Bounded and one-shot (MAIN_WINDOW_ID only), so impact is limited, but narrowing the lock to just the XCheckTypedWindowEvent check (re-acquiring it each iteration) would avoid holding it across the sleep.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@platform/linuxbsd/x11/display_server_x11.cpp` around lines 1942 - 1955, The
fallback polling in display_server_x11.cpp holds events_mutex while calling
OS::get_singleton()->delay_usec(5000), which can block _poll_events_thread from
accessing polled_events during startup. Update the logic around the
ConfigureNotify wait in the X11 window setup path so the mutex is only held for
the XCheckTypedWindowEvent/_window_changed access, and is released before any
sleep/delay in the loop. Keep the change localized to the startup polling block
that uses get_config_event, wd.maximized, wd.fullscreen, and MAIN_WINDOW_ID
behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@core/object/worker_thread_pool.cpp`:
- Around line 861-884: The shutdown wait loops in WorkerThreadPool’s cleanup
path currently break on timeout and then continue teardown, which can leave
worker threads running against stale SceneTree/script state. Update the timeout
handling in the pre-exit and exit-language wait sections so a timeout does not
fall through to normal shutdown; instead, fail loudly and stop the remaining
shutdown sequence, or otherwise escalate in a way that prevents Main::cleanup()
from deleting scene/script resources while threads may still be active. Use the
existing _switch_runlevel, control_cond_var.wait_for_usec, and WARN_PRINT flow
to locate and adjust the shutdown logic.

In `@editor/file_system/editor_file_system.cpp`:
- Around line 1736-1748: The scan timeout path in
EditorFileSystem::reconnect_scan_thread currently skips waiting for the worker
and can free filesystem state while the scan thread is still active. Update the
timeout handling around active_thread/scanning so any shutdown or cleanup that
touches filesystem and new_filesystem only happens after the scan thread has
fully stopped, either by always joining via wait_to_finish() or by otherwise
ensuring the thread is detached and never accesses those members again. Apply
the same fix to the later cleanup block that deletes filesystem/new_filesystem
so the thread lifecycle and state teardown remain synchronized.

In `@editor/scene/2d/tiles/tiles_editor_plugin.cpp`:
- Around line 360-370: The timeout in TilesEditorUtils teardown is ineffective
because pattern_preview_thread.wait_to_finish() still blocks after the wait loop
breaks. Update the shutdown path around pattern_thread_exited, TIMEOUT_USEC, and
pattern_preview_thread so the worker is guaranteed to exit before joining, or
convert the timeout into a warning-only threshold and continue waiting until the
thread finishes. Keep the fix localized to the destructor/cleanup logic in
tiles_editor_plugin.cpp and preserve the existing warning behavior.

In `@platform/linuxbsd/wayland/wayland_thread.cpp`:
- Around line 4891-4904: `WaylandThread::wait_window_rect_ms` reads shared
window state before taking `mutex`, creating the same race that
`wait_frame_suspend_ms` avoids. Move the `windows.has(p_id)` check and the
`windows[p_id].rect.size` snapshot to after `MutexLock mutex_lock(mutex);`, so
all accesses to `windows` and `rect.size` happen under the lock before calling
`wl_display_roundtrip`.
- Around line 1440-1441: The xdg configure handlers are not marking the initial
configure as complete, so the non-libdecor path keeps retrying through all
MAX_CONFIGURE_ROUNDTRIPS and hits the warning on first window creation. Update
the xdg configure flow in _xdg_surface_on_configure() and/or
_xdg_toplevel_on_configure() to set ws->initial_configure_done when the initial
configure is received, and remove the duplicated wl_display_roundtrip comment
from that area.

In `@platform/linuxbsd/x11/display_server_x11.cpp`:
- Around line 2005-2006: The show_window verbose log statement in
display_server_x11.cpp is missing its terminating semicolon, causing a compile
error. Update the show_window path in display_server_x11::... where
print_verbose(vformat("show_window: final size %s", wd.size)) appears so the
statement is properly terminated with a semicolon and the function body compiles
cleanly.

---

Outside diff comments:
In `@editor/doc/editor_help.cpp`:
- Around line 3162-3188: EditorHelp::_load_script_doc_cache_thread currently
returns early on the failed load/corrupt-cache path without signaling
_worker_thread_done, which can leave cleanup_doc() waiting indefinitely. Update
this function so every exit path, including the null ResourceLoader::load case
and any other failure returns, sets _worker_thread_done before returning; keep
the normal success path signaling as well.

In `@editor/inspector/editor_resource_preview.cpp`:
- Around line 606-616: The timeout in EditorResourcePreview’s preview-thread
loop only stops the polling loop, but thread.wait_to_finish() still blocks
forever afterward. Update the shutdown flow so the same TIMEOUT_USEC limit also
applies to the join path: either avoid calling wait_to_finish() after timeout,
or keep waiting while treating the timeout as a warning threshold rather than a
hard exit. Use the existing exited flag, TIMEOUT_USEC check, and
thread.wait_to_finish() in EditorResourcePreview to make the behavior
consistent.

---

Nitpick comments:
In `@drivers/gles3/rasterizer_gles3.cpp`:
- Around line 491-502: The boot-image screen rect scaling and centering logic in
rasterizer_gles3.cpp is duplicated in RendererCompositorRD::set_boot_image, so
update this path to use a shared helper instead of maintaining two copies.
Factor the screen_get_scale multiplication and floor-centered offset computation
into a common function near OS::calculate_boot_screen_rect, then have both
RasterizerGLES3 and RendererCompositorRD call it so the behavior stays
consistent.

In `@platform/linuxbsd/x11/display_server_x11.cpp`:
- Around line 1942-1955: The fallback polling in display_server_x11.cpp holds
events_mutex while calling OS::get_singleton()->delay_usec(5000), which can
block _poll_events_thread from accessing polled_events during startup. Update
the logic around the ConfigureNotify wait in the X11 window setup path so the
mutex is only held for the XCheckTypedWindowEvent/_window_changed access, and is
released before any sleep/delay in the loop. Keep the change localized to the
startup polling block that uses get_config_event, wd.maximized, wd.fullscreen,
and MAIN_WINDOW_ID behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 843d9ed9-d56d-4c1c-a946-394739f82eb2

📥 Commits

Reviewing files that changed from the base of the PR and between 4f5b14a and 0d65b5c.

📒 Files selected for processing (18)
  • core/object/worker_thread_pool.cpp
  • core/os/condition_variable.h
  • drivers/gles3/rasterizer_gles3.cpp
  • editor/doc/editor_help.cpp
  • editor/doc/editor_help.h
  • editor/file_system/editor_file_system.cpp
  • editor/inspector/editor_resource_preview.cpp
  • editor/scene/2d/tiles/tiles_editor_plugin.cpp
  • editor/scene/2d/tiles/tiles_editor_plugin.h
  • main/main.cpp
  • platform/linuxbsd/wayland/display_server_wayland.cpp
  • platform/linuxbsd/wayland/display_server_wayland.h
  • platform/linuxbsd/wayland/wayland_thread.cpp
  • platform/linuxbsd/wayland/wayland_thread.h
  • platform/linuxbsd/x11/display_server_x11.cpp
  • platform/linuxbsd/x11/display_server_x11.h
  • servers/display_server.h
  • servers/rendering/renderer_rd/renderer_compositor_rd.cpp

Comment on lines +861 to 884
const uint64_t TIMEOUT_USEC = 5000000; // 5 seconds per phase

MutexLock lock(task_mutex);

// Wait until all threads are idle.
_switch_runlevel(RUNLEVEL_PRE_EXIT_LANGUAGES);
while (runlevel_data.pre_exit_languages.num_idle_threads != threads.size()) {
control_cond_var.wait(lock);
bool notified = control_cond_var.wait_for_usec(lock, TIMEOUT_USEC);
if (!notified && runlevel_data.pre_exit_languages.num_idle_threads != threads.size()) {
WARN_PRINT("WorkerThreadPool: Timed out waiting for threads to become idle during shutdown. "
"Some tasks may have been abandoned.");
break;
}
}

// Wait until all threads have detached from scripting languages.
_switch_runlevel(RUNLEVEL_EXIT_LANGUAGES);
while (runlevel_data.exit_languages.num_exited_threads != threads.size()) {
control_cond_var.wait(lock);
bool notified = control_cond_var.wait_for_usec(lock, TIMEOUT_USEC);
if (!notified && runlevel_data.exit_languages.num_exited_threads != threads.size()) {
WARN_PRINT("WorkerThreadPool: Timed out waiting for threads to exit scripting languages.");
break;
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Timeout fallback undermines the shutdown-safety guarantee it's paired with.

Breaking out of the wait loop on timeout still lets the function return normally and move to the next runlevel (and eventually back to Main::cleanup(), which now deletes the main loop/SceneTree right after this call). If a worker task is genuinely stuck or just slow (>5s, or >10s across both phases), a thread can still be executing with stale scene/script references at the moment the SceneTree is deleted and ScriptServer::finish_languages() runs — the exact use-after-free/deadlock scenario the new call ordering in Main::cleanup() is meant to avoid. The timeout converts a hang into a (rarer but still possible) crash during shutdown.

Consider making the timeout path fail loud rather than silently proceed — e.g., skip forcing RUNLEVEL_EXIT_LANGUAGES/subsequent shutdown steps when the previous phase timed out, or escalate (abort/force-terminate stuck threads) instead of falling through to normal teardown.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/object/worker_thread_pool.cpp` around lines 861 - 884, The shutdown wait
loops in WorkerThreadPool’s cleanup path currently break on timeout and then
continue teardown, which can leave worker threads running against stale
SceneTree/script state. Update the timeout handling in the pre-exit and
exit-language wait sections so a timeout does not fall through to normal
shutdown; instead, fail loudly and stop the remaining shutdown sequence, or
otherwise escalate in a way that prevents Main::cleanup() from deleting
scene/script resources while threads may still be active. Use the existing
_switch_runlevel, control_cond_var.wait_for_usec, and WARN_PRINT flow to locate
and adjust the shutdown logic.

Comment on lines 1736 to +1748
while (scanning) {
if (OS::get_singleton()->get_ticks_usec() - start > TIMEOUT_USEC) {
WARN_PRINT("EditorFileSystem: Timed out waiting for scan thread. Thread will be detached.");
timed_out = true;
break;
}
OS::get_singleton()->delay_usec(1000);
}
active_thread.wait_to_finish();
WARN_PRINT("Scan thread aborted...");
if (!timed_out) {
active_thread.wait_to_finish();
} else {
WARN_PRINT("Scan thread aborted...");
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🔴 Critical | 🏗️ Heavy lift

Don’t free filesystem state while the scan thread may still be running.

On timeout this skips wait_to_finish(), then immediately deletes filesystem / new_filesystem. If the scan thread is still touching those members, shutdown can race into use-after-free.

Also applies to: 1753-1760

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@editor/file_system/editor_file_system.cpp` around lines 1736 - 1748, The scan
timeout path in EditorFileSystem::reconnect_scan_thread currently skips waiting
for the worker and can free filesystem state while the scan thread is still
active. Update the timeout handling around active_thread/scanning so any
shutdown or cleanup that touches filesystem and new_filesystem only happens
after the scan thread has fully stopped, either by always joining via
wait_to_finish() or by otherwise ensuring the thread is detached and never
accesses those members again. Apply the same fix to the later cleanup block that
deletes filesystem/new_filesystem so the thread lifecycle and state teardown
remain synchronized.

Comment on lines 360 to 370
while (!pattern_thread_exited.is_set()) {
if (OS::get_singleton()->get_ticks_usec() - start > TIMEOUT_USEC) {
WARN_PRINT("TilesEditorUtils: Timed out waiting for pattern preview thread to exit.");
break;
}
OS::get_singleton()->delay_usec(10000);
RenderingServer::get_singleton()->sync(); //sync pending stuff, as thread may be blocked on visual server
RenderingServer::get_singleton()->sync();
MessageQueue::get_singleton()->flush();
}

pattern_preview_thread.wait_to_finish();

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

The destructor timeout is bypassed by the unconditional join.

If the worker does not set pattern_thread_exited, the loop breaks after 3 seconds but Line 370 can still block forever. Make the worker exit deterministic before joining, or keep waiting and make this a warning-only threshold.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@editor/scene/2d/tiles/tiles_editor_plugin.cpp` around lines 360 - 370, The
timeout in TilesEditorUtils teardown is ineffective because
pattern_preview_thread.wait_to_finish() still blocks after the wait loop breaks.
Update the shutdown path around pattern_thread_exited, TIMEOUT_USEC, and
pattern_preview_thread so the worker is guaranteed to exit before joining, or
convert the timeout into a warning-only threshold and continue waiting until the
thread finishes. Keep the fix localized to the destructor/cleanup logic in
tiles_editor_plugin.cpp and preserve the existing warning behavior.

Comment on lines +1440 to +1441
ws->initial_configure_done = true;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check whether any handler besides libdecor_frame_on_configure sets initial_configure_done.
rg -n 'initial_configure_done' platform/linuxbsd/wayland/wayland_thread.cpp platform/linuxbsd/wayland/wayland_thread.h
rg -nA20 '_xdg_surface_on_configure\(void \*data' platform/linuxbsd/wayland/wayland_thread.cpp

Repository: Redot-Engine/redot-engine

Length of output: 1454


🏁 Script executed:

#!/bin/bash
sed -n '1208,1255p' platform/linuxbsd/wayland/wayland_thread.cpp
sed -n '1428,1450p' platform/linuxbsd/wayland/wayland_thread.cpp
sed -n '3588,3615p' platform/linuxbsd/wayland/wayland_thread.cpp

Repository: Redot-Engine/redot-engine

Length of output: 3511


Set initial_configure_done in the xdg configure path

Neither _xdg_surface_on_configure() nor _xdg_toplevel_on_configure() flips it, so the non-libdecor path will always consume all MAX_CONFIGURE_ROUNDTRIPS and log the warning on initial window creation. The duplicated // wl_display_roundtrip(wl_display); comment can go too.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@platform/linuxbsd/wayland/wayland_thread.cpp` around lines 1440 - 1441, The
xdg configure handlers are not marking the initial configure as complete, so the
non-libdecor path keeps retrying through all MAX_CONFIGURE_ROUNDTRIPS and hits
the warning on first window creation. Update the xdg configure flow in
_xdg_surface_on_configure() and/or _xdg_toplevel_on_configure() to set
ws->initial_configure_done when the initial configure is received, and remove
the duplicated wl_display_roundtrip comment from that area.

Comment on lines +4891 to +4904
bool WaylandThread::wait_window_rect_ms(DisplayServer::WindowID p_id, int p_timeout) {
ERR_FAIL_COND_V(!windows.has(p_id), false);

Size2i initial_size = windows[p_id].rect.size;

// Same unblock trick as wait_frame_suspend_ms: the polling thread may be
// stuck in prepare_read; a roundtrip bypasses it safely.
MutexLock mutex_lock(mutex);
wl_display_roundtrip(wl_display);

if (windows[p_id].rect.size != initial_size) {
return true; // Already changed during the roundtrip.
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Data race: windows[p_id].rect.size read before acquiring mutex.

ERR_FAIL_COND_V(!windows.has(p_id), false); and Size2i initial_size = windows[p_id].rect.size; run before the MutexLock mutex_lock(mutex); is taken. This state is mutated by the dedicated Wayland polling thread (e.g. via window_state_update_size() from configure callbacks), which is exactly why sibling method wait_frame_suspend_ms takes the lock as its very first statement before touching any shared state. Move these two lines after acquiring the lock.

🔒 Suggested fix
 bool WaylandThread::wait_window_rect_ms(DisplayServer::WindowID p_id, int p_timeout) {
-	ERR_FAIL_COND_V(!windows.has(p_id), false);
-
-	Size2i initial_size = windows[p_id].rect.size;
-
 	// Same unblock trick as wait_frame_suspend_ms: the polling thread may be
 	// stuck in prepare_read; a roundtrip bypasses it safely.
 	MutexLock mutex_lock(mutex);
+
+	ERR_FAIL_COND_V(!windows.has(p_id), false);
+
+	Size2i initial_size = windows[p_id].rect.size;
+
 	wl_display_roundtrip(wl_display);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
bool WaylandThread::wait_window_rect_ms(DisplayServer::WindowID p_id, int p_timeout) {
ERR_FAIL_COND_V(!windows.has(p_id), false);
Size2i initial_size = windows[p_id].rect.size;
// Same unblock trick as wait_frame_suspend_ms: the polling thread may be
// stuck in prepare_read; a roundtrip bypasses it safely.
MutexLock mutex_lock(mutex);
wl_display_roundtrip(wl_display);
if (windows[p_id].rect.size != initial_size) {
return true; // Already changed during the roundtrip.
}
bool WaylandThread::wait_window_rect_ms(DisplayServer::WindowID p_id, int p_timeout) {
// Same unblock trick as wait_frame_suspend_ms: the polling thread may be
// stuck in prepare_read; a roundtrip bypasses it safely.
MutexLock mutex_lock(mutex);
ERR_FAIL_COND_V(!windows.has(p_id), false);
Size2i initial_size = windows[p_id].rect.size;
wl_display_roundtrip(wl_display);
if (windows[p_id].rect.size != initial_size) {
return true; // Already changed during the roundtrip.
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@platform/linuxbsd/wayland/wayland_thread.cpp` around lines 4891 - 4904,
`WaylandThread::wait_window_rect_ms` reads shared window state before taking
`mutex`, creating the same race that `wait_frame_suspend_ms` avoids. Move the
`windows.has(p_id)` check and the `windows[p_id].rect.size` snapshot to after
`MutexLock mutex_lock(mutex);`, so all accesses to `windows` and `rect.size`
happen under the lock before calling `wl_display_roundtrip`.

Comment on lines +2005 to +2006
print_verbose(vformat("show_window: final size %s", wd.size))
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🔴 Critical | ⚡ Quick win

Missing semicolon — compile error.

print_verbose(vformat("show_window: final size %s", wd.size)) has no terminating ;. As written this fails to compile.

🐛 Fix
-	print_verbose(vformat("show_window: final size %s", wd.size))
+	print_verbose(vformat("show_window: final size %s", wd.size));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
print_verbose(vformat("show_window: final size %s", wd.size))
}
print_verbose(vformat("show_window: final size %s", wd.size));
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@platform/linuxbsd/x11/display_server_x11.cpp` around lines 2005 - 2006, The
show_window verbose log statement in display_server_x11.cpp is missing its
terminating semicolon, causing a compile error. Update the show_window path in
display_server_x11::... where print_verbose(vformat("show_window: final size
%s", wd.size)) appears so the statement is properly terminated with a semicolon
and the function body compiles cleanly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Wayland] Editor Splash is stretched

2 participants