Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
15 changes: 13 additions & 2 deletions core/object/worker_thread_pool.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -858,18 +858,29 @@ void WorkerThreadPool::exit_languages_threads() {
return;
}

const uint64_t TIMEOUT_USEC = 5000000; // 5 seconds per phase

MutexLock lock(task_mutex);

// Wait until all threads are idle.
_switch_runlevel(RUNLEVEL_PRE_EXIT_LANGUAGES);
while (runlevel_data.pre_exit_languages.num_idle_threads != threads.size()) {
control_cond_var.wait(lock);
bool notified = control_cond_var.wait_for_usec(lock, TIMEOUT_USEC);
if (!notified && runlevel_data.pre_exit_languages.num_idle_threads != threads.size()) {
WARN_PRINT("WorkerThreadPool: Timed out waiting for threads to become idle during shutdown. "
"Some tasks may have been abandoned.");
break;
}
}

// Wait until all threads have detached from scripting languages.
_switch_runlevel(RUNLEVEL_EXIT_LANGUAGES);
while (runlevel_data.exit_languages.num_exited_threads != threads.size()) {
control_cond_var.wait(lock);
bool notified = control_cond_var.wait_for_usec(lock, TIMEOUT_USEC);
if (!notified && runlevel_data.exit_languages.num_exited_threads != threads.size()) {
WARN_PRINT("WorkerThreadPool: Timed out waiting for threads to exit scripting languages.");
break;
}
}
Comment on lines +861 to 884

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Timeout fallback undermines the shutdown-safety guarantee it's paired with.

Breaking out of the wait loop on timeout still lets the function return normally and move to the next runlevel (and eventually back to Main::cleanup(), which now deletes the main loop/SceneTree right after this call). If a worker task is genuinely stuck or just slow (>5s, or >10s across both phases), a thread can still be executing with stale scene/script references at the moment the SceneTree is deleted and ScriptServer::finish_languages() runs — the exact use-after-free/deadlock scenario the new call ordering in Main::cleanup() is meant to avoid. The timeout converts a hang into a (rarer but still possible) crash during shutdown.

Consider making the timeout path fail loud rather than silently proceed — e.g., skip forcing RUNLEVEL_EXIT_LANGUAGES/subsequent shutdown steps when the previous phase timed out, or escalate (abort/force-terminate stuck threads) instead of falling through to normal teardown.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@core/object/worker_thread_pool.cpp` around lines 861 - 884, The shutdown wait
loops in WorkerThreadPool’s cleanup path currently break on timeout and then
continue teardown, which can leave worker threads running against stale
SceneTree/script state. Update the timeout handling in the pre-exit and
exit-language wait sections so a timeout does not fall through to normal
shutdown; instead, fail loudly and stop the remaining shutdown sequence, or
otherwise escalate in a way that prevents Main::cleanup() from deleting
scene/script resources while threads may still be active. Use the existing
_switch_runlevel, control_cond_var.wait_for_usec, and WARN_PRINT flow to locate
and adjust the shutdown logic.

}

Expand Down
21 changes: 20 additions & 1 deletion core/os/condition_variable.h
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,8 @@
#define THREADING_NAMESPACE std
#endif

#include <chrono>

/// An object one or multiple threads can wait on a be notified by some other.
/// Normally, you want to use a semaphore for such scenarios, but when the
/// condition is something different than a count being greater than zero
Expand All @@ -71,6 +73,17 @@ class ConditionVariable {
condition.wait(p_lock.mutex._get_lock());
}

/// @return `true` if notified before timeout, false if timed out.
template <typename BinaryMutexT>
_ALWAYS_INLINE_ bool wait_for_usec(const MutexLock<BinaryMutexT> &p_lock, uint64_t p_usec) const {
return condition.wait_for(p_lock._get_lock(), std::chrono::microseconds(p_usec)) == THREADING_NAMESPACE::cv_status::no_timeout;
}

template <int Tag>
_ALWAYS_INLINE_ bool wait_for_usec(const MutexLock<SafeBinaryMutex<Tag>> &p_lock, uint64_t p_usec) const {
return condition.wait_for(p_lock.mutex._get_lock(), std::chrono::microseconds(p_usec)) == THREADING_NAMESPACE::cv_status::no_timeout;
}

_ALWAYS_INLINE_ void notify_one() const {
condition.notify_one();
}
Expand All @@ -81,13 +94,19 @@ class ConditionVariable {
};

#else // No threads.

class ConditionVariable {
public:
template <typename BinaryMutexT>
void wait(const MutexLock<BinaryMutexT> &p_lock) const {}
template <int Tag>
void wait(const MutexLock<SafeBinaryMutex<Tag>> &p_lock) const {}
void notify_one() const {}
void notify_all() const {}

template <typename BinaryMutexT>
bool wait_for_usec(const MutexLock<BinaryMutexT> &p_lock, uint64_t p_usec) const { return true; }
template <int Tag>
bool wait_for_usec(const MutexLock<SafeBinaryMutex<Tag>> &p_lock, uint64_t p_usec) const { return true; }
};

#endif // THREADS_ENABLED
19 changes: 6 additions & 13 deletions drivers/gles3/rasterizer_gles3.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -489,23 +489,16 @@ void RasterizerGLES3::set_boot_image(const Ref<Image> &p_image, const Color &p_c
texture_storage->texture_2d_initialize(texture, p_image);

Rect2 imgrect(0, 0, p_image->get_width(), p_image->get_height());
Size2 win_size_f = Size2(win_size); // This is needed for the .floor() function below because Size2i does not have a floor() function (but Size2 does)
Rect2 screenrect;
if (p_scale) {
if (win_size.width > win_size.height) {
//scale horizontally
screenrect.size.y = win_size.height;
screenrect.size.x = imgrect.size.x * win_size.height / imgrect.size.y;
screenrect.position.x = (win_size.width - screenrect.size.x) / 2;

} else {
//scale vertically
screenrect.size.x = win_size.width;
screenrect.size.y = imgrect.size.y * win_size.width / imgrect.size.x;
screenrect.position.y = (win_size.height - screenrect.size.y) / 2;
}
if (p_scale) {
screenrect = OS::get_singleton()->calculate_boot_screen_rect(win_size, imgrect.size);
} else {
float screen_scale = DisplayServer::get_singleton()->screen_get_scale();
screenrect = imgrect;
screenrect.position += ((Size2(win_size.width, win_size.height) - screenrect.size) / 2.0).floor();
screenrect.size *= screen_scale; // convert image rect to physical pixels - necessary for Wayland - should return 1.0 elsewhere
screenrect.position += ((win_size_f - screenrect.size) / 2.0).floor();
}

#ifdef WINDOWS_ENABLED
Expand Down
50 changes: 49 additions & 1 deletion editor/doc/editor_help.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -3061,6 +3061,8 @@ void EditorHelp::_load_doc_thread(void *p_udata) {
}

OS::get_singleton()->benchmark_end_measure("EditorHelp", vformat("Generate Documentation (Run %d)", doc_generation_count));

_worker_thread_done.set();
}

void EditorHelp::_gen_doc_thread(void *p_udata) {
Expand Down Expand Up @@ -3094,6 +3096,8 @@ void EditorHelp::_gen_doc_thread(void *p_udata) {
}

OS::get_singleton()->benchmark_end_measure("EditorHelp", vformat("Generate Documentation (Run %d)", doc_generation_count));

_worker_thread_done.set();
}

void EditorHelp::_gen_extensions_docs() {
Expand All @@ -3109,6 +3113,10 @@ static void _load_script_doc_cache(bool p_changes) {
}

void EditorHelp::load_script_doc_cache() {
if (_cleanup_in_progress) {
return;
}

if (!ProjectSettings::get_singleton()->is_project_loaded()) {
print_verbose("Skipping loading script doc cache since no project is open.");
return;
Expand All @@ -3132,6 +3140,7 @@ void EditorHelp::load_script_doc_cache() {
return;
}

_worker_thread_done.set_to(false);
worker_thread.start(_load_script_doc_cache_thread, nullptr);
}

Expand Down Expand Up @@ -3175,6 +3184,8 @@ void EditorHelp::_load_script_doc_cache_thread(void *p_udata) {

// Always delete the doc cache after successful load since most uses of editor will change a script, invalidating cache.
_delete_script_doc_cache();

_worker_thread_done.set();
}

/// Helper method to deal with "sources_changed" signal having a parameter.
Expand All @@ -3183,6 +3194,10 @@ static void _regenerate_script_doc_cache(bool p_changes) {
}

void EditorHelp::regenerate_script_doc_cache() {
if (_cleanup_in_progress) {
return;
}

if (EditorFileSystem::get_singleton()->is_scanning()) {
// Wait until EditorFileSystem scanning is complete to use updated filesystem structure.
EditorFileSystem::get_singleton()->connect(SNAME("sources_changed"), callable_mp_static(_regenerate_script_doc_cache), CONNECT_ONE_SHOT);
Expand All @@ -3191,6 +3206,7 @@ void EditorHelp::regenerate_script_doc_cache() {

_wait_for_thread(worker_thread);
_wait_for_thread(loader_thread);
_loader_thread_done.set_to(false);
loader_thread.start(_regen_script_doc_thread, EditorFileSystem::get_singleton()->get_filesystem());
}

Expand All @@ -3200,6 +3216,8 @@ void EditorHelp::_finish_regen_script_doc_thread(void *p_udata) {
_script_docs_loaded.set();

OS::get_singleton()->benchmark_end_measure("EditorHelp", "Generate Script Documentation");

_worker_thread_done.set();
}

void EditorHelp::_regen_script_doc_thread(void *p_udata) {
Expand All @@ -3215,6 +3233,9 @@ void EditorHelp::_regen_script_doc_thread(void *p_udata) {

_reload_scripts_documentation(dir);

_loader_thread_done.set();
_worker_thread_done.set_to(false); // worker_thread is about to start

// All ResourceLoader::load() calls are done, so we can no longer deadlock with main thread.
// Switch to back to worker_thread from loader_thread to resynchronize access to DocData.
worker_thread.start(_finish_regen_script_doc_thread, nullptr);
Expand Down Expand Up @@ -3265,6 +3286,10 @@ void EditorHelp::save_script_doc_cache() {
}

void EditorHelp::generate_doc(bool p_use_cache, bool p_use_script_cache) {
if (_cleanup_in_progress) {
return;
}

doc_generation_count++;
OS::get_singleton()->benchmark_begin_measure("EditorHelp", vformat("Generate Documentation (Run %d)", doc_generation_count));

Expand All @@ -3280,10 +3305,12 @@ void EditorHelp::generate_doc(bool p_use_cache, bool p_use_script_cache) {
}

if (p_use_cache && FileAccess::exists(get_cache_full_path())) {
_worker_thread_done.set_to(false);
worker_thread.start(_load_doc_thread, (void *)p_use_script_cache);
} else {
print_verbose("Regenerating editor help cache");
doc->generate();
_worker_thread_done.set_to(false);
worker_thread.start(_gen_doc_thread, (void *)p_use_script_cache);
}
}
Expand Down Expand Up @@ -3376,7 +3403,28 @@ void EditorHelp::update_doc() {
}

void EditorHelp::cleanup_doc() {
_wait_for_thread();
_cleanup_in_progress = true;

// loader_thread runs _regen_script_doc_thread which calls ResourceLoader::load().
// Flush the message queue so load tasks can dispatch to the main thread.
if (loader_thread.is_started()) {
while (!_loader_thread_done.is_set()) {
MessageQueue::get_singleton()->flush();
OS::get_singleton()->delay_usec(1000);
}
loader_thread.wait_to_finish();
}

// worker_thread runs _load_doc_thread, _gen_doc_thread, _load_script_doc_cache_thread,
// or _finish_regen_script_doc_thread. All may post deferred calls needing the main thread.
if (worker_thread.is_started()) {
while (!_worker_thread_done.is_set()) {
MessageQueue::get_singleton()->flush();
OS::get_singleton()->delay_usec(1000);
}
worker_thread.wait_to_finish();
}

memdelete(doc);
doc = nullptr;
}
Expand Down
4 changes: 4 additions & 0 deletions editor/doc/editor_help.h
Original file line number Diff line number Diff line change
Expand Up @@ -204,6 +204,10 @@ class EditorHelp : public VBoxContainer {
inline static Thread loader_thread; ///< Only load scripts here to avoid deadlocking with main thread.

inline static SafeFlag _script_docs_loaded = SafeFlag(false);
inline static SafeFlag _worker_thread_done = SafeFlag(false);
inline static SafeFlag _loader_thread_done = SafeFlag(false);
inline static bool _cleanup_in_progress = false;

inline static LocalVector<DocData::ClassDoc> _docs_to_add;
inline static LocalVector<String> _docs_to_remove;
inline static LocalVector<String> _docs_to_remove_by_path;
Expand Down
16 changes: 14 additions & 2 deletions editor/file_system/editor_file_system.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1730,11 +1730,23 @@ void EditorFileSystem::_notification(int p_what) {
case NOTIFICATION_EXIT_TREE: {
Thread &active_thread = thread.is_started() ? thread : thread_sources;
if (use_threads && active_thread.is_started()) {
const uint64_t TIMEOUT_USEC = 3000000; // 3 seconds
uint64_t start = OS::get_singleton()->get_ticks_usec();
bool timed_out = false;
while (scanning) {
if (OS::get_singleton()->get_ticks_usec() - start > TIMEOUT_USEC) {
WARN_PRINT("EditorFileSystem: Timed out waiting for scan thread. Thread will be detached.");
timed_out = true;
break;
}
OS::get_singleton()->delay_usec(1000);
}
active_thread.wait_to_finish();
WARN_PRINT("Scan thread aborted...");
if (!timed_out) {
active_thread.wait_to_finish();
} else {
WARN_PRINT("Scan thread aborted...");
}
Comment on lines 1736 to +1748

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🩺 Stability & Availability | 🔴 Critical | 🏗️ Heavy lift

Don’t free filesystem state while the scan thread may still be running.

On timeout this skips wait_to_finish(), then immediately deletes filesystem / new_filesystem. If the scan thread is still touching those members, shutdown can race into use-after-free.

Also applies to: 1753-1760

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@editor/file_system/editor_file_system.cpp` around lines 1736 - 1748, The scan
timeout path in EditorFileSystem::reconnect_scan_thread currently skips waiting
for the worker and can free filesystem state while the scan thread is still
active. Update the timeout handling around active_thread/scanning so any
shutdown or cleanup that touches filesystem and new_filesystem only happens
after the scan thread has fully stopped, either by always joining via
wait_to_finish() or by otherwise ensuring the thread is detached and never
accesses those members again. Apply the same fix to the later cleanup block that
deletes filesystem/new_filesystem so the thread lifecycle and state teardown
remain synchronized.


set_process(false);
}

Expand Down
8 changes: 7 additions & 1 deletion editor/inspector/editor_resource_preview.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -600,8 +600,14 @@ void EditorResourcePreview::stop() {
preview_generators.write[i]->abort();
}

const uint64_t TIMEOUT_USEC = 3000000; // 3 seconds
uint64_t start = OS::get_singleton()->get_ticks_usec();

while (!exited.is_set()) {
// Sync pending work.
if (OS::get_singleton()->get_ticks_usec() - start > TIMEOUT_USEC) {
WARN_PRINT("EditorResourcePreview: Timed out waiting for preview thread.");
break;
}
OS::get_singleton()->delay_usec(10000);
RenderingServer::get_singleton()->sync();
MessageQueue::get_singleton()->flush();
Expand Down
Loading
Loading