[Core] Improve Dashborad SubprocessModuleHandle.destroy_module() Resource Cleanup and Process Termination#60172
Conversation
There was a problem hiding this comment.
Code Review
This pull request refactors the dashboard subprocess handling to improve graceful shutdown and resource cleanup. The changes introduce more robust process termination, explicit resource management for HTTP sessions, and attempts to handle signals for graceful exit. While the direction is excellent, I've identified a critical issue in the new signal handling logic that would cause a RuntimeError, and a high-severity race condition in the resource cleanup order. I've provided detailed comments and suggestions to address these issues, which should make the shutdown process more reliable.
Change-Id: Ia045cdf34ce8c473da05e114b76e07106c7bc4e9 Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
Change-Id: I564b0b41292666b9da8c39e7fe494fd40ffe3d1a Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
|
@sampan-s-nayak PTAL |
sampan-s-nayak
left a comment
There was a problem hiding this comment.
Thanks for working on this (and for the detailed pr description). LGTM
Introduced a new constant for the graceful shutdown timeout of the subprocess module, allowing for better control over process termination. Updated the subprocess handling to utilize this new timeout value. Change-Id: I4e49b28766c7c63956e6b9db1cf6b33cebc34210 Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
Change-Id: I0ec1002cc87837e91006cdacb31a724b02eaa5c0 Signed-off-by: daiping8 <dai.ping88@zte.com.cn>
…e() Resource Cleanup and Process Termination (ray-project#60172)" This reverts commit eaeb585.
Description
Summary
This PR improves the
destroy_module()method inSubprocessModuleHandleto fix race conditions, implement graceful process termination, and ensure complete resource cleanup. The changes prevent resource leaks and improve the reliability of module restart and shutdown.Key Changes
terminate()and 5-second timeoutkill()) only if necessaryjoin()calls have timeouts to prevent infinite blockingModified Files
python/ray/dashboard/subprocesses/handle.pydestroy_module()with ordered resource cleanuppython/ray/dashboard/subprocesses/tests/test_e2e.pytest_destroy_module_cleans_up_resources()to verify complete resource cleanup_DummyConn,_DummyProcess,_DummySession) for isolated testingstart_http_server_app()to prevent resource leaks between testsRelated issues
Closes #60214