[SERVER-62402] Ignore timeouts when running `ServiceEntryPointImpl::shutdown` under sanitizers
Created: 06/Jan/22  Updated: 17/Feb/22  Resolved: 11/Feb/22

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Amirsaman Memaripour
Assignee: Daniel Morilha (Inactive)
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-63768 Create a manually triggered evergreen... Closed
is related to SERVER-57427 Avoid special-case handling in Servic... Closed
Sprint: Service Arch 2022-1-24, Service Arch 2022-2-07, Service Arch 2022-2-21
Participants:
Linked BF Score: 0
Story Points: 2

 Description   

We currently run ServiceEntryPointImpl::shutdownAndWait only if address or thread sanitizer is enabled:

bool ServiceEntryPointImpl::shutdown(Milliseconds timeout) {
#if __has_feature(address_sanitizer) || __has_feature(thread_sanitizer)
    // When running under address sanitizer, we get false positive leaks due to disorder around
    // the lifecycle of a connection and request. When we are running under ASAN, we try a lot
    // harder to dry up the server from active connections before going on to really shut down.
    return shutdownAndWait(timeout);
#else
    return true;
#endif
}
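
As a side note on the guard above: `__has_feature` is a Clang extension, while GCC signals the same sanitizers through the predefined macros `__SANITIZE_ADDRESS__` and `__SANITIZE_THREAD__`. The following is a minimal standalone sketch of a portable check, not the server's actual code. The names `SKETCH_ASAN_ENABLED`, `SKETCH_TSAN_ENABLED`, `kSanitizerBuild`, and `shutdownSketch` are hypothetical and exist only for illustration.

```cpp
#include <cassert>

// __has_feature is a Clang extension. Provide a fallback so the checks below
// still preprocess on compilers that do not define it.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

// Clang reports sanitizers via __has_feature; GCC predefines __SANITIZE_*__.
#if __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
#define SKETCH_ASAN_ENABLED 1
#else
#define SKETCH_ASAN_ENABLED 0
#endif

#if __has_feature(thread_sanitizer) || defined(__SANITIZE_THREAD__)
#define SKETCH_TSAN_ENABLED 1
#else
#define SKETCH_TSAN_ENABLED 0
#endif

// Hypothetical constant: true when this translation unit is built with
// the address or thread sanitizer.
constexpr bool kSanitizerBuild = SKETCH_ASAN_ENABLED || SKETCH_TSAN_ENABLED;

// Hypothetical model of the branch in the snippet above: wait for sessions
// only in sanitizer builds, otherwise report success immediately.
template <typename WaitFn>
bool shutdownSketch(WaitFn&& shutdownAndWait) {
    if (kSanitizerBuild) {
        return shutdownAndWait();
    }
    return true;
}
```

Calling `shutdownSketch` with a callable that returns `false` yields `false` only in a sanitizer build; in every other build the callable is never invoked and the result is `true`.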

The invocation is provided with 10 seconds as the timeout:

// Shutdown the Service Entry Point and its sessions and give it a grace period to complete.
if (auto sep = serviceContext->getServiceEntryPoint()) {
    LOGV2_OPTIONS(4784923, {LogComponent::kCommand}, "Shutting down the ServiceEntryPoint");
    if (!sep->shutdown(Seconds(10))) {
        LOGV2_OPTIONS(20563, {LogComponent::kNetwork}, "Service entry point did not shutdown within the time limit");
    }
}

When running under sanitizers, hitting this timeout causes the process to terminate prematurely and report leaks that are not real (i.e., false positives). The recommendation is to use a very large timeout (e.g., Seconds::max()) and to make sure the hang-analyzer runs if a thread takes a very long time to join.

Since we only run this code when sanitizers are enabled, this change will not impact production behavior.

AC: Change the timeout to 30 seconds, and invariant or LOGV2_FATAL if shutdown isn't achieved within that timeout.
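
The acceptance criterion could be sketched roughly as follows. This is a standalone illustration using only the C++ standard library, not the change that was committed: `SessionTracker`, `waitForDrain`, and `shutdownOrFail` are hypothetical names, and the `onFatal` callback stands in for `invariant` or `LOGV2_FATAL` so that the sketch can be exercised without terminating the process.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>

// Hypothetical stand-in for session bookkeeping; this is not MongoDB code.
class SessionTracker {
public:
    void onSessionStarted() {
        std::lock_guard<std::mutex> lk(_mutex);
        ++_active;
    }

    void onSessionEnded() {
        std::lock_guard<std::mutex> lk(_mutex);
        assert(_active > 0);
        if (--_active == 0) {
            _drained.notify_all();  // wake any thread blocked in waitForDrain
        }
    }

    // Returns true if every session ended before the timeout elapsed.
    bool waitForDrain(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _drained.wait_for(lk, timeout, [this] { return _active == 0; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _drained;
    std::size_t _active = 0;
};

// Waits for the grace period and invokes onFatal if sessions remain.
// A real server would terminate at that point; the callback is an
// assumption that keeps this sketch testable.
bool shutdownOrFail(SessionTracker& tracker,
                    std::chrono::milliseconds gracePeriod,
                    const std::function<void()>& onFatal) {
    if (tracker.waitForDrain(gracePeriod)) {
        return true;
    }
    onFatal();
    return false;
}
```

Under the AC, the grace period would be `std::chrono::seconds(30)` and `onFatal` would abort the process, so a sanitizer run that fails to drain shows up as an explicit shutdown failure instead of a spurious leak report.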



 Comments   
Comment by Daniel Morilha (Inactive) [ 11/Feb/22 ]

We've collectively decided we should no longer pursue increasing these timeouts.

Comment by Githook User [ 28/Jan/22 ]

Author: Daniel Vitor Morilha <daniel.morilha@mongodb.com> (daniel-mdb)

Message: Revert "SERVER-62402 Ignore timeouts when running `ServiceEntryPointImpl::shutdown` under sanitizers"

This reverts commit a35742f044e3239d88c3fdd23fbe844881db2546.
Branch: master
https://github.com/mongodb/mongo/commit/db799be5aebf432380cb5f7acb0f204fbc120a13

Comment by Githook User [ 24/Jan/22 ]

Author: Daniel Vitor Morilha <daniel.morilha@mongodb.com> (daniel-mdb)

Message: SERVER-62402 Ignore timeouts when running `ServiceEntryPointImpl::shutdown` under sanitizers
Branch: master
https://github.com/mongodb/mongo/commit/a35742f044e3239d88c3fdd23fbe844881db2546
