  1. Core Server
  2. SERVER-66303

Ensure ReplicaSetMonitorManager shuts down before ServiceContextTest resets the global context

    • Fully Compatible
    • Sharding 2022-06-27, Sharding 2022-07-11
    • 14

      The issue is that setGlobalServiceContext is not thread safe. getGlobalServiceContext is also written on the assumption that setGlobalServiceContext is called only once, at startup, while there is only a single thread of execution. C++ tests that repeatedly call setGlobalServiceContext can hit a race: setGlobalServiceContext sets the global variable to null before the destructor of the previous service context runs (which happens right after, at the end of that scope). Since the set and get are not protected by a mutex, threads owned by the old service context that call getGlobalServiceContext can observe that the global context variable has been set to null before the old context has been completely destroyed.
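
      The window described above can be illustrated with a single-threaded toy model. This is only a sketch: ToyContext, setToyGlobal and getToyGlobal are hypothetical stand-ins, not the real server types or functions, and the old context's destructor plays the role of "a thread still owned by the old context".

```cpp
#include <cassert>
#include <memory>

// Toy stand-ins only; these are NOT the real ServiceContext APIs.
struct ToyContext;
ToyContext* gToyContext = nullptr;    // unsynchronized global pointer
bool gSawNullDuringTeardown = false;  // set by the old context's destructor

ToyContext* getToyGlobal() {
    return gToyContext;
}

struct ToyContext {
    ~ToyContext() {
        // Work still running on behalf of this context (modeled here by the
        // destructor itself) asks for the global context during teardown.
        gSawNullDuringTeardown = (getToyGlobal() == nullptr);
    }
};

void setToyGlobal(std::unique_ptr<ToyContext> next) {
    if (gToyContext) {
        std::unique_ptr<ToyContext> old(gToyContext);
        gToyContext = nullptr;  // the global is cleared first ...
    }                           // ... and only here is the old context destroyed
    gToyContext = next.release();
}

// Installs a context, replaces it, and reports what the old context saw.
bool resetObservesNull() {
    gSawNullDuringTeardown = false;
    setToyGlobal(std::make_unique<ToyContext>());
    setToyGlobal(std::make_unique<ToyContext>());  // replaces the first context
    bool observed = gSawNullDuringTeardown;
    setToyGlobal(nullptr);  // clean up the second context
    return observed;
}
```

      In this model resetObservesNull() returns true: the old context is still alive when the global already reads as null. With real threads the same window exists, with the added problem that the unsynchronized read and write form a data race.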

      One workaround for the C++ tests is to shut down and join the threads that are known to call getGlobalServiceContext before calling setGlobalServiceContext. Another is to replace the getGlobalServiceContext calls with references to the serviceContext and make sure that the ServiceContext destructor joins all threads that can reference it.
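
      A minimal sketch of the first workaround, assuming a hypothetical ToyMonitor with one background thread that reads an unsynchronized global (none of these names come from the server code). The point is only the ordering in the fixture: stop and join the thread first, reset the global second.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Toy stand-ins only; these are NOT the real server types.
struct ToyContext {};
ToyContext* gToyContext = nullptr;  // deliberately unsynchronized global

// Hypothetical component that owns a background thread reading the global.
class ToyMonitor {
public:
    void start() {
        _worker = std::thread([this] {
            while (!_stop.load()) {
                if (gToyContext == nullptr) {
                    _sawNull.store(true);  // would crash in a real component
                }
                std::this_thread::yield();
            }
        });
    }

    void shutdownAndJoin() {
        _stop.store(true);
        if (_worker.joinable()) {
            _worker.join();
        }
    }

    bool sawNull() const {
        return _sawNull.load();
    }

private:
    std::atomic<bool> _stop{false};
    std::atomic<bool> _sawNull{false};
    std::thread _worker;
};

// One set-up/tear-down cycle; returns whether the worker ever saw a null global.
bool runFixtureOnce() {
    auto ctx = std::make_unique<ToyContext>();
    gToyContext = ctx.get();    // set-up: install the context, then start threads
    ToyMonitor monitor;
    monitor.start();
    monitor.shutdownAndJoin();  // tear-down step 1: stop and join the thread
    gToyContext = nullptr;      // tear-down step 2: only now reset the global
    return monitor.sawNull();
}
```

      Because the worker is joined before the global is cleared, it only ever reads the pointer while it is stable and non-null, so runFixtureOnce() returns false however often the cycle is repeated.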

      Original description:

      The RSMM is a decoration on the ServiceContext. For any unit tests that inherit from ServiceContextTest and link in the RSM/RSMM, ServiceContext's destructor is called before the ReplicaSetMonitorManager's (because the RSMM's destructor is called as part of ~DecorationContainer, which is a member of Decorable objects). The RSM uses the RSMM's executor, which is shut down in RSMM::shutdown(), which is called from ~RSMM. So, it's possible for a task to still exist on this executor and run after the ServiceContext has been destroyed - if the task tries to grab the service context, the test will crash (as happened in the linked BF).
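
      The ordering above follows from a general C++ rule: a destructor's body runs before the destructors of the object's members. A small sketch with hypothetical toy types (not the real ServiceContext, DecorationContainer, or ReplicaSetMonitorManager) that records the order:

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> gEvents;  // records destruction order

// Toy stand-in for a decoration whose destructor performs the shutdown.
struct ToyMonitorManager {
    ~ToyMonitorManager() {
        gEvents.push_back("~ToyMonitorManager");
    }
};

// Toy stand-in for the container that holds the decorations.
struct ToyDecorationContainer {
    ToyMonitorManager manager;
};

// Toy stand-in for the decorated object.
struct ToyServiceContext {
    ~ToyServiceContext() {
        gEvents.push_back("~ToyServiceContext body");
    }
    ToyDecorationContainer decorations;  // destroyed after the body above runs
};

// Creates and destroys one toy context and returns the recorded order.
std::vector<std::string> destructionOrder() {
    gEvents.clear();
    {
        ToyServiceContext ctx;
    }  // ctx is destroyed here
    return gEvents;
}
```

      In this model the manager is torn down only after the context's own destructor body has finished, which is the window in which a leftover executor task can still run.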

      We attempted to fix this by using ServiceContext::ConstructorActionRegisterer to declare a destructor to be called before the ServiceContext is destroyed, but the ServiceContext used in ServiceContextTest isn't constructed using UniqueServiceContext, so it won't get registered.

      You can repro the issue in the linked BF by placing a call to getGlobalServiceContext() in RSMM::shutdown() (and running the test that failed in the BF - tenant_migration_donor_service_test).

            Assignee:
            randolph@mongodb.com Randolph Tan
            Reporter:
            janna.golden@mongodb.com Janna Golden
            Votes:
            0
            Watchers:
            4
