Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-48650

Unit tests' ServiceContext's NetworkInterfaceMockClockSource cannot continue to depend upon the lifetime of the ReplicationCoordinator to remain valid

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7.0, 4.4.5
    • Component/s: None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v4.4
    • Sprint:
      Service arch 2020-06-29, Service arch 2020-07-13, Service Arch 2020-07-27, Service Arch 2020-08-10

      Description

      :tldr

      The ServiceContext's NetworkInterfaceMockClockSource cannot continue to depend upon the lifetime of the ReplicationCoordinator to remain valid; while also being depended upon by any code with a mutex under contention via the DiagnosticInfo code.

      There's a rare (so far) BF dependent on a SERVER ticket linked a couple hops from this one.

      --------------------------------------------------------------------------

      I tried to setup the JournalFlusher thread in our unit test test fixtures, specifically the ServiceContexMongoDTest test fixture. However, I ran into problems like so,

      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.003+0000 | 2020-06-05T20:47:57.003Z I  STORAGE  22320   [main] "Shutting down journal flusher thread"
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000 src/mongo/executor/network_interface_mock.h:567:22: runtime error: member call on address 0x61600038ee80 which does not point to an object of type 'mongo::executor::NetworkInterfaceMock'
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000 0x61600038ee80: note: object has a possibly invalid vptr: abs(offset to top) too big
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000  a4 00 00 48  8b 00 00 49 4d 56 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000               ^~~~~~~~~~~~~~~~~~~~~~~
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:47:57.004+0000               possibly invalid vptr
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.642+0000     #0 0x564d4e99e90a in mongo::executor::NetworkInterfaceMockClockSource::now() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/executor/network_interface_mock.h:567:22
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.652+0000     #1 0x564d512227b1 in mongo::DiagnosticInfo::capture(mongo::StringData const&, mongo::DiagnosticInfo::Options) /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/util/diagnostic_info.cpp:285:73
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.652+0000     #2 0x564d512250a9 in mongo::(anonymous namespace)::_mongoInitializerFunction_DiagnosticInfo(mongo::InitializerContext*)::DiagnosticListener::onContendedLock(mongo::latch_detail::Identity const&) /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/util/diagnostic_info.cpp:192:43
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.653+0000     #3 0x564d532adfe6 in mongo::latch_detail::Mutex::_onContendedLock() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/platform/mutex.cpp:89:19
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.653+0000     #4 0x564d532ad29e in mongo::latch_detail::Mutex::lock() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/platform/mutex.cpp:55:5
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.678+0000     #5 0x564d4e36c544 in std::unique_lock<mongo::latch_detail::Latch>::lock() /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/std_mutex.h:267:17
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.678+0000     #6 0x564d4e36beba in std::unique_lock<mongo::latch_detail::Latch>::unique_lock(mongo::latch_detail::Latch&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/std_mutex.h:197:2
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.682+0000     #7 0x564d4fe2d1de in mongo::JournalFlusher::run() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/db/storage/control/journal_flusher.cpp:127:34
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #8 0x564d52b09f7d in mongo::BackgroundJob::jobBody() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/util/background.cpp:160:5
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #9 0x564d52b12375 in std::__invoke_result<mongo::BackgroundJob::go()::$_0>::type std::__invoke<mongo::BackgroundJob::go()::$_0>(mongo::BackgroundJob::go()::$_0&&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/invoke.h:95:14
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #10 0x564d52b12375 in decltype(auto) std::__apply_impl<mongo::BackgroundJob::go()::$_0, std::tuple<> >(mongo::BackgroundJob::go()::$_0&&, std::tuple<>&&, std::integer_sequence<unsigned long>) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/tuple:1678
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #11 0x564d52b12375 in decltype(auto) std::apply<mongo::BackgroundJob::go()::$_0, std::tuple<> >(mongo::BackgroundJob::go()::$_0&&, std::tuple<>&&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/tuple:1687
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #12 0x564d52b12375 in mongo::stdx::thread::thread<mongo::BackgroundJob::go()::$_0, 0>(mongo::BackgroundJob::go()::$_0)::'lambda'()::operator()() /data/mci/c67a30324c4da5f0e5f82678065dd299/src/src/mongo/stdx/thread.h:186
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.685+0000     #13 0x564d52b12375 in mongo::BackgroundJob::go()::$_0 std::__invoke_impl<void, mongo::stdx::thread::thread<mongo::BackgroundJob::go()::$_0, 0>(mongo::BackgroundJob::go()::$_0)::'lambda'()>(std::__invoke_other, mongo::stdx::thread::thread<mongo::BackgroundJob::go()::$_0, 0>(mongo::BackgroundJob::go()::$_0)::'lambda'()&&) /opt/mongodbtoolchain/revisions/e5348beb43e147b74a40f4ca5fb05a330ea646cf/stow/gcc-v3.U0D/lib/gcc/x86_64-mongodb-linux/8.2.0/../../../../include/c++/8.2.0/bits/invoke.h:60
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.686+0000     #14 0x564d535a52ee in execute_native_thread_routine /data/mci/ff40e18d703ff1f6e8f8ecf75ad7202f/toolchain-builder/tmp/build-gcc-v3.sh-NJq/build/x86_64-mongodb-linux/libstdc++-v3/src/c++11/../../../../../src/combined/libstdc++-v3/src/c++11/thread.cc:80:18
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:06.686+0000     #15 0x7f8e0e26b6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:09.525+0000     #16 0x7f8e0dd7c88e in clone /build/glibc-OTsEL5/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:09.525+0000 
      [cpp_unit_test:db_repl_coordinator_test] 2020-06-05T20:48:09.525+0000 SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior src/mongo/executor/network_interface_mock.h:567:22 in
      

      It turns out that a contend mutex acquisition goes into the DiagnosticInfo code, which in turn call into the NetworkInterfaceMockClockSource and tries to use the NetworkInterfaceMock. However, the JournalFlusher (which has a mutex) found the NetworkInterfaceMock to be invalid memory: the NetworkInterfaceMock was already destroyed.

      The test fixture of the failing test is ReplCoordTest. The repl test fixture creates a NetworkInterfaceMock instances, passes ownership of the NetworkInterface into a ThreadPoolTaskExecutor instance, whose ownership is then passed into a ReplicationCoordinatorImpl instance. Meanwhile, NetworkInterfaceMockClockSource instances are created, with pointers to the same NetworkInterfaceMock owned by the ReplicationCoordinatorImpl, and set on the ServiceContext. So things like the JournalFlusher that use Mutexes are now dependent on the the lifetime of the ReplicationCoordinatorImpl.

      The ServiceContext's NetworkInterfaceMockClockSource cannot continue to depend upon the lifetime of the ReplicationCoordinator to remain valid; while also being depended upon by any code with a mutex under contention.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              ben.caimano Benjamin Caimano (Inactive)
              Reporter:
              dianna.hohensee Dianna Hohensee
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: