  1. Core Server
  2. SERVER-61117

Startup error results in a hang on shutdown

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.0.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Steps to Reproduce:

      Start a 4.9 FCV replica set member with a 5.1 binary.
    • Sprint: Replication 2021-11-29, Replication 2021-12-13, Replication 2021-12-27, Replication 2022-01-10, Replication 2022-01-24, Replication 2022-02-07, Repl 2022-02-21, Repl 2022-03-07, Repl 2022-03-21, Repl 2022-04-04

      For the following startup error, the shutdown process will hang forever, waiting for replication to finish starting up:

      "t":{"$date":"2021-10-29T13:43:41.902+00:00"},"s":"E",  "c":"CONTROL",  "id":20557,   "ctx":"initandlisten","msg":"DBException in initAndListen, terminating","attr":{"error":"Location4926900: Invalid value for featureCompatibilityVersiondocument in admin.system.version, found 4.9, expected '5.0' or '5.0' or '5.1. See https://docs.mongodb.com/master/release-notes/5.0-compatibility/#feature-compatibility."}}
      

      The hang appears to occur when the main thread subsequently calls _waitForStartUpComplete() on the replication coordinator:

      Thread 1 (Thread 0x7f1f24a1abc0 (LWP 322078)):
      #0  0x00007f1f213b6a35 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x000055b040236b7c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x000055b03d7f482c in void std::_V2::condition_variable_any::wait<std::unique_lock<mongo::latch_detail::Latch> >(std::unique_lock<mongo::latch_detail::Latch>&) ()
      #3  0x000055b03d7d00f3 in mongo::repl::ReplicationCoordinatorImpl::_waitForStartUpComplete() ()
      #4  0x000055b03d7ec2de in mongo::repl::ReplicationCoordinatorImpl::shutdown(mongo::OperationContext*) ()
      #5  0x000055b03d666fbe in mongo::(anonymous namespace)::shutdownTask(mongo::ShutdownTaskArgs const&) ()
      #6  0x000055b04008da55 in mongo::(anonymous namespace)::runTasks(std::stack<mongo::unique_function<void (mongo::ShutdownTaskArgs const&)>, std::deque<mongo::unique_function<void (mongo::ShutdownTaskArgs const&)>, std::allocator<mongo::unique_function<void (mongo::ShutdownTaskArgs const&)> > > >, mongo::ShutdownTaskArgs const&) ()
      #7  0x000055b03d4d944d in mongo::shutdown(mongo::ExitCode, mongo::ShutdownTaskArgs const&) ()
      #8  0x000055b03cf96910 in mongo::exitCleanly(mongo::ExitCode) ()
      #9  0x000055b03d665961 in mongo::mongod_main(int, char**) ()
      #10 0x000055b03d4e93ee in main ()
      

      In general, this type of hang is a potential issue for all exceptions that can occur in initAndListen.
      The preceding example was taken from a particular node in Serverless QA that was mistakenly started using a newer binary without first updating the FCV.
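
      The pattern described above can be sketched in isolation. The following is a minimal illustration, not MongoDB's code and not the fix that was applied for this ticket: ToyCoordinator, toyStartup, and shutdownGuarded are hypothetical names, and the timeout exists only so the demonstration terminates instead of hanging. It assumes the startup routine throws before anything signals the condition variable that shutdown waits on.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for a replication coordinator (not MongoDB's class).
class ToyCoordinator {
public:
    enum class StartupState { kNotStarted, kStarting, kComplete };

    void beginStartup() {
        std::lock_guard<std::mutex> lk(_mutex);
        _state = StartupState::kStarting;
    }

    void finishStartup() {
        std::lock_guard<std::mutex> lk(_mutex);
        _state = StartupState::kComplete;
        _cv.notify_all();
    }

    // Unguarded wait: blocks until startup completes. With an untimed wait(),
    // this never returns if startup failed before finishStartup() ran.
    // Returns true if startup completed, false if the timeout expired.
    bool waitForStartupComplete(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _cv.wait_for(lk, timeout,
                            [this] { return _state == StartupState::kComplete; });
    }

    // One possible guard (an assumption, not the actual fix): skip the wait
    // entirely when startup never began, since nothing will ever signal it.
    bool shutdownGuarded(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        if (_state == StartupState::kNotStarted) {
            return true;
        }
        return _cv.wait_for(lk, timeout,
                            [this] { return _state == StartupState::kComplete; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    StartupState _state = StartupState::kNotStarted;
};

// Simulates a startup routine that may throw before replication startup
// begins. Returns true on success, false if the exception path was taken.
bool toyStartup(ToyCoordinator& coord, bool failEarly) {
    try {
        if (failEarly) {
            throw std::runtime_error("invalid featureCompatibilityVersion document");
        }
        coord.beginStartup();
        coord.finishStartup();
        return true;
    } catch (const std::exception&) {
        return false;  // the caller would now proceed to shutdown
    }
}
```

      After a failed toyStartup(coord, true), waitForStartupComplete() can only time out, which corresponds to the indefinite wait in frame #3 of the stack trace, while shutdownGuarded() returns immediately because startup never began.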

            Assignee:
            vesselina.ratcheva@mongodb.com Vesselina Ratcheva (Inactive)
            Reporter:
            milkie@mongodb.com Eric Milkie
