Core Server / SERVER-16834

Secondary nodes can hang during shutdown if BGSync::_buffer is full

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.8.0-rc5
    • Affects Version/s: 2.8.0-rc4
    • Component/s: Replication
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      During shutdown, the replication consumer threads can stop pulling items out of the BGSync::_buffer queue while the producer thread (the oplog tailer/bgsync thread) is blocked trying to insert an item into the same fixed-size queue.

      For example, in 2.8.0-rc5-pre-, the following two stacks were captured from a hung system. Thread 3 is stuck because nothing is draining BGSync::_buffer. Thread 2 is stuck in boost::thread::join() because thread 3 never makes progress and so never checks for shutdown.

      Thread 3 (Thread 0x7ed14c6f9700 (LWP 17201)):
      #0  0x0000003887c0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x0000000000bf4690 in void boost::condition_variable_any::wait<boost::unique_lock<boost::timed_mutex> >(boost::unique_lock<boost::timed_mutex>&) ()
          at src/third_party/boost/boost/thread/pthread/condition_variable.hpp:137
      #2  0x0000000000bf82d3 in mongo::repl::BackgroundSync::produce(mongo::OperationContext*) () at src/mongo/util/queue.h:76
      #3  0x0000000000bf981e in mongo::repl::BackgroundSync::_producerThread() () at src/mongo/db/repl/bgsync.cpp:193
      ...
      
      Thread 2 (Thread 0x7ed12c747700 (LWP 17397)):
      #0  0x0000003887c0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x0000000000f9a8db in boost::thread::join() () at src/third_party/boost/boost/thread/pthread/condition_variable.hpp:56
      #2  0x0000000000c564a5 in mongo::repl::ReplicationCoordinatorExternalStateImpl::shutdown() () at src/mongo/db/repl/replication_coordinator_external_state_impl.cpp:107
      #3  0x0000000000c5b1f3 in mongo::repl::ReplicationCoordinatorImpl::shutdown() () at src/mongo/db/repl/replication_coordinator_impl.cpp:371
      #4  0x0000000000aa429a in mongo::exitCleanly(mongo::ExitCode) () at src/mongo/db/instance.cpp:1101
      #5  0x00000000009cf75a in mongo::CmdShutdown::shutdownHelper() () at src/mongo/db/dbcommands_generic.cpp:325
      ...
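
The hang pattern in these stacks can be illustrated with a minimal sketch. This is not the MongoDB code or the fix that was applied for this ticket. BoundedQueue, pushOrShutdown, requestShutdown and producerExitsOnShutdown are hypothetical names invented for this example, and it uses the C++ standard library rather than Boost. The assumption is that a producer blocked on a full queue needs some way to observe shutdown. Here the wait predicate also checks a shutdown flag, so the thread calling join() is not left waiting forever.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical fixed-size queue for illustration only; this is not
// mongo::BlockingQueue and does not mirror its interface.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : _capacity(capacity) {}

    // Blocks while the queue is full. Returns false, without inserting,
    // if shutdown has been requested.
    bool pushOrShutdown(T item) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _shutdown || _items.size() < _capacity; });
        if (_shutdown) {
            return false;
        }
        _items.push(std::move(item));
        _cv.notify_all();
        return true;
    }

    // Blocks while the queue is empty. Returns false if shutdown has been
    // requested and nothing is left to consume.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _shutdown || !_items.empty(); });
        if (_items.empty()) {
            return false;
        }
        out = std::move(_items.front());
        _items.pop();
        _cv.notify_all();
        return true;
    }

    // Sets the shutdown flag and wakes every waiter, including a producer
    // that is blocked on a full queue.
    void requestShutdown() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _shutdown = true;
        }
        _cv.notify_all();
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _items.size();
    }

private:
    mutable std::mutex _mutex;
    std::condition_variable _cv;
    std::queue<T> _items;
    const std::size_t _capacity;
    bool _shutdown = false;
};

// Recreates the scenario from the stacks: the queue is full, no consumer is
// draining it, and a shutdown thread joins the producer. Returns true if the
// producer gave up on its insert and the join completed.
bool producerExitsOnShutdown() {
    BoundedQueue<int> buffer(2);
    buffer.pushOrShutdown(1);
    buffer.pushOrShutdown(2);  // queue is now full and nothing drains it

    std::atomic<bool> pushed{true};
    std::thread producer([&] { pushed = buffer.pushOrShutdown(3); });

    // Give the producer time to block on the full queue.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));

    buffer.requestShutdown();  // without this call, join() below would hang
    producer.join();
    return !pushed && buffer.size() == 2;
}
```

If the requestShutdown() call is removed, the producer waits indefinitely for space in the queue and join() never returns, which matches the relationship between thread 3 and thread 2 above.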
      

            Assignee: Eric Milkie (milkie@mongodb.com)
            Reporter: Andy Schwerin (schwerin@mongodb.com)
            Votes: 0
            Watchers: 3
