Core Server / SERVER-16834

Secondary nodes can hang during shutdown if BGSync::_buffer is full

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.8.0-rc4
    • Fix Version/s: 2.8.0-rc5
    • Component/s: Replication
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      During shutdown, the replication consumer threads can stop pulling items out of the BGSync::_buffer queue while the producer thread (the oplog tailer/bgsync thread) is blocked trying to insert an item into that same fixed-size queue.

      For example, in 2.8.0-rc5-pre-, we can see the following two stacks in a hung system. Thread 3 (the producer) is stuck because nothing is draining BGSync::_buffer, and thread 2 (the shutdown path) is stuck in join() because thread 3 never makes progress and so never checks for shutdown.

      Thread 3 (Thread 0x7ed14c6f9700 (LWP 17201)):
      #0  0x0000003887c0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x0000000000bf4690 in void boost::condition_variable_any::wait<boost::unique_lock<boost::timed_mutex> >(boost::unique_lock<boost::timed_mutex>&) ()
          at src/third_party/boost/boost/thread/pthread/condition_variable.hpp:137
      #2  0x0000000000bf82d3 in mongo::repl::BackgroundSync::produce(mongo::OperationContext*) () at src/mongo/util/queue.h:76
      #3  0x0000000000bf981e in mongo::repl::BackgroundSync::_producerThread() () at src/mongo/db/repl/bgsync.cpp:193
      ...

      Thread 2 (Thread 0x7ed12c747700 (LWP 17397)):
      #0  0x0000003887c0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
      #1  0x0000000000f9a8db in boost::thread::join() () at src/third_party/boost/boost/thread/pthread/condition_variable.hpp:56
      #2  0x0000000000c564a5 in mongo::repl::ReplicationCoordinatorExternalStateImpl::shutdown() () at src/mongo/db/repl/replication_coordinator_external_state_impl.cpp:107
      #3  0x0000000000c5b1f3 in mongo::repl::ReplicationCoordinatorImpl::shutdown() () at src/mongo/db/repl/replication_coordinator_impl.cpp:371
      #4  0x0000000000aa429a in mongo::exitCleanly(mongo::ExitCode) () at src/mongo/db/instance.cpp:1101
      #5  0x00000000009cf75a in mongo::CmdShutdown::shutdownHelper() () at src/mongo/db/dbcommands_generic.cpp:325
      ...
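
The hang described above can be reproduced in miniature with any fixed-size blocking queue: once the consumers stop, a producer parked in an untimed wait can never notice a shutdown request, so a join() on it never returns. The sketch below is an illustration only, not the change that resolved this ticket. BoundedQueue, pushUnlessShutdown and producerExitsWhenBufferFull are hypothetical names, and std:: primitives stand in for the boost ones seen in the stacks. It contrasts an untimed push with one that waits in short slices and rechecks a shutdown flag.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a fixed-size blocking queue; not MongoDB's class.
template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : _capacity(capacity) {}

    // Untimed wait: blocks until there is room. If nobody ever pops, the
    // caller stays parked here, as thread 3 does in the report.
    void blockingPush(T item) {
        std::unique_lock<std::mutex> lk(_mutex);
        _notFull.wait(lk, [&] { return _items.size() < _capacity; });
        _items.push(std::move(item));
    }

    // Waits in short slices and rechecks the shutdown flag between slices.
    // Returns false, without inserting, if the queue is full and shutdown
    // has been requested.
    bool pushUnlessShutdown(T item, const std::atomic<bool>& shuttingDown) {
        std::unique_lock<std::mutex> lk(_mutex);
        while (_items.size() >= _capacity) {
            if (shuttingDown.load()) return false;
            _notFull.wait_for(lk, std::chrono::milliseconds(10));
        }
        _items.push(std::move(item));
        return true;
    }

    // Non-blocking pop; wakes one waiting producer on success.
    bool tryPop(T& out) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_items.empty()) return false;
        out = std::move(_items.front());
        _items.pop();
        _notFull.notify_one();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _items.size();
    }

private:
    const std::size_t _capacity;
    mutable std::mutex _mutex;
    std::condition_variable _notFull;
    std::queue<T> _items;
};

// Fills the queue, starts a producer that finds it full, then requests
// shutdown while no consumer is running. Returns true if the producer gave
// up because of shutdown and the queue contents were left unchanged.
bool producerExitsWhenBufferFull() {
    BoundedQueue<int> buffer(2);
    std::atomic<bool> shuttingDown{false};
    buffer.blockingPush(1);
    buffer.blockingPush(2);  // buffer is now full; nobody will drain it

    bool inserted = true;
    std::thread producer(
        [&] { inserted = buffer.pushUnlessShutdown(3, shuttingDown); });

    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    shuttingDown.store(true);  // shutdown path sets the flag, then joins
    producer.join();           // returns: the producer rechecks the flag
    return !inserted && buffer.size() == 2;
}
```

If the producer lambda called blockingPush(3) instead, the join() in this sketch would block indefinitely, which is the shape of the thread 2 / thread 3 pair shown above. Polling with a timed wait is only one possible design; having the shutdown path wake the waiting producer explicitly would serve the same purpose.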

              People

              Assignee:
              milkie Eric Milkie
              Reporter:
              schwerin Andy Schwerin