Core Server / SERVER-38679

Race between PeriodicBalancerConfigRefresher::onStepDown() and mongod shutdown

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Sharding
    • Operating System: ALL
    • Sprint: Sharding 2019-01-28, Sharding 2019-02-11, Sharding 2019-02-25, Sharding 2019-03-11, Sharding 2019-03-25, Sharding 2019-04-08
    • 21

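      A line in PeriodicBalancerConfigRefresher::onStepDown(), reached from ReplicationCoordinatorExternalStateImpl::shardingOnStepDownHook() on step-down, pauses a PeriodicRunner job. If that pause happens after the line in mongod shutdown that stops the PeriodicRunner, the following invariant trips: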

      [js_test:auth] 2018-12-17T22:08:02.168-0500 d20027| 2018-12-17T17:08:02.168-0500 I ASIO     [Replication] Dropping all pooled connections to redbeard:20029 due to HostUnreachable: Error connecting to redbeard:20029 (127.0.0.1:20029) :: caused by :: Connection refused
      [js_test:auth] 2018-12-17T22:08:02.168-0500 d20027| 2018-12-17T17:08:02.168-0500 I REPL_HB  [replexec-0] Error in heartbeat (requestId: 984) to redbeard:20029, response status: HostUnreachable: Error connecting to redbeard:20029 (127.0.0.1:20029) :: caused by :: Connection refused
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027| 2018-12-17T17:08:02.180-0500 I REPL     [replexec-1] can't see a majority of the set, relinquishing primary
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027| 2018-12-17T17:08:02.180-0500 I REPL     [replexec-1] Stepping down from primary in response to heartbeat
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027| 2018-12-17T17:08:02.180-0500 I REPL     [replexec-1] transition to SECONDARY from PRIMARY
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027| 2018-12-17T17:08:02.180-0500 I NETWORK  [replexec-1] Skip closing connection for connection # 43
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027| 2018-12-17T17:08:02.180-0500 I SHARDING [replexec-1] The ChunkSplitter has stopped and will no longer run new autosplit tasks. Any autosplit tasks that have already started will be allowed to finish.
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027| 2018-12-17T17:08:02.180-0500 F -        [replexec-1] Invariant failure _execStatus == PeriodicJobImpl::ExecutionStatus::RUNNING src/mongo/util/periodic_runner_impl.cpp 143
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027| 2018-12-17T17:08:02.180-0500 I NETWORK  [conn1] end connection 127.0.0.1:47664 (1 connection now open)
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027| 2018-12-17T17:08:02.180-0500 F -        [replexec-1]
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027|
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027| ***aborting after invariant() failure
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027|
      [js_test:auth] 2018-12-17T22:08:02.180-0500 d20027|
      [js_test:auth] 2018-12-17T22:08:02.181-0500 d20027| 2018-12-17T17:08:02.180-0500 F -        [replexec-1] Got signal: 6 (Aborted).
      [js_test:auth] 2018-12-17T22:08:02.181-0500 d20027|  0x7f93dcfda36a 0x7f93dcfd9c6e 0x7f93dcfd9d0f 0x7f93db02c3c0 0x7f93dae8dd7f 0x7f93dae78672 0x7f93dcf3616f 0x7f93df743740 0x7f93df744841 0x7f93e02702e4 0x7f93e01bfc71 0x7f93e01d3e23 0x7f93df1127b5 0x7f93df112eae 0x7f93df1ffbef 0x7f93df2003d0 0x7f93df200d65 0x7f93db113063 0x7f93db021a9d 0x7f93daf51b23
      [js_test:auth] 2018-12-17T22:08:02.181-0500 d20027| ----- BEGIN BACKTRACE -----
      
      SNIP
      
      [js_test:auth] 2018-12-17T22:08:02.182-0500 d20027|  libbase.so(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x3A) [0x7f93dcfda36a]
      [js_test:auth] 2018-12-17T22:08:02.182-0500 d20027|  libbase.so(+0x176C6E) [0x7f93dcfd9c6e]
      [js_test:auth] 2018-12-17T22:08:02.182-0500 d20027|  libbase.so(+0x176D0F) [0x7f93dcfd9d0f]
      [js_test:auth] 2018-12-17T22:08:02.182-0500 d20027|  libpthread.so.0(+0x123C0) [0x7f93db02c3c0]
      [js_test:auth] 2018-12-17T22:08:02.182-0500 d20027|  libc.so.6(gsignal+0x10F) [0x7f93dae8dd7f]
      [js_test:auth] 2018-12-17T22:08:02.182-0500 d20027|  libc.so.6(abort+0x125) [0x7f93dae78672]
      [js_test:auth] 2018-12-17T22:08:02.182-0500 d20027|  libbase.so(mongo::invariantFailedWithMsg(char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*, unsigned int)+0x0) [0x7f93dcf3616f]
      [js_test:auth] 2018-12-17T22:08:02.182-0500 d20027|  libperiodic_runner_impl.so(+0x4740) [0x7f93df743740]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  libperiodic_runner_impl.so(mongo::PeriodicRunnerImpl::PeriodicJobHandleImpl::pause()+0x81) [0x7f93df744841]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  libserveronly_repl.so(mongo::repl::ReplicationCoordinatorExternalStateImpl::shardingOnStepDownHook()+0xC4) [0x7f93e02702e4]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  librepl_coordinator_impl.so(mongo::repl::ReplicationCoordinatorImpl::_performPostMemberStateUpdateAction(mongo::repl::ReplicationCoordinatorImpl::PostMemberStateUpdateAction)+0x201) [0x7f93e01bfc71]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  librepl_coordinator_impl.so(mongo::repl::ReplicationCoordinatorImpl::_stepDownFinish(mongo::executor::TaskExecutor::CallbackArgs const&, mongo::executor::TaskExecutor::EventHandle const&)+0x183) [0x7f93e01d3e23]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  libthread_pool_task_executor.so(mongo::executor::ThreadPoolTaskExecutor::runCallback(std::shared_ptr<mongo::executor::ThreadPoolTaskExecutor::CallbackState>)+0x175) [0x7f93df1127b5]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  libthread_pool_task_executor.so(+0xCEAE) [0x7f93df112eae]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  libthread_pool.so(mongo::ThreadPool::_doOneTask(std::unique_lock<std::mutex>*)+0x15F) [0x7f93df1ffbef]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  libthread_pool.so(mongo::ThreadPool::_consumeTasks()+0xA0) [0x7f93df2003d0]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  libthread_pool.so(mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x95) [0x7f93df200d65]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  libstdc++.so.6(+0xBC063) [0x7f93db113063]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  libpthread.so.0(+0x7A9D) [0x7f93db021a9d]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027|  libc.so.6(clone+0x43) [0x7f93daf51b23]
      [js_test:auth] 2018-12-17T22:08:02.183-0500 d20027| -----  END BACKTRACE  -----
      

      I think this was introduced by this commit.

            Assignee:
            Benjamin Caimano (Inactive)
            Reporter:
            Mathias Stearn
            Votes:
            0
            Watchers:
            5
