The wait on the future from ReshardingCoordinator::_canEnterCritical becoming ready should be subject to the abortToken. This ensures the ReshardingCoordinator instance can be reliably interrupted by shutdown, stepdown, and the abortReshardCollection command.
LOGV2(5391602, "Resharding operation waiting for an okay to enter critical section"); return _canEnterCritical.getFuture().thenRunOn(**executor).then([this] { _commitMonitorCancellationSource.cancel(); LOGV2(5391603, "Resharding operation is okay to enter critical section"); });
The ReshardingCoordinator never makes it into the "applying" state and ShardedClusterFixture:job0:configsvr:node0 never finishes stepping up because of the ReshardingCoordinator instance still waiting on the future from ReshardingCoordinator::_canEnterCritical becoming ready.
[j0:c:n0] | 2021-10-20T12:50:18.615+00:00 I RESHARD 5343001 [ReshardingCoordinatorService-2] "Transitioned resharding coordinator state","attr":{"newState":"applying","oldState":"cloning","namespace":"test2_fsmdb0.fsmcoll0","collectionUUID":{"uuid":{"$uuid":"0424b1d8-0f0f-4398-8351-6e5f2af1a8c6"}},"reshardingUUID":{"uuid":{"$uuid":"5a9b5a66-2a4d-4c36-b678-c98990dfec67"}}} [j0:c:n0] | 2021-10-20T12:50:18.680+00:00 I RESHARD 5391602 [ReshardingCoordinatorService-2] "Resharding operation waiting for an okay to enter critical section" ... [j0:c:n2] | 2021-10-20T12:50:32.453+00:00 I RESHARD 5343001 [ReshardingCoordinatorService-0] "Transitioned resharding coordinator state","attr":{"newState":"cloning","oldState":"preparing-to-donate","namespace":"test2_fsmdb0.fsmcoll0","collectionUUID":{"uuid":{"$uuid":"3b6c95e8-02e9-4c5e-a0ff-dae27dbf1fb4"}},"reshardingUUID":{"uuid":{"$uuid":"ed5a3c6f-6150-4400-a942-d1160f19d810"}}} ... [j0:c:n2] | 2021-10-20T12:50:37.225+00:00 I REPL 21358 [ReplCoord-8] "Replica set state transition","attr":{"newState":"SECONDARY","oldState":"PRIMARY"} [j0:c:n0] | 2021-10-20T12:50:37.226+00:00 I REPL 21358 [ReplCoord-2] "Replica set state transition","attr":{"newState":"PRIMARY","oldState":"SECONDARY"} ... [j0:s1:n2] | 2021-10-20T12:50:38.073+00:00 I RESHARD 5279506 [ReshardingRecipientService-6] "Transitioned resharding recipient state","attr":{"newState":"applying","oldState":"cloning","namespace":"test2_fsmdb0.fsmcoll0","collectionUUID":{"uuid":{"$uuid":"3b6c95e8-02e9-4c5e-a0ff-dae27dbf1fb4"}},"reshardingUUID":{"uuid":{"$uuid":"ed5a3c6f-6150-4400-a942-d1160f19d810"}}} [j0:s0:n1] | 2021-10-20T12:50:38.130+00:00 I RESHARD 5279506 [ReshardingRecipientService-3] "Transitioned resharding recipient state","attr":{"newState":"applying","oldState":"cloning","namespace":"test2_fsmdb0.fsmcoll0","collectionUUID":{"uuid":{"$uuid":"3b6c95e8-02e9-4c5e-a0ff-dae27dbf1fb4"}},"reshardingUUID":{"uuid":{"$uuid":"ed5a3c6f-6150-4400-a942-d1160f19d810"}}}
[fsm_workload_test:reshard_collection_crud_ops] sh5755| Thread 45: "OplogApplier-0" (Thread 0x7f5e269a2700 (LWP 24118)) [fsm_workload_test:reshard_collection_crud_ops] sh5755| #0 0x00007f5e4e9ef48c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #1 0x00007f5e50950c7c in __gthread_cond_wait (__mutex=<optimized out>, __cond=__cond@entry=0x55ae184d4750) at /data/mci/be35cd837a5670147b811bee03592d2f/toolchain-builder/tmp/build-gcc-v3.sh-1f7/build/x86_64-mongodb-linux/libstdc++-v3/include/x86_64-mongodb-linux/bits/gthr-default.h:864 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #2 std::condition_variable::wait (this=this@entry=0x55ae184d4750, __lock=...) at ../../../../../src/combined/libstdc++-v3/src/c++11/condition_variable.cc:53 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #3 0x00007f5e4e0d34c2 in std::_V2::condition_variable_any::wait<mongo::BasicLockableAdapter> (__lock=..., this=0x55ae184d4750) at /opt/mongodbtoolchain/revisions/4ac427ffa2fb12ffce7028023dae1775a06e9bf5/stow/gcc-v3.7vm/include/c++/8.5.0/bits/std_mutex.h:238 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #4 mongo::Interruptible::NotInterruptible::waitForConditionOrInterruptNoAssertUntil (deadline=..., m=..., cv=..., this=<optimized out>) at src/mongo/util/interruptible.h:571 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #5 mongo::Interruptible::waitForConditionOrInterruptUntil<std::unique_lock<std::mutex>, mongo::future_details::SharedStateBase::wait(mongo::Interruptible*)::{lambda()#1}>(mongo::stdx::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t, mongo::future_details::SharedStateBase::wait(mongo::Interruptible*)::{lambda()#1})::{lambda(auto:1&, mongo::Interruptible::WakeSpeed)#3}::operator() (this=0x7f5e269a2560, deadline=..., speed=mongo::Interruptible::WakeSpeed::kSlow) at src/mongo/util/interruptible.h:385 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #6 0x00007f5e4d020697 in mongo::Interruptible::waitForConditionOrInterruptUntil<std::unique_lock<std::mutex>, mongo::future_details::SharedStateBase::wait(mongo::Interruptible*)::{lambda()#1}>(mongo::stdx::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t, mongo::future_details::SharedStateBase::wait(mongo::Interruptible*)::{lambda()#1})::{lambda(auto:1&, mongo::Interruptible::WakeSpeed)#4}::operator() (speed=mongo::Interruptible::WakeSpeed::kSlow, deadline=..., this=<optimized out>) at /opt/mongodbtoolchain/revisions/4ac427ffa2fb12ffce7028023dae1775a06e9bf5/stow/gcc-v3.7vm/include/c++/8.5.0/atomic:246 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #7 mongo::Interruptible::waitForConditionOrInterruptUntil<std::unique_lock<std::mutex>, mongo::future_details::SharedStateBase::wait(mongo::Interruptible*)::{lambda()#1}>(mongo::stdx::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t, mongo::future_details::SharedStateBase::wait(mongo::Interruptible*)::{lambda()#1}) (pred=..., finalDeadline=..., m=..., cv=..., this=<optimized out>) at src/mongo/util/interruptible.h:443 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #8 mongo::Interruptible::waitForConditionOrInterrupt<std::unique_lock<std::mutex>, mongo::future_details::SharedStateBase::wait(mongo::Interruptible*)::{lambda()#1}>(mongo::stdx::condition_variable&, std::unique_lock<std::mutex>&, mongo::future_details::SharedStateBase::wait(mongo::Interruptible*)::{lambda()#1}) (pred=..., m=..., cv=..., this=<optimized out>) at src/mongo/util/interruptible.h:458 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #9 mongo::future_details::SharedStateBase::wait (interruptible=<optimized out>, this=<optimized out>) at src/mongo/util/future_impl.h:402 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #10 mongo::future_details::SharedStateHolder<mongo::future_details::FakeVoid>::wait (interruptible=<optimized out>, this=<optimized out>) at src/mongo/util/future_impl.h:658 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #11 mongo::future_details::FutureImpl<mongo::future_details::FakeVoid>::wait (interruptible=<optimized out>, this=<optimized out>) at src/mongo/util/future_impl.h:830 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #12 mongo::SemiFuture<void>::wait (interruptible=<optimized out>, this=<optimized out>) at src/mongo/util/future.h:189 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #13 mongo::repl::PrimaryOnlyService::ActiveInstance::waitForCompletion (this=<optimized out>) at src/mongo/db/repl/primary_only_service.h:359 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #14 mongo::repl::PrimaryOnlyService::onStepUp (this=0x55ae130bb780, stepUpOpTime=...) at src/mongo/db/repl/primary_only_service.cpp:361 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #15 0x00007f5e4d021860 in mongo::repl::PrimaryOnlyServiceRegistry::onStepUpComplete (this=<optimized out>, opCtx=<optimized out>, term=<optimized out>) at /opt/mongodbtoolchain/revisions/4ac427ffa2fb12ffce7028023dae1775a06e9bf5/stow/gcc-v3.7vm/include/c++/8.5.0/bits/unique_ptr.h:345 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #16 0x00007f5e4cff3e53 in mongo::ReplicaSetAwareServiceRegistry::<lambda(mongo::ReplicaSetAwareInterface*)>::operator() (__closure=<synthetic pointer>, __closure=<synthetic pointer>, service=<optimized out>) at src/mongo/db/repl/replica_set_aware_service.cpp:73 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #17 std::for_each<__gnu_cxx::__normal_iterator<mongo::ReplicaSetAwareInterface**, std::vector<mongo::ReplicaSetAwareInterface*> >, mongo::ReplicaSetAwareServiceRegistry::onStepUpComplete(mongo::OperationContext*, long long int)::<lambda(mongo::ReplicaSetAwareInterface*)> > (__f=..., __last=0x392f002a735c2925, __first=0x55ae13055fa0) at /opt/mongodbtoolchain/revisions/4ac427ffa2fb12ffce7028023dae1775a06e9bf5/stow/gcc-v3.7vm/include/c++/8.5.0/bits/stl_algo.h:3882 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #18 mongo::ReplicaSetAwareServiceRegistry::onStepUpComplete (this=<optimized out>, opCtx=0x55ae18266400, term=17) at src/mongo/db/repl/replica_set_aware_service.cpp:74 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #19 0x00007f5e4cd73181 in mongo::repl::ReplicationCoordinatorImpl::signalDrainComplete (this=0x55ae12e5e000, opCtx=0x55ae18266400, termWhenBufferIsEmpty=17) at src/mongo/db/repl/optime.h:85 [fsm_workload_test:reshard_collection_crud_ops] sh5755| #20 0x00007f5e4cc24d4c in mongo::repl::OplogApplierImpl::_run (this=0x55ae1878b380, oplogBuffer=<optimized out>) at src/mongo/db/repl/oplog_batcher.h:94
- is depended on by
-
SERVER-57686 We need test coverage that runs resharding in the face of elections
- Closed
- related to
-
SERVER-61950 ReshardingOplogFetcher waits on network request completing without interruption, potentially preventing shard step-up from ever completing
- Closed