[SERVER-82360] Unhandled exception crash on 7.2.0-alpha-870-g14c180a when running createIndexes Created: 20/Oct/23  Updated: 29/Nov/23

Status: Needs Scheduling
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Shane Harvey Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: car-investigation
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2023-11-23-15-43-41-435.png     PNG File image-2023-11-23-15-46-44-549.png     PNG File image-2023-11-24-10-14-07-753.png     PNG File image-2023-11-24-11-41-41-560.png     File server-82360-repro.diff    
Issue Links:
Related
related to PYTHON-4006 Tests should avoid blocking indefinit... Backlog
Assigned Teams:
Cluster Scalability
Operating System: ALL
Sprint: CAR Team 2023-11-27
Participants:

 Description   

Unhandled exception crash on 7.2.0-alpha-870-g14c180a when running createIndexes:

 [2023/10/20 17:37:23.902] MongoDB server version: db version v7.2.0-alpha-870-g14c180a

The command the test suite is running:

    vault.create_index(
        "keyAltNames",
        unique=True,
        partialFilterExpression={"keyAltNames": {"$exists": True}},
    )

The logs of the crash:

{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF9630B49B9","module":"KERNELBASE.dll","s":"RaiseException","s+":"69"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF660594969","module":"mongod.exe","file":".../src/mongo/util/signal_handlers_synchronous.cpp","line":112,"s":"mongo::`anonymous namespace'::endProcessWithSignal","s+":"19"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF66059583E","module":"mongod.exe","file":".../src/mongo/util/signal_handlers_synchronous.cpp","line":282,"s":"abruptQuit","s+":"AE"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF9634DD88D","module":"ucrtbase.dll","s":"raise","s+":"1DD"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF9634DE761","module":"ucrtbase.dll","s":"abort","s+":"31"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF66059CEA6","module":"mongod.exe","file":".../src/mongo/util/assert_util.cpp","line":71,"s":"mongo::`anonymous namespace'::callAbort","s+":"16"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF66059DBDF","module":"mongod.exe","file":".../src/mongo/util/assert_util.cpp","line":145,"s":"mongo::invariantFailedWithMsg","s+":"16F"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65F1A0DCA","module":"mongod.exe","file":".../src/mongo/s/transaction_router.cpp","line":534,"s":"mongo::TransactionRouter::Router::processParticipantResponse","s+":"9AA"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65F1AAF01","module":"mongod.exe","file":".../src/mongo/s/multi_statement_transaction_requests_sender.cpp","line":120,"s":"mongo::MultiStatementTransactionRequestsSender::next","s+":"61"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65F1B0BF0","module":"mongod.exe","file":".../src/mongo/s/cluster_commands_helpers.cpp","line":268,"s":"mongo::`anonymous namespace'::gatherResponsesImpl","s+":"120"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65F1B0ABB","module":"mongod.exe","file":".../src/mongo/s/cluster_commands_helpers.cpp","line":325,"s":"mongo::gatherResponses","s+":"2B"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65F19F847","module":"mongod.exe","file":".../src/mongo/s/transaction_router.cpp","line":1482,"s":"mongo::TransactionRouter::Router::implicitlyAbortTransaction","s+":"747"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC7543E","module":"mongod.exe","file":".../src/mongo/s/commands/strategy.cpp","line":361,"s":"mongo::`anonymous namespace'::ExecCommandClient::_epilogue","s+":"1CE"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC6D73A","module":"mongod.exe","file":".../src/mongo/util/future_impl.h","line":315,"s":"mongo::future_details::statusCall<`mongo::`anonymous namespace'::ExecCommandClient::run'::`2'::<lambda_2> &,mongo::future_details::FakeVoid>","s+":"1A"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC6B5FC","module":"mongod.exe","file":".../src/mongo/util/future_impl.h","line":1287,"s":"mongo::future_details::FutureImpl<mongo::future_details::FakeVoid>::generalImpl<`mongo::future_details::FutureImpl<mongo::future_details::FakeVoid>::then<mongo::CleanupFuturePolicy<0>,`mongo::`anonymous namespace'::ExecCommandClient::run'::`2'::<lambda_2> >'::`5'::<lambda_1>,`mongo::future_details::FutureImpl<mongo::future_details::FakeVoid>::then<mongo::CleanupFuturePolicy<0>,`mongo::`anonymous namespace'::ExecCommandClient::run'::`2'::<lambda_2> >'::`5'::<lambda_2>,`mongo::future_details::FutureImpl<mongo::future_details::FakeVoid>::then<mongo::CleanupFuturePolicy<0>,`mongo::`anonymous namespace'::ExecCommandClient::run'::`2'::<lambda_2> >'::`5'::<lambda_3> >","s+":"3C"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC70928","module":"mongod.exe","file":".../src/mongo/util/future_util.h","line":858,"s":"`mongo::future_util::AsyncState<mongo::`anonymous namespace'::ExecCommandClient>::thenWithState<`mongo::`anonymous namespace'::ParseAndRunCommand::RunAndRetry::_run'::`2'::<lambda_1> >'::`2'::<lambda_1>::operator()","s+":"88"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC6CCFB","module":"mongod.exe","file":".../src/mongo/util/future.h","line":1354,"s":"mongo::makeReadyFutureWith<`mongo::future_util::AsyncState<mongo::`anonymous namespace'::ExecCommandClient>::thenWithState<`mongo::`anonymous namespace'::ParseAndRunCommand::RunAndRetry::_run'::`2'::<lambda_1> >'::`2'::<lambda_1> >","s+":"1B"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC78D38","module":"mongod.exe","file":".../src/mongo/s/commands/strategy.cpp","line":964,"s":"mongo::`anonymous namespace'::ParseAndRunCommand::RunAndRetry::_run","s+":"88"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC6D36E","module":"mongod.exe","file":".../src/mongo/util/future.h","line":1354,"s":"mongo::makeReadyFutureWith<`mongo::`anonymous namespace'::ParseAndRunCommand::RunAndRetry::run'::`2'::<lambda_1> >","s+":"2E"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC7C05C","module":"mongod.exe","file":".../src/mongo/s/commands/strategy.cpp","line":1125,"s":"mongo::`anonymous namespace'::ParseAndRunCommand::RunAndRetry::run","s+":"5C"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC6CF17","module":"mongod.exe","file":".../src/mongo/util/future.h","line":1354,"s":"mongo::makeReadyFutureWith<`mongo::future_util::AsyncState<mongo::`anonymous namespace'::ParseAndRunCommand::RunAndRetry>::thenWithState<``mongo::`anonymous namespace'::ParseAndRunCommand::RunInvocation::run'::`2'::<lambda_1>::operator()'::`2'::<lambda_2> >'::`2'::<lambda_1> >","s+":"47"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC6D42E","module":"mongod.exe","file":".../src/mongo/util/future.h","line":1354,"s":"mongo::makeReadyFutureWith<`mongo::`anonymous namespace'::ParseAndRunCommand::RunInvocation::run'::`2'::<lambda_1> >","s+":"9E"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC70601","module":"mongod.exe","file":".../src/mongo/util/future_util.h","line":858,"s":"`mongo::future_util::AsyncState<mongo::`anonymous namespace'::ParseAndRunCommand::RunInvocation>::thenWithState<``mongo::`anonymous namespace'::ParseAndRunCommand::run'::`2'::<lambda_1>::operator()'::`2'::<lambda_1> >'::`2'::<lambda_1>::operator()","s+":"51"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC6CAEB","module":"mongod.exe","file":".../src/mongo/util/future.h","line":1354,"s":"mongo::makeReadyFutureWith<`mongo::future_util::AsyncState<mongo::`anonymous namespace'::ParseAndRunCommand::RunInvocation>::thenWithState<``mongo::`anonymous namespace'::ParseAndRunCommand::run'::`2'::<lambda_1>::operator()'::`2'::<lambda_1> >'::`2'::<lambda_1> >","s+":"1B"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC6D2C9","module":"mongod.exe","file":".../src/mongo/util/future.h","line":1354,"s":"mongo::makeReadyFutureWith<`mongo::`anonymous namespace'::ParseAndRunCommand::run'::`2'::<lambda_1> >","s+":"89"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC7BDD0","module":"mongod.exe","file":".../src/mongo/s/commands/strategy.cpp","line":1168,"s":"mongo::`anonymous namespace'::ParseAndRunCommand::run","s+":"50"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC6CB47","module":"mongod.exe","file":".../src/mongo/util/future.h","line":1354,"s":"mongo::makeReadyFutureWith<`mongo::future_util::AsyncState<mongo::`anonymous namespace'::ParseAndRunCommand>::thenWithState<`mongo::`anonymous namespace'::ClientCommand::_execute'::`2'::<lambda_1> >'::`2'::<lambda_1> >","s+":"47"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC757F2","module":"mongod.exe","file":".../src/mongo/s/commands/strategy.cpp","line":1241,"s":"mongo::`anonymous namespace'::ClientCommand::_execute","s+":"272"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC6D0D8","module":"mongod.exe","file":".../src/mongo/util/future.h","line":1354,"s":"mongo::makeReadyFutureWith<`mongo::`anonymous namespace'::ClientCommand::run'::`2'::<lambda_1> >","s+":"28"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC7BC18","module":"mongod.exe","file":".../src/mongo/s/commands/strategy.cpp","line":1321,"s":"mongo::`anonymous namespace'::ClientCommand::run","s+":"48"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC6CD55","module":"mongod.exe","file":".../src/mongo/util/future.h","line":1354,"s":"mongo::makeReadyFutureWith<`mongo::future_util::AsyncState<mongo::`anonymous namespace'::ClientCommand>::thenWithState<`mongo::Strategy::clientCommand'::`2'::<lambda_1> >'::`2'::<lambda_1> >","s+":"45"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC7B37D","module":"mongod.exe","file":".../src/mongo/s/commands/strategy.cpp","line":1332,"s":"mongo::Strategy::clientCommand","s+":"ED"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC5ADBC","module":"mongod.exe","file":".../src/mongo/s/service_entry_point_mongos.cpp","line":159,"s":"mongo::HandleRequest::handleRequest","s+":"1BC"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC5A3ED","module":"mongod.exe","file":".../src/mongo/util/future_impl.h","line":1019,"s":"``mongo::future_details::FutureImpl<mongo::future_details::FakeVoid>::then<mongo::CleanupFuturePolicy<0>,`mongo::HandleRequest::run'::`2'::<lambda_2> >'::`7'::<lambda_3>::operator()'::`2'::<lambda_1>::operator()","s+":"5D"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65E068FFA","module":"mongod.exe","file":".../src/mongo/util/future_impl.h","line":482,"s":"mongo::future_details::SharedStateBase::transitionToFinished","s+":"3A"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65E09BD9F","module":"mongod.exe","file":".../src/mongo/util/future_impl.h","line":639,"s":"mongo::future_details::SharedStateImpl<mongo::future_details::FakeVoid>::setFrom","s+":"1F"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC5A9FC","module":"mongod.exe","file":".../src/mongo/util/functional.h","line":254,"s":"`mongo::unique_function<void __cdecl(mongo::future_details::SharedStateBase *)>::makeImpl<`mongo::future_details::FutureImpl<mongo::future_details::FakeVoid>::makeContinuation<void,``mongo::future_details::FutureImpl<mongo::future_details::FakeVoid>::then<mongo::CleanupFuturePolicy<0>,`mongo::HandleRequest::run'::`2'::<lambda_1> >'::`5'::<lambda_3>::operator()'::`2'::<lambda_1> >'::`2'::<lambda_1> >'::`2'::SpecificImpl::call","s+":"5C"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65E068FFA","module":"mongod.exe","file":".../src/mongo/util/future_impl.h","line":482,"s":"mongo::future_details::SharedStateBase::transitionToFinished","s+":"3A"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC5B91B","module":"mongod.exe","file":".../src/mongo/s/service_entry_point_mongos.cpp","line":212,"s":"mongo::HandleRequest::run","s+":"50B"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC5AF13","module":"mongod.exe","file":".../src/mongo/s/service_entry_point_mongos.cpp","line":219,"s":"mongo::ServiceEntryPointMongos::handleRequestImpl","s+":"33"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65EC5828A","module":"mongod.exe","file":".../src/mongo/db/cluster_transaction_api.cpp","line":79,"s":"mongo::txn_api::details::ClusterSEPTransactionClientBehaviors::handleRequest","s+":"1A"}}}
{"t":{"$date":"2023-10-20T18:42:57.538+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65ED2B8B4","module":"mongod.exe","file":".../src/mongo/db/transaction/transaction_api.cpp","line":485,"s":"mongo::txn_api::details::SEPTransactionClient::_runCommand","s+":"464"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65ED32A2C","module":"mongod.exe","file":".../src/mongo/db/transaction/transaction_api.cpp","line":509,"s":"mongo::txn_api::details::SEPTransactionClient::runCommand","s+":"5C"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65ED2D5AA","module":"mongod.exe","file":".../src/mongo/util/functional.h","line":254,"s":"`mongo::unique_function<mongo::SemiFuture<mongo::BSONObj> __cdecl(void)>::makeImpl<`mongo::txn_api::details::Transaction::_commitOrAbort'::`2'::<lambda_2> >'::`2'::SpecificImpl::call","s+":"4A"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65E850FA4","module":"mongod.exe","file":".../src/mongo/util/future_impl.h","line":1000,"s":"`mongo::future_details::FutureImpl<mongo::future_details::FakeVoid>::then<mongo::CleanupFuturePolicy<0>,```mongo::ExecutorFuture<void>::_wrapCBHelper<mongo::unique_function<mongo::SemiFuture<mongo::BSONObj> __cdecl(void)> >'::`2'::<lambda_1>::operator()<>'::`2'::<lambda_1>::operator()'::`2'::<lambda_1> >'::`7'::<lambda_1>::operator()","s+":"44"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65E84BE6F","module":"mongod.exe","file":".../src/mongo/util/future.h","line":1381,"s":"``mongo::ExecutorFuture<void>::_wrapCBHelper<mongo::unique_function<mongo::SemiFuture<mongo::BSONObj> __cdecl(void)> >'::`2'::<lambda_1>::operator()<>'::`2'::<lambda_1>::operator()","s+":"AF"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65E85D501","module":"mongod.exe","file":".../src/mongo/util/functional.h","line":254,"s":"`mongo::unique_function<void __cdecl(mongo::Status)>::makeImpl<``mongo::ExecutorFuture<void>::_wrapCBHelper<mongo::unique_function<mongo::SemiFuture<mongo::BSONObj> __cdecl(void)> >'::`2'::<lambda_1>::operator()<>'::`2'::<lambda_1> >'::`2'::SpecificImpl::call","s+":"21"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65ED35E82","module":"mongod.exe","file":".../src/mongo/executor/inline_executor.cpp","line":90,"s":"mongo::executor::InlineExecutor::run","s+":"102"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65ED2735A","module":"mongod.exe","file":".../src/mongo/db/transaction/transaction_api.cpp","line":176,"s":"`mongo::txn_api::SyncTransactionWithRetries::runNoThrow'::`8'::<lambda_2>::operator()","s+":"1EA"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65ED2D742","module":"mongod.exe","file":".../src/mongo/util/functional.h","line":254,"s":"`mongo::unique_function<mongo::Future<void> __cdecl(void)>::makeImpl<`mongo::txn_api::SyncTransactionWithRetries::runNoThrow'::`8'::<lambda_2> >'::`2'::SpecificImpl::call","s+":"12"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65E3B7616","module":"mongod.exe","file":".../src/mongo/util/future_impl.h","line":1000,"s":"`mongo::future_details::FutureImpl<mongo::future_details::FakeVoid>::then<mongo::CleanupFuturePolicy<0>,```mongo::ExecutorFuture<void>::_wrapCBHelper<mongo::unique_function<mongo::SemiFuture<void> __cdecl(void)> >'::`2'::<lambda_1>::operator()<>'::`2'::<lambda_1>::operator()'::`2'::<lambda_1> >'::`7'::<lambda_1>::operator()","s+":"36"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65E3C499F","module":"mongod.exe","file":".../src/mongo/util/functional.h","line":262,"s":"`mongo::unique_function<void __cdecl(mongo::Status)>::makeImpl<``mongo::ExecutorFuture<void>::_wrapCBHelper<mongo::unique_function<mongo::SemiFuture<void> __cdecl(void)> >'::`2'::<lambda_1>::operator()<>'::`2'::<lambda_1> >'::`2'::SpecificImpl::call","s+":"7F"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF6601308DB","module":"mongod.exe","file":".../src/mongo/util/functional.h","line":262,"s":"`mongo::unique_function<void __cdecl(mongo::executor::TaskExecutor::CallbackArgs const &)>::makeImpl<`mongo::executor::TaskExecutor::schedule'::`2'::<lambda_1> >'::`2'::SpecificImpl::call","s+":"3B"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65F480BFC","module":"mongod.exe","file":".../src/mongo/executor/scoped_task_executor.cpp","line":310,"s":"``mongo::executor::ScopedTaskExecutor::Impl::_wrapCallback<`mongo::executor::ScopedTaskExecutor::Impl::scheduleWorkAt'::`2'::<lambda_1>,mongo::unique_function<void __cdecl(mongo::executor::TaskExecutor::CallbackArgs const &)> >'::`2'::<lambda_1>::operator()<mongo::executor::TaskExecutor::CallbackArgs>'::`2'::<lambda_1>::operator()","s+":"4C"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF65F47E81D","module":"mongod.exe","file":".../src/mongo/executor/scoped_task_executor.cpp","line":318,"s":"`mongo::executor::ScopedTaskExecutor::Impl::_wrapCallback<`mongo::executor::ScopedTaskExecutor::Impl::onEvent'::`2'::<lambda_1>,mongo::unique_function<void __cdecl(mongo::executor::TaskExecutor::CallbackArgs const &)> >'::`2'::<lambda_1>::operator()<mongo::executor::TaskExecutor::CallbackArgs>","s+":"6D"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF660095E67","module":"mongod.exe","file":".../src/mongo/executor/thread_pool_task_executor.cpp","line":675,"s":"mongo::executor::ThreadPoolTaskExecutor::runCallback","s+":"197"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF660094BDB","module":"mongod.exe","file":".../src/mongo/util/functional.h","line":262,"s":"`mongo::unique_function<void __cdecl(mongo::Status)>::makeImpl<`mongo::executor::ThreadPoolTaskExecutor::scheduleIntoPool_inlock'::`16'::<lambda_3> >'::`2'::SpecificImpl::call","s+":"9B"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF66045BE02","module":"mongod.exe","file":".../src/mongo/util/concurrency/thread_pool.cpp","line":522,"s":"mongo::ThreadPool::Impl::_doOneTask","s+":"212"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF66045B2F7","module":"mongod.exe","file":".../src/mongo/util/concurrency/thread_pool.cpp","line":421,"s":"mongo::ThreadPool::Impl::_consumeTasks","s+":"A7"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF66045CDB9","module":"mongod.exe","file":".../src/mongo/util/concurrency/thread_pool.cpp","line":409,"s":"mongo::ThreadPool::Impl::_workerThreadBody","s+":"1A9"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF66045A2C4","module":"mongod.exe","file":"C:/Program Files/Microsoft Visual Studio/2022/Professional/VC/Tools/MSVC/14.31.31103/include/thread","line":56,"s":"std::thread::_Invoke<std::tuple<`mongo::stdx::thread::thread<`mongo::ThreadPool::Impl::_startWorkerThread_inlock'::`23'::<lambda_1>,0>'::`1'::<lambda_7_> >,0>","s+":"34"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF96349268A","module":"ucrtbase.dll","s":"o_exp","s+":"5A"}}}
{"t":{"$date":"2023-10-20T18:42:57.539+01:00"},"s":"I",  "c":"CONTROL",  "id":31445,   "ctx":"ShardingDDLCoordinator-0","msg":"Frame","attr":{"frame":{"a":"7FF964977AB4","module":"KERNEL32.DLL","s":"BaseThreadInitThunk","s+":"14"}}}
{"t":{"$date":"2023-10-20T18:42:57.540+01:00"},"s":"I",  "c":"CONTROL",  "id":23132,   "ctx":"ShardingDDLCoordinator-0","msg":"Writing minidump diagnostic file","attr":{"dumpName":"C:\\data\\mci\\b407decbc9c4e25dd76872f64c891dc6\\drivers-tools\\mongodb\\bin\\mongod.2023-10-20T17-42-57.mdmp"}}

Happened once in the pymongo test suite on Windows: https://spruce.mongodb.com/task/mongo_python_driver_tests_windows_encryption__platform~windows_64_vsMulti_small_auth_ssl~noauth_nossl_python_version_windows~3.10_encryption~encryption_test_latest_sharded_cluster_61269c0f892a14c0e86c32d94dc66a7ff0d831b2_23_10_19_18_46_21/files?execution=0&sortBy=STATUS&sortDir=ASC

Note that the Evergreen task includes the core dump (.mdmp) of the process and the full mongod logs.



 Comments   
Comment by Yujin Kang Park [ 29/Nov/23 ]

Given the transactions infrastructure is owned by Cluster scalability, I am sending it over so they can better determine the exact cause and the proper fix.

Comment by Yujin Kang Park [ 29/Nov/23 ]

The transaction is supposed to schedule clean up on another thread and wait for the clean up to complete by making a blocking call on the future. But the opCtx is interrupted, and getNoThrow returns immediately. Making the call non-interruptible seems to make the crash go away.

It is not clear on which data structure the race is. Scheduled jobs capture shared_ptr objects, and the opCtx going out of scope in the original txn thread and the opCtx being used right before we invariant seem to be different.

Comment by Yujin Kang Park [ 29/Nov/23 ]

Attempted to reproduce by adding a failpoint to hang in ClusterSEPTransactionClientBehaviors::handleRequest and issuing killAllSessionsByPattern. Did not work. From what I understand this is where the "delete: chunks" is sent to the config server.

Managed to reproduce the issue by hanging the configSvr while executing "delete: chunks" command, and issuing killAllSessionsByPattern on the shard. Attaching repro server-82360-repro.diff

Events:

  1. the create collection coordinator v3 (legacy due to fcv 7.0) is attempting to commit.
  2. As part of the commit, it runs "delete: chunks" on the config database in the config server.
  3. killAllSessionsByPattern comes in and kills the transaction session and opCtx.
  4. Chose internal transaction error handling step","attr":{"nextStep":"abort and do not retry"
  5. "Aborting transaction on all participant shards"
  6. Re-executing sharding DDL coordinator
  7. invariant is hit

Note: re-executing coordinator happens on the same thread, while the previous transaction is concurrently aborted on another thread.
 

Comment by Shane Harvey [ 28/Nov/23 ]

The pymongo test suite runs killAllSessions on both mongoses before+after each unified spec test. Now TestExternalKeyVault::test_external_key_vault_1 is not a unified test but some of the recent tests beforehand are (all the "TestUnified..." tests):

 [2023/10/20 10:44:54.777] test/test_encryption.py::TestUnifiedRewrapManyDataKey::test_rewrap_with_new_KMIP_KMS_provider PASSED [ 82%]
 [2023/10/20 10:44:55.270] test/test_encryption.py::TestUnifiedRewrapManyDataKey::test_rewrap_with_new_local_KMS_provider PASSED [ 83%]
 [2023/10/20 10:44:55.643] test/test_encryption.py::TestDataKeyDoubleEncryption::test_data_key_aws PASSED [ 83%]
 [2023/10/20 10:44:55.972] test/test_encryption.py::TestDataKeyDoubleEncryption::test_data_key_azure PASSED [ 83%]
 [2023/10/20 10:44:56.498] test/test_encryption.py::TestDataKeyDoubleEncryption::test_data_key_gcp PASSED [ 83%]
 [2023/10/20 10:44:56.650] test/test_encryption.py::TestDataKeyDoubleEncryption::test_data_key_kmip PASSED [ 83%]
 [2023/10/20 10:44:56.782] test/test_encryption.py::TestDataKeyDoubleEncryption::test_data_key_local PASSED [ 84%]
 [2023/10/20 11:09:56.782] Timeout (0:25:00)!

Comment by Yujin Kang Park [ 24/Nov/23 ]

Things that have yet to be determined:

  • What is killing the sessions?
  • Why is the config.system.sessions collection being created so late in the test (like 4 minutes in)?
  • The failed remote request has both an LSID (with txnNumber: 3541) and a "txnNumber: 0" field, is this normal?
Comment by Yujin Kang Park [ 24/Nov/23 ]

Apparently, the above mentioned TransactionRouter::Router in implicitAbortTransaction is also invalid. Thanks to marcos.grillo@mongodb.com we were able to find the txnNumber (3541), in the stack frame for SyncTransactionWithRetries::runNoThrow. By searching for this transaction in the logs, we discovered that the session was killed by the "SessionKiller", triggered by the killAllSessionsByPattern command.

{"t":{"$date":"2023-10-20T18:42:56.837+01:00"},"s":"I",  "c":"WRITE",    "id":51803,   "ctx":"ShardingDDLCoordinator-5","msg":"Slow query","attr":{"type":"update","ns":"config.system.sharding_ddl_coordinators","command":{"q":{"_id":{"namespace":"config.system.sessions","operationType":"createCollection_V3"}},"u":{"_id":{"namespace":"config.system.sessions","operationType":"createCollection_V3"},"recoveredFromDisk":true,"forwardableOpMetadata":{"mayBypassWriteBlocking":false},"session":{"lsid":{"id":{"$uuid":"ddd3a368-c63b-467c-969d-761c4af3c120"},"uid":{"$binary":{"base64":"u4nTF1+wmByGgmwndZCCo3FgRx9gUEtGEkFRhsYwq3A=","subType":"0"}}},"txnNumber":3541},"shardKey":{"_id":1},"unique":false,"numInitialChunks":0,"presplitHashedZones":false,"capped":false,"phase":"commit","translatedRequestParams":{"nss":"config.system.sessions","keyPattern":{"_id":1},"collation":{}}},"multi":false,"upsert":false},"planSummary":"IDHACK","totalOplogSlotDurationMicros":36,"keysExamined":1,"docsExamined":1,"nMatched":1,"nModified":1,"nUpserted":0,"keysInserted":0,"keysDeleted":0,"numYields":0,"locks":{"FeatureCompatibilityVersion":{"acquireCount":{"w":1}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"w":1}},"Database":{"acquireCount":{"w":1}},"Collection":{"acquireCount":{"w":1}},"Mutex":{"acquireCount":{"r":4}}},"flowControl":{"acquireCount":9},"storage":{},"durationMillis":0}}
...
{"t":{"$date":"2023-10-20T18:42:56.858+01:00"},"s":"I",  "c":"COMMAND",  "id":20706,   "ctx":"SessionKiller","msg":"Killing op as part of killing session","attr":{"opId":181060,"lsid":{"id":{"$uuid":"ddd3a368-c63b-467c-969d-761c4af3c120"},"uid":{"$binary":{"base64":"u4nTF1+wmByGgmwndZCCo3FgRx9gUEtGEkFRhsYwq3A=","subType":"0"}},"txnNumber":3541,"txnUUID":{"$uuid":"e768c342-67bf-4e14-80b1-0b96d46c8091"}}}}
 
{"t":{"$date":"2023-10-20T18:42:56.858+01:00"},"s":"I",  "c":"COMMAND",  "id":20706,   "ctx":"SessionKiller","msg":"Killing op as part of killing session","attr":{"opId":181059,"lsid":{"id":{"$uuid":"ddd3a368-c63b-467c-969d-761c4af3c120"},"uid":{"$binary":{"base64":"u4nTF1+wmByGgmwndZCCo3FgRx9gUEtGEkFRhsYwq3A=","subType":"0"}}}}}
 
{"t":{"$date":"2023-10-20T18:42:56.859+01:00"},"s":"I",  "c":"COMMAND",  "id":558701,  "ctx":"conn75","msg":"Success: kill session","attr":{"session":{"remote":"[::1]:54350","local":"[::1]:27217"},"metadata":{"driver":{"name":"NetworkInterfaceTL-TaskExecutorPool-0","version":"7.2.0-alpha-870-g14c180a"},"os":{"type":"Windows","name":"Microsoft Windows Server 2019","architecture":"x86_64","version":"10.0 (build 17763)"}},"command":{"killAllSessionsByPattern":[{"users":[],"roles":[]}],"maxTimeMSOpOnly":60000,"$clusterTime":{"clusterTime":{"$timestamp":{"t":1697823776,"i":41}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$configTime":{"$timestamp":{"t":1697823776,"i":31}},"$topologyTime":{"$timestamp":{"t":1697823496,"i":2}},"mayBypassWriteBlocking":false,"$db":"admin"}}}
 
{"t":{"$date":"2023-10-20T18:42:56.859+01:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn75","msg":"Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"killAllSessionsByPattern":[{"users":[],"roles":[]}],"maxTimeMSOpOnly":60000,"$clusterTime":{"clusterTime":{"$timestamp":{"t":1697823776,"i":41}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$configTime":{"$timestamp":{"t":1697823776,"i":31}},"$topologyTime":{"$timestamp":{"t":1697823496,"i":2}},"mayBypassWriteBlocking":false,"$db":"admin"},"numYields":0,"reslen":207,"locks":{},"remote":"[::1]:54350","protocol":"op_msg","durationMillis":0}}

Which led to the following failure logs:

{"t":{"$date":"2023-10-20T18:42:56.861+01:00"},"s":"I",  "c":"TXN",      "id":5918600, "ctx":"ShardingDDLCoordinator-5","msg":"Chose internal transaction error handling step","attr":{"nextStep":"abort and do not retry","txnInfo":{"execContext":"client retryable write","sessionInfo":{"lsid":{"id":{"$uuid":"ddd3a368-c63b-467c-969d-761c4af3c120"},"uid":{"$binary":{"base64":"u4nTF1+wmByGgmwndZCCo3FgRx9gUEtGEkFRhsYwq3A=","subType":"0"}},"txnNumber":3541,"txnUUID":{"$uuid":"e768c342-67bf-4e14-80b1-0b96d46c8091"}},"txnNumber":0,"autocommit":false},"state":"started","lastOperationTime":"{ ts: Timestamp(1697823776, 41) }","latestResponseHasTransientTransactionErrorLabel":false,"deadline":"none","writeConcern":{"w":"majority","wtimeout":0},"readConcern":{},"APIParameters":{},"canceled":true},"attempts":1,"error":"Interrupted: Write results unavailable from localhost:1026 :: caused by :: operation was interrupted","errorHandler":"runBody"}}
 
{"t":{"$date":"2023-10-20T18:42:56.861+01:00"},"s":"D1", "c":"ASSERT",   "id":23074,   "ctx":"ShardingDDLCoordinator-5","msg":"User assertion","attr":{"error":"Interrupted: operation was interrupted","file":"C:\\data\\mci\\a1662da027386ec808a01d16b32c258c\\src\\src\\mongo/db/transaction/transaction_api.h","line":244}}
 
{"t":{"$date":"2023-10-20T18:42:56.861+01:00"},"s":"I",  "c":"SHARDING", "id":5277908, "ctx":"ShardingDDLCoordinator-5","msg":"Failed to obtain collection's placement version, so it will be recovered","attr":{"namespace":"config.system.sessions","error":"Interrupted: operation was interrupted"}}
 
{"t":{"$date":"2023-10-20T18:42:56.861+01:00"},"s":"D1", "c":"SHARDING", "id":4798530, "ctx":"ShardingDDLCoordinator-5","msg":"Clearing collection metadata","attr":{"namespace":"config.system.sessions","collIsDropped":false}}
{"t":{"$date":"2023-10-20T18:42:56.869+01:00"},"s":"D1", "c":"SHARDING", "id":22871,   "ctx":"TaskExecutorPool-0","msg":"Error processing the remote request, not updating operationTime or gLE","attr":{"error":{"code":90,"codeName":"CallbackCanceled","errmsg":"Command canceled; original request was: RemoteCommand 9967 -- target:[localhost:1026] db:config cmd:{ delete: \"chunks\", bypassDocumentValidation: false, ordered: true, stmtIds: [ 0 ], deletes: [ { q: { uuid: UUID(\"b8faef73-e8ef-4b5c-a5f9-84d313eeef8e\") }, limit: 1 } ], lsid: { id: UUID(\"ddd3a368-c63b-467c-969d-761c4af3c120\"), uid: BinData(0, BB89D3175FB0981C86826C27759082A37160471F60504B4612415186C630AB70), txnNumber: 3541, txnUUID: UUID(\"e768c342-67bf-4e14-80b1-0b96d46c8091\") }, txnNumber: 0, readConcern: { afterClusterTime: Timestamp(1697823776, 41) }, startTransaction: true, coordinator: true, autocommit: false }"}}}

It then goes on to re-execute the DDL operation, and subsequently crashes:

{"t":{"$date":"2023-10-20T18:42:56.883+01:00"},"s":"I",  "c":"SHARDING", "id":5656000, "ctx":"ShardingDDLCoordinator-2","msg":"Re-executing sharding DDL coordinator","attr":{"coordinatorId":{"namespace":"config.system.sessions","operationType":"createCollection_V3"},"reason":"Interrupted: operation was interrupted"}}

Generated at Thu Feb 08 06:49:00 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.