[SERVER-66750] Determine if lock acquisition can happen inside of the writeConflictRetry in write_ops_exec.cpp Created: 25/May/22  Updated: 27/Oct/23  Resolved: 03/Feb/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Gregory Wlodarek Assignee: Fausto Leyva (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-65418 Release all resources before sleep in... Open
Assigned Teams:
Storage Execution
Sprint: Execution Team 2023-02-20, Execution Team 2023-02-06
Participants:

 Description   

Running

resmoke --suites=concurrency jstests/concurrency/fsm_workloads/CRUD_and_commands_with_createindexes.js

with this diff

diff --git a/src/mongo/db/concurrency/exception_util.h b/src/mongo/db/concurrency/exception_util.h
index 6cfe238ca7d..63243009bac 100644
--- a/src/mongo/db/concurrency/exception_util.h
+++ b/src/mongo/db/concurrency/exception_util.h
@@ -34,6 +34,7 @@
 #include "mongo/db/operation_context.h"
 #include "mongo/util/assert_util.h"
 #include "mongo/util/fail_point.h"
+#include "mongo/util/stacktrace.h"
 
 namespace mongo {
 
@@ -124,6 +125,10 @@ auto writeConflictRetry(OperationContext* opCtx, StringData opStr, StringData ns
     int attemptsTempUnavailable = 0;
     while (true) {
         try {
+            if (opCtx->lockState()->isLocked()) {
+                printStackTrace();
+            }
+
             return f();
         } catch (WriteConflictException const&) {
             CurOp::get(opCtx)->debug().additiveMetrics.incrementWriteConflicts(1);

Shows that this writeConflictRetry is run with locks acquired at some earlier point. This is problematic because if a WriteConflictException is thrown, the locks and ticket will be held in logWriteConflictAndBackoff, which sleeps.

 



 Comments   
Comment by Fausto Leyva (Inactive) [ 03/Feb/23 ]

Since the issue with holding onto resources before the call to logAndBackoff is handled in write_ops_exec logic, closing this ticket as "works as designed".

Still investigating whether or not to open another ticket to look into dumpInsertsFromBulk.

Comment by Fausto Leyva (Inactive) [ 02/Feb/23 ]

After taking a look, it looks like this is acceptable since we will release the collection lock [here]prior to the call to logAndBackoff. So while there is a chance we are holding the collection level IX lock when entering the WCR loop, if we get an exception, we will release the resources before sleeping. 

It's worth noting I did see the printStackTrace() call from IndexBuildsCoordinatorMongod within dumpInsertsFromBulk (specifically, the WCR call from BulkBuilder::commit()). Not sure if it's a concern though, as it might make sense in this scenario?  gregory.wlodarek@mongodb.com did you notice this in your original investigation? 
stacktrace:

_ZN5mongo15printStackTraceEv","C":"mongo::printStackTrace()","s+":"3A"}}}
_ZN5mongo18writeConflictRetryIZNS_17BulkBuilderCommonINS_27SortedDataIndexAccessMethod15BulkBuilderImplEE6commitEPNS_16OperationContextERKNS_13CollectionPtrEbiRKSt8functionIFNS_6StatusERKNS_9KeyString5ValueEEERK
_ZN5mongo17BulkBuilderCommonINS_27SortedDataIndexAccessMethod15BulkBuilderImplEE6commitEPNS_16OperationContextERKNS_13CollectionPtrEbiRKSt8functionIFNS_6StatusERKNS_9KeyString5ValueEEERKS9_IFSA_RKNS_8RecordIdEEE
_ZN5mongo15MultiIndexBlock19dumpInsertsFromBulkEPNS_16OperationContextERKNS_13CollectionPtrERKSt8functionIFNS_6StatusERKNS_8RecordIdEEE","C":"mongo::MultiIndexBlock::dumpInsertsFromBulk(mongo::OperationContext*,
_ZN5mongo15MultiIndexBlock19dumpInsertsFromBulkEPNS_16OperationContextERKNS_13CollectionPtrE","C":"mongo::MultiIndexBlock::dumpInsertsFromBulk(mongo::OperationContext*, mongo::CollectionPtr const&)","s+":"78"}}}
_ZN5mongo15MultiIndexBlock30insertAllDocumentsInCollectionEPNS_16OperationContextERKNS_13CollectionPtrERKN5boost8optionalINS_8RecordIdEEE","C":"mongo::MultiIndexBlock::insertAllDocumentsInCollection(mongo::Opera
ZN5mongo18IndexBuildsManager18startBuildingIndexEPNS_16OperationContextERKNS_13CollectionPtrERKNS_4UUIDERKN5boost8optionalINS_8RecordIdEEE","C":"mongo::IndexBuildsManager::startBuildingIndex(mongo::OperationCont
_ZN5mongo22IndexBuildsCoordinator43_scanCollectionAndInsertSortedKeysIntoIndexEPNS_16OperationContextESt10shared_ptrINS_19ReplIndexBuildStateEERKN5boost8optionalINS_8RecordIdEEE","C":"mongo::IndexBuildsCoordinat
_ZN5mongo22IndexBuildsCoordinator11_buildIndexEPNS_16OperationContextESt10shared_ptrINS_19ReplIndexBuildStateEERKNS0_17IndexBuildOptionsE","C":"mongo::IndexBuildsCoordinator::_buildIndex(mongo::OperationContext*
_ZN5mongo22IndexBuildsCoordinator19_runIndexBuildInnerEPNS_16OperationContextESt10shared_ptrINS_19ReplIndexBuildStateEERKNS0_17IndexBuildOptionsERKN5boost8optionalINS_15ResumeIndexInfoEEE","C":"mongo::IndexBuild
_ZZN5mongo22IndexBuildsCoordinator14_runIndexBuildEPNS_16OperationContextERKNS_4UUIDERKNS0_17IndexBuildOptionsERKN5boost8optionalINS_15ResumeIndexInfoEEEENK5$_108clEv","C":"mongo::IndexBuildsCoordinator::_runInd
_ZN5mongo22IndexBuildsCoordinator14_runIndexBuildEPNS_16OperationContextERKNS_4UUIDERKNS0_17IndexBuildOptionsERKN5boost8optionalINS_15ResumeIndexInfoEEE","C":"mongo::IndexBuildsCoordinator::_runIndexBuild(mongo:
ZZN5mongo28IndexBuildsCoordinatorMongod16_startIndexBuildEPNS_16OperationContextERKNS_12DatabaseNameERKNS_4UUIDERKSt6vectorINS_7BSONObjESaISA_EES8_NS_18IndexBuildProtocolENS_22IndexBuildsCoordinator17IndexBuildO
ZZN5mongo15unique_functionIFvNS_6StatusEEE8makeImplIZNS_28IndexBuildsCoordinatorMongod16_startIndexBuildEPNS_16OperationContextERKNS_12DatabaseNameERKNS_4UUIDERKSt6vectorINS_7BSONObjESaISF_EESD_NS_18IndexBuildPr
_ZNK5mongo15unique_functionIFvNS_6StatusEEEclES1_","C":"mongo::unique_function<void (mongo::Status)>::operator()(mongo::Status) const","s+":"6F"}}}
ZN5mongo10ThreadPool4Impl10_doOneTaskEPSt11unique_lockINS_12latch_detail5LatchEE","C":"mongo::ThreadPool::Impl::_doOneTask(std::unique_lock<mongo::latch_detail::Latch>*)","s+":"2B9"}}}
ZN5mongo10ThreadPool4Impl13_consumeTasksEv","C":"mongo::ThreadPool::Impl::_consumeTasks()","s+":"7C"}}}
ZN5mongo10ThreadPool4Impl17_workerThreadBodyERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE","C":"mongo::ThreadPool::Impl::_workerThreadBody(std::__cxx11::basic_string<char, std::char_traits<char>, std::
ZZN5mongo10ThreadPool4Impl25_startWorkerThread_inlockEvENK4$_21clEv","C":"mongo::ThreadPool::Impl::_startWorkerThread_inlock()::$_21::operator()() const","s+":"1C"}}}
ZSt13__invoke_implIvZN5mongo10ThreadPool4Impl25_startWorkerThread_inlockEvE4$_21JEET_St14__invoke_otherOT0_DpOT1_","C":"void std::__invoke_impl<void, mongo::ThreadPool::Impl::_startWorkerThread_inlock()::$_21>(s
ZSt8__invokeIZN5mongo10ThreadPool4Impl25_startWorkerThread_inlockEvE4$_21JEENSt15__invoke_resultIT_JDpT0_EE4typeEOS5_DpOS6_","C":"std::__invoke_result<mongo::ThreadPool::Impl::_startWorkerThread_inlock()::$_21>:
ZSt12__apply_implIZN5mongo10ThreadPool4Impl25_startWorkerThread_inlockEvE4$_21St5tupleIJEEJEEDcOT_OT0_St16integer_sequenceImJXspT1_EEE","C":"decltype(auto) std::__apply_impl<mongo::ThreadPool::Impl::_startWorker
ZSt5applyIZN5mongo10ThreadPool4Impl25_startWorkerThread_inlockEvE4$_21St5tupleIJEEEDcOT_OT0_","C":"decltype(auto) std::apply<mongo::ThreadPool::Impl::_startWorkerThread_inlock()::$_21, std::tuple<> >(mongo::Thre
ZZN5mongo4stdx6threadC1IZNS_10ThreadPool4Impl25_startWorkerThread_inlockEvE4$_21JELi0EEET_DpOT0_ENUlvE_clEv","C":"mongo::stdx::thread::thread<mongo::ThreadPool::Impl::_startWorkerThread_inlock()::$_21, , 0>(mong
ZSt13__invoke_implIvZN5mongo4stdx6threadC1IZNS0_10ThreadPool4Impl25_startWorkerThread_inlockEvE4$_21JELi0EEET_DpOT0_EUlvE_JEES7_St14__invoke_otherOT0_DpOT1_","C":"void std::__invoke_impl<void, mongo::stdx::threa
ZSt8__invokeIZN5mongo4stdx6threadC1IZNS0_10ThreadPool4Impl25_startWorkerThread_inlockEvE4$_21JELi0EEET_DpOT0_EUlvE_JEENSt15__invoke_resultIS7_JDpS8_EE4typeEOS7_SA_","C":"std::__invoke_result<mongo::stdx::thread:
ZNSt6thread8_InvokerISt5tupleIJZN5mongo4stdx6threadC1IZNS2_10ThreadPool4Impl25_startWorkerThread_inlockEvE4$_21JELi0EEET_DpOT0_EUlvE_EEE9_M_invokeIJLm0EEEEvSt12_Index_tupleIJXspT_EEE","C":"void std::thread::_Inv
ZNSt6thread8_InvokerISt5tupleIJZN5mongo4stdx6threadC1IZNS2_10ThreadPool4Impl25_startWorkerThread_inlockEvE4$_21JELi0EEET_DpOT0_EUlvE_EEEclEv","C":"std::thread::_Invoker<std::tuple<mongo::stdx::thread::thread<mon
ZNSt6thread11_State_implINS_8_InvokerISt5tupleIJZN5mongo4stdx6threadC1IZNS3_10ThreadPool4Impl25_startWorkerThread_inlockEvE4$_21JELi0EEET_DpOT0_EUlvE_EEEEE6_M_runEv","C":"std::thread::_State_impl<std::thread::_I 

Generated at Thu Feb 08 06:06:19 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.