  1. Core Server
  2. SERVER-65418

Release all resources before sleep in WriteConflictException::logAndBackoff

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: 4.0.23, 5.0.7
    • Component/s: None
    • Storage Execution
    • ALL

      When a single document is modified concurrently, only one request can commit successfully; the other requests get WriteConflictExceptions and retry internally (https://github.com/mongodb/mongo/blob/v5.0/src/mongo/db/concurrency/write_conflict_exception.h#L70-L106).

       

       
      logAndBackoff (https://github.com/mongodb/mongo/blob/v5.0/src/mongo/db/concurrency/write_conflict_exception.cpp#L51-L59) is called before each retry attempt, and it sleeps once numAttempts is greater than 3 (https://github.com/mongodb/mongo/blob/v5.0/src/mongo/util/log_and_backoff.cpp#L39-L51).
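
The backoff step can be pictured roughly as follows. This is only a sketch: `backoffFor` and `backoff` are hypothetical names, and the thresholds and delays are assumptions for illustration. Only "no sleep for the first three attempts" and the existence of a 10 ms sleep come from this ticket; the linked log_and_backoff.cpp has the real values.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical helper: maps the retry attempt count to a sleep duration.
// The tiers below are illustrative assumptions, not copied from the server.
std::chrono::milliseconds backoffFor(std::size_t numAttempts) {
    using std::chrono::milliseconds;
    if (numAttempts < 4) return milliseconds(0);    // first retries: no sleep
    if (numAttempts < 10) return milliseconds(1);
    if (numAttempts < 100) return milliseconds(5);
    if (numAttempts < 200) return milliseconds(10);
    return milliseconds(100);
}

// Hypothetical helper: sleeps for the scheduled duration, if there is one.
// Whatever the calling thread holds (tickets, locks) stays held meanwhile.
void backoff(std::size_t numAttempts) {
    const auto delay = backoffFor(numAttempts);
    if (delay.count() > 0) {
        std::this_thread::sleep_for(delay);
    }
}
```

The point of the sketch is the last comment: the sleep itself is cheap, but it happens while the operation still owns everything it acquired before the conflict.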

       
      However, these requests do not release their resources while sleeping. That is the problem: newly arriving requests get stuck, because the global tickets and the global/database/collection locks are held by many sleeping retry requests.

      For example, if 128 write requests are sleeping for 10 ms (waiting to retry) at the same time, then no global write ticket is available during those 10 ms. All newly arriving write requests are stuck, while MongoDB does no useful work and just sleeps.

      I think it would be better to release all resources before sleeping and reacquire them after the sleep, provided the retried function is not inside a WUOW (WriteUnitOfWork). That way MongoDB can serve other requests in the meantime. My basic idea is this:

      // release all resources
      Locker::LockSnapshot ls;
      invariant(opCtx->lockState()->saveLockStateAndUnlock(&ls));
      
      // logAndBackoff sleep
      
      // get all resources back and retry
      opCtx->lockState()->restoreLockState(ls); 
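
To illustrate the effect outside the server, here is a minimal standalone model of the same idea. `TicketPool`, `ScopedTicketYield`, and `backoffWithoutTicket` are hypothetical names invented for this sketch; they only mimic a ticket pool and are not MongoDB classes.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the global write ticket pool.
class TicketPool {
public:
    explicit TicketPool(std::size_t capacity) : _available(capacity) {}

    // Blocks until a ticket is free, then takes it.
    void acquire() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _available > 0; });
        --_available;
    }

    // Takes a ticket only if one is free right now.
    bool tryAcquire() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_available == 0) return false;
        --_available;
        return true;
    }

    // Returns a ticket and wakes one waiter, if any.
    void release() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            ++_available;
        }
        _cv.notify_one();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::size_t _available;
};

// Hypothetical RAII helper: gives the ticket back on construction and takes
// one again on destruction, so nothing is held in between.
class ScopedTicketYield {
public:
    explicit ScopedTicketYield(TicketPool& pool) : _pool(pool) { _pool.release(); }
    ~ScopedTicketYield() { _pool.acquire(); }
    ScopedTicketYield(const ScopedTicketYield&) = delete;
    ScopedTicketYield& operator=(const ScopedTicketYield&) = delete;

private:
    TicketPool& _pool;
};

// Backs off for `delay` without holding a ticket. The caller must hold one
// ticket on entry and holds one again on return.
void backoffWithoutTicket(TicketPool& pool, std::chrono::milliseconds delay) {
    ScopedTicketYield yield(pool);
    std::this_thread::sleep_for(delay);
}
```

With a pool of one ticket, a writer that calls `backoffWithoutTicket` lets another writer acquire, finish, and release during the sleep. The real change would still have to respect the WUOW restriction mentioned above, which this model does not capture.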

       

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            pengzhenyi peng zhenyi
            Votes:
            0
            Watchers:
            10

              Created:
              Updated: