[SERVER-66751] Determine if lock acquisition can happen inside of the writeConflictRetry in upsert_stage.cpp Created: 25/May/22  Updated: 04/Aug/23

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Gregory Wlodarek Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-78655 Integrate TransactionResources with g... Closed
Related
related to SERVER-65418 Release all resources before sleep in... Open
Assigned Teams:
Query Execution
Sprint: Execution Team 2023-02-20, Execution Team 2023-02-06, Execution EMEA Team 2023-06-26, Execution EMEA Team 2023-07-10, Execution EMEA Team 2023-07-24, Execution EMEA Team 2023-08-07
Participants:

 Description   

Running

resmoke --suites=concurrency jstests/concurrency/fsm_workloads/CRUD_and_commands_with_createindexes.js

with this diff

diff --git a/src/mongo/db/concurrency/exception_util.h b/src/mongo/db/concurrency/exception_util.h
index 6cfe238ca7d..63243009bac 100644
--- a/src/mongo/db/concurrency/exception_util.h
+++ b/src/mongo/db/concurrency/exception_util.h
@@ -34,6 +34,7 @@
 #include "mongo/db/operation_context.h"
 #include "mongo/util/assert_util.h"
 #include "mongo/util/fail_point.h"
+#include "mongo/util/stacktrace.h"
 
 namespace mongo {
 
@@ -124,6 +125,10 @@ auto writeConflictRetry(OperationContext* opCtx, StringData opStr, StringData ns
     int attemptsTempUnavailable = 0;
     while (true) {
         try {
+            if (opCtx->lockState()->isLocked()) {
+                printStackTrace();
+            }
+
             return f();
         } catch (WriteConflictException const&) {
             CurOp::get(opCtx)->debug().additiveMetrics.incrementWriteConflicts(1);

shows that this writeConflictRetry runs with locks that were acquired at some earlier point. This is problematic because, if a WriteConflictException is thrown, the locks and the ticket stay held while logWriteConflictAndBackoff sleeps.
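To illustrate why the position of the lock acquisition matters, here is a minimal toy model, not MongoDB code. All names in it (ToyLockState, ToyLockGuard, ToyWriteConflict, toyRetry, lockOutside, lockInside) are hypothetical stand-ins, and the backoff is reduced to a counter instead of a real sleep. It only relies on the C++ rule that stack unwinding destroys locals of the try block before the catch handler runs.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a write conflict; not a MongoDB type.
struct ToyWriteConflict : std::runtime_error {
    ToyWriteConflict() : std::runtime_error("write conflict") {}
};

// Hypothetical stand-in for per-operation lock state.
struct ToyLockState {
    int held = 0;
    bool isLocked() const { return held > 0; }
};

// RAII guard: holds the toy lock for the lifetime of the object.
class ToyLockGuard {
public:
    explicit ToyLockGuard(ToyLockState& s) : _s(s) { ++_s.held; }
    ~ToyLockGuard() { --_s.held; }
    ToyLockGuard(const ToyLockGuard&) = delete;
    ToyLockGuard& operator=(const ToyLockGuard&) = delete;

private:
    ToyLockState& _s;
};

// Toy retry loop. Instead of sleeping, it counts how many backoffs
// happened while the lock was still held.
template <typename F>
int toyRetry(ToyLockState& state, int& lockedBackoffs, F&& f) {
    while (true) {
        try {
            return f();
        } catch (const ToyWriteConflict&) {
            // A real implementation would sleep here.
            if (state.isLocked()) {
                ++lockedBackoffs;
            }
        }
    }
}

// Lock acquired before the retry loop: it is held during every backoff.
int lockOutside(ToyLockState& state, int conflicts, int& lockedBackoffs) {
    ToyLockGuard guard(state);
    return toyRetry(state, lockedBackoffs, [&] {
        if (conflicts-- > 0) throw ToyWriteConflict();
        return 42;
    });
}

// Lock acquired inside the retried body: unwinding releases it before
// the catch handler (and therefore the backoff) runs.
int lockInside(ToyLockState& state, int conflicts, int& lockedBackoffs) {
    return toyRetry(state, lockedBackoffs, [&] {
        ToyLockGuard guard(state);
        if (conflicts-- > 0) throw ToyWriteConflict();
        return 42;
    });
}
```

With two simulated conflicts, lockOutside records two backoffs under lock while lockInside records none, which is the difference this ticket asks about.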

 



 Comments   
Comment by Josef Ahmad [ 04/Aug/23 ]

I'm assigning this ticket to the query backlog for consideration. During code review, we agreed that the upsert stages should ideally use the generic query-yielding infrastructure.

Comment by Josef Ahmad [ 24/Jul/23 ]

Unfortunately, moving the lock acquisition inside upsert's writeConflictRetry block is not trivial because query execution acquires the relevant locks beforehand. Note that this upsert's writeConflictRetry block is only used for the insert case (i.e., no existing document to update). Overall, the block seems a bit of an oddity compared to other query execution stages, which handle WriteConflictException (WCE) yielding using logic specific to query exec (example: handlePlanStageYield in the update stage).

I've considered two options to release resources during WCE backoff: adding query execution yielding support to the upsert stage instead of the writeConflictRetry block, or adding yield/restore support to the block. I'm exploring the latter as it's likely less involved.

One behavioural change is that upsert and findAndModify will fail if the collection is dropped or renamed during the write conflict backoff.
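The second option (yield/restore support in the retry block) and the behavioural change it brings could look roughly like the toy sketch below. This is not the server's implementation. ToyResources, ToyCatalog, ToyConflict, ToyCollectionGone and retryWithYield are hypothetical names, and the backoff is passed in as a callback so the sketch does not need to sleep.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical exception types; not MongoDB types.
struct ToyConflict : std::runtime_error {
    ToyConflict() : std::runtime_error("write conflict") {}
};
struct ToyCollectionGone : std::runtime_error {
    ToyCollectionGone() : std::runtime_error("collection dropped or renamed") {}
};

// Hypothetical catalog: only tracks whether the target collection exists.
struct ToyCatalog {
    bool collectionExists = true;
};

// Hypothetical bundle of locks and ticket held by the operation.
struct ToyResources {
    bool locked = false;
    void acquire() { locked = true; }
    // Release everything before backing off.
    void yield() { locked = false; }
    // Reacquire after backoff; fails if the collection went away meanwhile.
    void restore(const ToyCatalog& catalog) {
        if (!catalog.collectionExists) throw ToyCollectionGone();
        locked = true;
    }
};

// Retry loop that yields resources around the backoff.
template <typename Body, typename Backoff>
int retryWithYield(ToyResources& res, const ToyCatalog& catalog,
                   Body&& body, Backoff&& backoff) {
    while (true) {
        try {
            return body();
        } catch (const ToyConflict&) {
            res.yield();           // nothing is held while backing off
            backoff();             // a real implementation would sleep here
            res.restore(catalog);  // may throw ToyCollectionGone
        }
    }
}
```

In this sketch, a drop or rename that happens during the backoff surfaces as an error from restore instead of a silent retry, which mirrors the behavioural change noted above for upsert and findAndModify.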
