[SERVER-42494] Deadlock between aggregation pipeline and IndexBuildsCoordinator in storage engines that do not support document level locking Created: 30/Jul/19  Updated: 06/Dec/22  Resolved: 05/Aug/19

Status: Closed
Project: Core Server
Component/s: Concurrency, Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Gregory Wlodarek
Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-37720 Allow active index builds during rena... Blocked
Assigned Teams:
Storage Execution
Operating System: ALL
Participants:
Linked BF Score: 19

 Description   

The IndexBuildsCoordinator can deadlock in the following way:
Thread 1:

  • Begin an index build on test.one.
  • Grab an intent lock on test and an exclusive lock on test.one.
  • Grab the IndexBuildsCoordinator mutex.
  • Initialize the MultiIndexBlock calling init().
  • Log the operation to the oplog. This grabs an intent lock on local and then waits to grab an intent lock on local.oplog.rs; see the logic in logOp() below.

if (!opCtx->getServiceContext()->getStorageEngine()->supportsDocLocking()) {
    dbWriteLock.emplace(opCtx, NamespaceString::kLocalDb, MODE_IX);
    collWriteLock.emplace(opCtx, oplogInfo->getOplogCollectionName(), MODE_IX);
}
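
For context, the wait in the last step follows from standard multi-granularity locking: an intent-exclusive (MODE_IX) request is only blocked by a holder in shared or exclusive mode. The sketch below illustrates that compatibility rule. The names Mode, kCompatible and isCompatible are hypothetical and are not the server's actual lock manager types.

```cpp
#include <array>
#include <cstddef>

// Hypothetical illustration types; not the server's real lock manager API.
enum class Mode : std::size_t { IS = 0, IX = 1, S = 2, X = 3 };

// Standard multi-granularity compatibility matrix.
// kCompatible[held][requested] is true when both modes can be granted together.
constexpr std::array<std::array<bool, 4>, 4> kCompatible = {{
    /* held IS */ {{true, true, true, false}},
    /* held IX */ {{true, true, false, false}},
    /* held S  */ {{true, false, true, false}},
    /* held X  */ {{false, false, false, false}},
}};

constexpr bool isCompatible(Mode held, Mode requested) {
    return kCompatible[static_cast<std::size_t>(held)]
                      [static_cast<std::size_t>(requested)];
}

// Two intent-exclusive holders can coexist on the same resource...
static_assert(isCompatible(Mode::IX, Mode::IX), "IX is compatible with IX");
// ...but an IX request must wait behind a holder in S or X mode.
static_assert(!isCompatible(Mode::S, Mode::IX), "S blocks IX");
static_assert(!isCompatible(Mode::X, Mode::IX), "X blocks IX");
```

Under this rule, a thread that is stuck waiting for MODE_IX on local.oplog.rs implies that some other operation holds that collection in a conflicting (shared or exclusive) mode.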

Thread 2:

  • During the aggregation pipeline, run a rename command to rename a temporary collection.
  • Rename with source test.tmp.two and target test.two.
  • Grab the intent lock on test and exclusive locks on both test.tmp.two and test.two.
  • Check if there are any ongoing index builds by calling IndexBuildsCoordinator::assertNoIndexBuildInProgForCollection().
  • Wait to grab the IndexBuildsCoordinator mutex.

Thread 1 is stuck waiting for an intent lock on local.oplog.rs, so thread 2 must already be holding an exclusive lock on local.oplog.rs before it runs the rename command. I could not find any acquisition of that lock during the rename call itself.
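
The circular wait described above can be sketched as a small wait-for graph, where an edge from A to B means that A is blocked on something B holds. This is only an illustrative model of the scenario, not server code: WaitForGraph, hasDeadlock and ticketScenario are hypothetical names, and the edge from thread 1 to thread 2 rests on the inference that thread 2 holds the conflicting lock on local.oplog.rs.

```cpp
#include <map>
#include <set>
#include <string>

// Wait-for graph: an edge A -> B means "A is blocked on a resource that B holds".
using WaitForGraph = std::map<std::string, std::set<std::string>>;

// Depth-first search; returns true if a node already on the current path is reached again.
bool hasCycleFrom(const WaitForGraph& g, const std::string& node,
                  std::set<std::string>& onPath, std::set<std::string>& done) {
    if (onPath.count(node)) return true;   // back edge: cycle found
    if (done.count(node)) return false;    // already fully explored
    onPath.insert(node);
    auto it = g.find(node);
    if (it != g.end()) {
        for (const auto& next : it->second) {
            if (hasCycleFrom(g, next, onPath, done)) return true;
        }
    }
    onPath.erase(node);
    done.insert(node);
    return false;
}

// A deadlock exists exactly when the wait-for graph contains a cycle.
bool hasDeadlock(const WaitForGraph& g) {
    std::set<std::string> onPath, done;
    for (const auto& entry : g) {
        if (hasCycleFrom(g, entry.first, onPath, done)) return true;
    }
    return false;
}

// The scenario from this ticket, under the stated assumption about thread 2.
WaitForGraph ticketScenario() {
    return {
        // Thread 1 holds the IndexBuildsCoordinator mutex and waits for
        // MODE_IX on local.oplog.rs (presumed to be held by thread 2).
        {"thread1", {"thread2"}},
        // Thread 2 waits for the IndexBuildsCoordinator mutex held by thread 1.
        {"thread2", {"thread1"}},
    };
}
```

Calling hasDeadlock(ticketScenario()) reports a cycle. Removing either edge breaks it, which is consistent with a fix that drops the index-build assertion from the rename path.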



 Comments   
Comment by Gregory Wlodarek [ 05/Aug/19 ]

dan.solnik's work in SERVER-37720 will fix this deadlock, because it removes the code path that asserts no index builds are in progress.
