[SERVER-76777] Deadlock between index build external abort and self abort Created: 03/May/23  Updated: 29/Oct/23  Resolved: 12/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0, 7.0.0-rc2

Type: Bug Priority: Major - P3
Reporter: Yujin Kang Park Assignee: Yujin Kang Park
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Gantt Dependency
has to be done before SERVER-76935 Disallow index build external abort w... Closed
Related
is related to SERVER-75308 Invariant failure involving a collect... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.0
Sprint: Execution Team 2023-05-15
Participants:
Linked BF Score: 132

 Description   

On detecting an indexing error, the index builder aborts immediately and proceeds to cleanup. During cleanup, it checks whether the index build is already in the kAborted state. If an external aborter (for example, a collection drop) concurrently sets the state to kAborted before the builder performs this check, the builder exits early. If instead the external abort happens after the check, the external abort and the self (internal) abort can deadlock. The external aborter holds the collection lock in MODE_X while waiting for the builder to signal completion, and the builder waits for that same lock before it can signal.

External abort happening before the check:

Collection drop (external abort) | Index Builder (self abort)
---------------------------------+-------------------------------------
                                 | Indexing error (still kInProgress)
                                 | Transitioning to cleanup code
abortIndexBuildByBuildUUID       |
Coll Lock MODE_X                 |
setState(kAborted)               |
killOp                           |
_completeAbort                   |
wait for future                  |
                                 | replState->isAborted() (true)
                                 | signal promise
future wait ends                 |

External abort after the check (deadlock):

Collection drop (external abort) | Index Builder (self abort)
---------------------------------+-------------------------------------
                                 | Indexing error (still kInProgress)
                                 | replState->isAborted() (false)
abortIndexBuildByBuildUUID       |
Coll Lock MODE_X                 |
setState(kAborted)               |
killOp                           |
_completeAbort                   |
wait for future (stuck here)     |
                                 | proceed with self abort
                                 | Coll Lock MODE_X (waiting for lock)
                                 | ...
                                 | signal promise (never gets here)
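To make the lock cycle concrete, the following is a minimal C++ model of the "external abort after the check" interleaving. It is a sketch under stated assumptions, not MongoDB server code: a std::timed_mutex stands in for the collection MODE_X lock, a std::promise<void>/std::future<void> pair stands in for the completion signal that _completeAbort waits on, and IndexBuildModel, BuildState and selfAbortGetsLock are hypothetical names. The builder uses try_lock_for only so the demo terminates; a blocking acquisition at that point would wait forever, which is the deadlock.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical model, not MongoDB code: std::timed_mutex stands in for the
// collection MODE_X lock, a promise/future pair for the completion signal
// that the external aborter's _completeAbort waits on.
enum class BuildState { kInProgress, kAborted };

struct IndexBuildModel {
    std::timed_mutex collLock;                              // collection lock
    std::atomic<BuildState> state{BuildState::kInProgress};
    std::promise<void> done;  // set by the builder when it finishes
};

// Replays the deadlock interleaving. Returns true if the builder's self abort
// obtains the collection lock within `patience`; a plain blocking lock() in
// its place would never return.
bool selfAbortGetsLock(std::chrono::milliseconds patience) {
    IndexBuildModel m;
    std::future<void> doneFut = m.done.get_future();
    std::promise<void> lockHeld;
    std::future<void> lockHeldFut = lockHeld.get_future();

    // Builder: indexing error, then replState->isAborted() returns false.
    assert(m.state.load() == BuildState::kInProgress);

    std::thread external([&] {
        std::lock_guard<std::timed_mutex> g(m.collLock);  // Coll Lock MODE_X
        m.state.store(BuildState::kAborted);              // setState(kAborted)
        lockHeld.set_value();
        doneFut.wait();  // _completeAbort: wait for future while holding lock
    });
    lockHeldFut.wait();  // ensure the external aborter holds the lock

    // Builder: proceed with self abort, which needs the same lock.
    const bool acquired = m.collLock.try_lock_for(patience);
    if (acquired) m.collLock.unlock();

    m.done.set_value();  // break the cycle so the demo can exit
    external.join();
    return acquired;
}
```

Calling selfAbortGetsLock(std::chrono::milliseconds(50)) returns false for any timeout, because the external aborter releases the lock only after the promise is signalled, and the model signals it only after the builder's lock attempt has given up.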


 Comments   
Comment by Githook User [ 17/May/23 ]

Author:

{'name': 'Yu Jin Kang Park', 'email': 'yujin.kang@mongodb.com', 'username': 'ykangpark'}

Message: SERVER-76777 Retry coll lock acquisition on self abort
Branch: v7.0
https://github.com/mongodb/mongo/commit/ebddcadadbcd8e5a1f867772963475d48766dfeb

Comment by Githook User [ 11/May/23 ]

Author:

{'name': 'Yu Jin Kang Park', 'email': 'yujin.kang@mongodb.com', 'username': 'ykangpark'}

Message: SERVER-76777 Retry lock coll lock acquisition on self abort
Branch: master
https://github.com/mongodb/mongo/commit/837ff506b122b33005d9fe451a0918b497cefeb1
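The commit messages above summarize the fix only as retrying the collection lock acquisition on self abort. The C++ sketch below shows one plausible reading of that idea; it is an assumption, not the actual patch. A std::timed_mutex stands in for the collection MODE_X lock and a std::promise<void> for the completion signal the external aborter waits on, and selfAbort, SelfAbortOutcome, replayExternalAbortAfterCheck and the retry interval are hypothetical. Instead of blocking on the lock, the self-abort path polls it with a timeout and, between attempts, re-checks whether an external aborter has set kAborted. If one has, it signals the promise and returns, which lets the external aborter's wait end.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical model, not MongoDB code.
enum class BuildState { kInProgress, kAborted };
enum class SelfAbortOutcome { kCleanedUp, kYieldedToExternalAbort };

struct IndexBuildModel {
    std::timed_mutex collLock;                              // collection lock
    std::atomic<BuildState> state{BuildState::kInProgress};
    std::promise<void> done;  // completion signal set by the builder
};

// Self abort that never waits indefinitely on the collection lock: it retries
// with a timeout and re-checks for an external abort between attempts.
SelfAbortOutcome selfAbort(IndexBuildModel& m,
                           std::chrono::milliseconds retryInterval) {
    while (!m.collLock.try_lock_for(retryInterval)) {
        if (m.state.load() == BuildState::kAborted) {
            // The lock holder is aborting this build and waiting on us, so
            // signal completion instead of waiting for its lock.
            m.done.set_value();
            return SelfAbortOutcome::kYieldedToExternalAbort;
        }
    }
    std::lock_guard<std::timed_mutex> g(m.collLock, std::adopt_lock);
    m.state.store(BuildState::kAborted);  // self-abort cleanup under the lock
    m.done.set_value();
    return SelfAbortOutcome::kCleanedUp;
}

// Replays "external abort after the check" with the retrying self abort.
SelfAbortOutcome replayExternalAbortAfterCheck() {
    IndexBuildModel m;
    std::future<void> doneFut = m.done.get_future();
    std::promise<void> lockHeld;
    std::future<void> lockHeldFut = lockHeld.get_future();

    // Builder: replState->isAborted() check returns false.
    assert(m.state.load() == BuildState::kInProgress);

    std::thread external([&] {
        std::lock_guard<std::timed_mutex> g(m.collLock);  // Coll Lock MODE_X
        m.state.store(BuildState::kAborted);              // setState(kAborted)
        lockHeld.set_value();
        doneFut.wait();  // _completeAbort: wait for future
    });
    lockHeldFut.wait();  // external aborter now holds the lock

    SelfAbortOutcome outcome = selfAbort(m, std::chrono::milliseconds(10));
    external.join();  // returns because the builder signalled the promise
    return outcome;
}
```

With this version, replayExternalAbortAfterCheck() returns kYieldedToExternalAbort and the external thread's wait ends. With no concurrent aborter, selfAbort acquires the lock on the first attempt and returns kCleanedUp. Polling with a timeout costs a little latency but avoids waiting indefinitely on a lock whose holder may itself be waiting on the builder.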

Generated at Thu Feb 08 06:33:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.