[SERVER-63169] Fix deadlock in balancer_defragmentation_merge_chunks Created: 01/Feb/22  Updated: 29/Oct/23  Resolved: 02/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.0

Type: Bug
Priority: Major - P3
Reporter: Allison Easton
Assignee: Allison Easton
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:
Linked BF Score: 42

 Description   

To prevent problems with step down during defragmentation, user cancellation of defragmentation no longer removes the defragmentCollection flag from the collection. Instead, it checks whether there is an existing defragmentation state for the collection and then changes the phase to kSplitChunks. This check requires a lock acquisition. If the test has completed phase 1 when defragmentation is cancelled, the balancer thread is waiting at the failpoint while holding the mutex, so the cancellation can never acquire the lock and the two deadlock.

This can be solved by changing the failpoint so that it no longer pauses but instead makes phase transitions a no-op. The balancer thread then never waits while holding the mutex, which resolves the deadlock, and each balancer round will try to transition phases again until the failpoint is cleared.
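The proposed behaviour can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's implementation: the class `DefragmentationModel`, its methods, the flag `skip_phase_transition` (standing in for the failpoint) and the phase names `kPhaseOne` and `kNextPhase` are all hypothetical. Only `kSplitChunks` comes from the description above.

```python
import threading


class DefragmentationModel:
    """Toy model of the defragmentation state. All names are hypothetical."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.phase = "kPhaseOne"            # placeholder for the current phase
        self.skip_phase_transition = False  # stands in for the failpoint

    def balancer_round(self):
        """One balancer round: try to move to the next phase."""
        with self._mutex:
            if self.skip_phase_transition:
                # Failpoint active: the transition is a no-op. The mutex is
                # released on return, so nothing waits while holding it.
                return False
            self.phase = "kNextPhase"       # placeholder for the real next phase
            return True

    def cancel(self, timeout=1.0):
        """User cancellation: needs the same mutex to change the phase."""
        if not self._mutex.acquire(timeout=timeout):
            return False                    # lock never became available
        try:
            self.phase = "kSplitChunks"
            return True
        finally:
            self._mutex.release()


model = DefragmentationModel()
model.skip_phase_transition = True

# With the failpoint set, every round retries the transition and skips it.
retries = [model.balancer_round() for _ in range(3)]

# Cancellation can still take the mutex because no round is parked on it.
cancelled = model.cancel()

# Once the failpoint is cleared, the next round transitions normally.
model.skip_phase_transition = False
advanced = model.balancer_round()
```

In this model a pausing failpoint would correspond to blocking inside the `with self._mutex` block, and that is exactly the situation in which `cancel` could never acquire the lock.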



 Comments   
Comment by Githook User [ 02/Feb/22 ]

Author:

{'name': 'Allison Easton', 'email': 'allison.easton@mongodb.com', 'username': 'allisoneaston'}

Message: SERVER-63169 Fix deadlock in balancer_defragmentation_merge_chunks
Branch: master
https://github.com/mongodb/mongo/commit/b2ec1f2a8da36081e7792a4299a894444d740f17
