[SERVER-62458] Fix deadlock in FCBIS _cloneFiles Created: 10/Jan/22  Updated: 29/Oct/23  Resolved: 11/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.0, 5.2.0-rc6

Type: Bug
Priority: Critical - P2
Reporter: Moustafa Maher
Assignee: Moustafa Maher
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.2
Sprint: Replication 2022-01-24
Participants:
Linked BF Score: 46

 Description   

Issue:
In FileCopyBasedInitialSyncer::_cloneFiles, a deadlock can occur on FileCopyBasedInitialSyncer::_mutex between the main asyncTry body and the future it returns.

I assume this happens when startClonerFuture is already ready by the time we reach the return statement.

Proposed fix:

The lock in the main asyncTry should be scoped to cover only the lines that change state; it should not cover the return statement.
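
A minimal sketch of the proposed scoping, for illustration only. It assumes that the continuation of an already-ready future runs inline on the thread that attaches it, and that the mutex is non-recursive. ReadyFuture, SyncerSketch, and the event strings are hypothetical stand-ins, not the actual FCBIS code or the server's future API.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a future that is already ready: its continuation
// runs inline on the calling thread as soon as it is attached.
struct ReadyFuture {
    void then(const std::function<void()>& continuation) const {
        continuation();
    }
};

// Hypothetical, simplified syncer; not the real FileCopyBasedInitialSyncer.
class SyncerSketch {
public:
    void cloneFiles(const ReadyFuture& startClonerFuture) {
        {
            // Hold the mutex only while mutating shared state.
            std::lock_guard<std::mutex> lk(_mutex);
            _events.push_back("cloner started");
        }  // Mutex released here, before the continuation is attached.

        // Had the lock still been held at this point, the inline continuation
        // would try to lock a non-recursive mutex its own thread already owns.
        startClonerFuture.then([this] {
            std::lock_guard<std::mutex> lk(_mutex);
            _events.push_back("cloner finished");
        });
    }

    // Returns a copy of the recorded events, taken under the mutex.
    std::vector<std::string> events() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _events;
    }

private:
    std::mutex _mutex;
    std::vector<std::string> _events;
};
```

With the inner braces removed, so that the first lock_guard lives until the end of cloneFiles, the same call would attempt to re-lock _mutex on the same thread, which matches the hang described above.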

 

Needed for better debugging:

Can we also add a log line at the beginning of FileCopyBasedInitialSyncer::_cloneFiles that outputs the number of files in filesToClone?
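
A rough sketch of what such a message could contain. The server has its own structured logging macros, so this only illustrates the content of the line using the standard library; describeFilesToClone and the message wording are hypothetical, and filesToClone is assumed here to be a list of file names.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: builds the debug message from the files to clone.
// The real change would emit this through the server's logging facility.
std::string describeFilesToClone(const std::vector<std::string>& filesToClone) {
    std::ostringstream os;
    os << "Starting _cloneFiles; number of files to clone: " << filesToClone.size();
    return os.str();
}
```

Logging the count at function entry makes it possible to tell from the logs whether a hang happened with an empty or a non-empty file list.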



 Comments   
Comment by Githook User [ 11/Jan/22 ]

Author: Moustafa Maher Khalil <m.maher@mongodb.com> (moustafamaher)

Message: SERVER-62458 Fix deadlock in FCBIS _cloneFiles
Branch: v5.2
https://github.com/10gen/mongo-enterprise-modules/commit/0b9a52927cd90d589e76b008aa2c6b9a2df080c0

Comment by Githook User [ 11/Jan/22 ]

Author: Moustafa Maher Khalil <m.maher@mongodb.com> (moustafamaher)

Message: SERVER-62458 Fix deadlock in FCBIS _cloneFiles
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/c0512f09e311e3657e3e2d5554e60dc1b454b6e6
