[SERVER-21228] Do not hold the replication executor mutex/lock while running isself Created: 30/Oct/15  Updated: 06/Dec/22  Resolved: 20/Mar/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.0.0
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Scott Hernandez (Inactive) Assignee: Backlog - Replication Team
Resolution: Done Votes: 0
Labels: replexecutor
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Replication
Operating System: ALL
Participants:

 Description   

Currently, when loading or validating a new config, the isself check runs while the mutex/lock is held, which can lead to a deadlock or to client/operation delays.

We should instead schedule any long-running I/O, such as the network request isself might make, outside of any lock.

We may also want to audit the code to make sure no other instances of this pattern exist, i.e. places where we hold the mutex while issuing network or disk work.
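A minimal sketch of the proposed pattern, written in Python rather than the server's C++. Everything here is hypothetical and only illustrates the locking discipline: `ConfigState`, `install_config`, `snapshot`, and `slow_is_self` (a stand-in for isself, which may make a network request) are not server APIs. The idea is to do the cheap validation under the mutex, release it for the slow check, then reacquire it and re-validate before applying the result.

```python
import threading
import time


class ConfigState:
    """Hypothetical holder for the installed replica set config (illustration only)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._version = 0
        self._self_index = None  # index of this node in the member list, if any

    def install_config(self, version, members, is_self):
        """Install a config newer than the current one; is_self(host) may block."""
        # 1. Cheap validation under the mutex: reject configs that are not newer.
        with self._mutex:
            if version <= self._version:
                return False

        # 2. The potentially slow isself-style checks run WITHOUT the mutex,
        #    so other threads can read or update state in the meantime.
        self_index = next(
            (i for i, host in enumerate(members) if is_self(host)), None
        )

        # 3. Reacquire the mutex and re-validate, because a newer config may
        #    have been installed while the mutex was released.
        with self._mutex:
            if version <= self._version:
                return False
            self._version = version
            self._self_index = self_index
            return True

    def snapshot(self):
        with self._mutex:
            return self._version, self._self_index


def slow_is_self(host):
    """Hypothetical stand-in for isself: pretend each probe is a network round trip."""
    time.sleep(0.01)
    return host == "node2:27017"


state = ConfigState()
state.install_config(1, ["node1:27017", "node2:27017"], slow_is_self)
print(state.snapshot())  # (1, 1)
```

The re-check in step 3 is the cost of dropping the lock: while the slow check runs, another thread may install a newer config, and the stale result must then be discarded rather than applied.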



 Comments   
Comment by Benety Goh [ 02/Nov/15 ]

acm, would it make sense to extend the network interface API to provide an asynchronous isself function?

The issue here is that we are calling the potentially long-running isself function in a ReplicationExecutor callback, which could prevent other tasks scheduled by the ReplicationCoordinator from making progress.

We would need to reorder some of the callbacks for the replication configuration checks, but that should not be a big deal if we can process the result of the isself check in a ReplicationCoordinatorImpl::callback.
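The asynchronous isself idea could look roughly like the following Python sketch. It is not the server's C++ NetworkInterface API; `is_self_async`, `repl_executor`, `network_pool`, `slow_probe`, and `on_result` are all hypothetical names. A single-threaded executor stands in for the ReplicationExecutor, the blocking probe runs on a separate pool, and its result is delivered back to the executor as a new callback, so other queued tasks are not stuck behind the network request.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the replication executor: a single thread runs every callback in
# order, so one blocking callback would stall everything queued behind it.
repl_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="repl-executor")
# Hypothetical separate pool where blocking network work (e.g. an isself probe) runs.
network_pool = ThreadPoolExecutor(max_workers=4, thread_name_prefix="network")


def is_self_async(host, probe, on_result):
    """Hypothetical async isself: run probe(host) off the executor thread, then
    schedule on_result(host, result) back onto the executor when it completes."""

    def deliver(future):
        try:
            result = future.result()
        except Exception:
            result = False  # this sketch treats a failed probe as "not self"
        repl_executor.submit(on_result, host, result)

    network_pool.submit(probe, host).add_done_callback(deliver)


def slow_probe(host):
    time.sleep(0.05)  # pretend this is a network round trip
    return host == "node2:27017"


members = ["node1:27017", "node2:27017"]
results = {}
callback_threads = set()
all_done = threading.Event()


def on_result(host, is_self):
    # Runs on the executor thread, where config validation would continue.
    results[host] = is_self
    callback_threads.add(threading.current_thread().name)
    if len(results) == len(members):
        all_done.set()


for host in members:
    is_self_async(host, slow_probe, on_result)

# While the probes are in flight, the executor is free to run unrelated work.
heartbeat = repl_executor.submit(lambda: "heartbeat processed")
print(heartbeat.result(timeout=1))
all_done.wait(timeout=5)
print(sorted(results.items()))
```

In this design all state changes still happen on the executor thread, as with the existing callbacks, while the blocking I/O happens elsewhere. The trade-off is that each configuration check is split into a "start probe" step and a "process result" step, which is the callback reordering described above.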

Comment by Eric Milkie [ 02/Nov/15 ]

I'd like to at least research the details of this before 3.2 release.

Generated at Thu Feb 08 03:56:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.