[SERVER-21228] Do not hold the replication executor mutex/lock while running isself Created: 30/Oct/15 Updated: 06/Dec/22 Resolved: 20/Mar/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Scott Hernandez (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | replexecutor | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Replication |
| Operating System: | ALL |
| Participants: |
| Description |
|
Currently, when loading or validating a new config, the isself check runs while holding the mutex/lock, which can lead to a deadlock or client/operation delays. We should instead schedule any long-running IO, like the network request isself might make, outside of any lock. We may also want to audit the code to make sure no other instances of this pattern exist, where we hold the mutex and issue network/disk work. |
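A minimal sketch of the "do the slow check outside the lock" pattern described above. It is not the server's actual code: `ReplState`, `findSelfOutsideLock`, and the `isSelf` callback are hypothetical stand-ins. The idea is to snapshot the config under the mutex, run the potentially slow check with the mutex released, then re-acquire it and apply the result only if the config has not changed in the meantime.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for replication state guarded by a mutex.
struct ReplState {
    std::mutex mutex;
    std::vector<std::string> members;  // host:port entries of the pending config
    long long configVersion = 0;       // bumped whenever the config changes
    int selfIndex = -1;                // -1 means "self not found"
};

// Locates this node in the config without holding the mutex while isSelf runs.
// `isSelf` stands in for the potentially slow, network-bound check.
// Returns false if the config changed during the check; the caller should retry.
bool findSelfOutsideLock(ReplState& state,
                         const std::function<bool(const std::string&)>& isSelf) {
    std::vector<std::string> snapshot;
    long long versionSeen = 0;
    {
        // Step 1: copy what we need while holding the lock, then release it.
        std::lock_guard<std::mutex> lk(state.mutex);
        snapshot = state.members;
        versionSeen = state.configVersion;
    }

    // Step 2: run the slow check with no lock held.
    int found = -1;
    for (std::size_t i = 0; i < snapshot.size(); ++i) {
        if (isSelf(snapshot[i])) {
            found = static_cast<int>(i);
            break;
        }
    }

    // Step 3: re-acquire the lock and publish only if the snapshot is still current.
    std::lock_guard<std::mutex> lk(state.mutex);
    if (state.configVersion != versionSeen) {
        return false;
    }
    state.selfIndex = found;
    return true;
}
```

The version check in step 3 is one possible way to handle a config that changes while the lock is released; a real implementation might instead cancel or reschedule the check.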
| Comments |
| Comment by Benety Goh [ 02/Nov/15 ] |
|
acm, would it make sense to extend the network interface API to provide an asynchronous isself function? The issue here is that we are calling the potentially long-running isself function in a ReplicationExecutor callback, which could block other tasks scheduled by the ReplicationCoordinator from progressing. We would need to reorder some of the callbacks for the replication configuration checks, but that should not be a big deal if we can process the result of the isself check in a ReplicationCoordinatorImpl::callback. |
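The asynchronous variant proposed in this comment could look roughly like the sketch below. Everything here is an assumption for illustration: `TinyExecutor` is a toy single-queue executor standing in for the ReplicationExecutor, and `asyncIsSelf` is a hypothetical function, not an existing network interface API. The blocking check runs on a worker thread, and only its result is scheduled back onto the executor as a callback, so other executor tasks are not stalled behind the check.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical minimal executor: tasks run one at a time on whichever
// thread calls runOne().
class TinyExecutor {
public:
    void schedule(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _tasks.push_back(std::move(task));
        }
        _cv.notify_one();
    }

    // Blocks until a task is available, then runs it outside the queue lock.
    void runOne() {
        std::function<void()> task;
        {
            std::unique_lock<std::mutex> lk(_mutex);
            _cv.wait(lk, [this] { return !_tasks.empty(); });
            task = std::move(_tasks.front());
            _tasks.pop_front();
        }
        task();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<std::function<void()>> _tasks;
};

// Hypothetical asynchronous isSelf: performs the blocking check on a worker
// thread and schedules `onResult` on the executor once the answer is known.
// The caller must join the returned thread and keep `executor` alive until then.
std::thread asyncIsSelf(TinyExecutor& executor,
                        std::string host,
                        std::function<bool(const std::string&)> blockingIsSelf,
                        std::function<void(bool)> onResult) {
    return std::thread([&executor,
                        host = std::move(host),
                        check = std::move(blockingIsSelf),
                        done = std::move(onResult)] {
        const bool result = check(host);  // slow part, off the executor thread
        executor.schedule([done, result] { done(result); });
    });
}
```

Because the result arrives as an ordinary executor task, the config checks that depend on it would need to be reordered to run from that callback, as the comment notes.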
| Comment by Eric Milkie [ 02/Nov/15 ] |
|
I'd like to at least research the details of this before the 3.2 release.