[SERVER-36779] Election handoff schedules a network request under ReplCoord mutex Created: 21/Aug/18 Updated: 27/Oct/23 Resolved: 21/Aug/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kevin Pulo | Assignee: | Siyuan Zhou |
| Resolution: | Works as Designed | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL | ||||||||
| Description |
|
It looks like ReplicationCoordinatorImpl::_performElectionHandoff() waits for a remote command while holding the repl coord mutex. Or have I missed something? In addition, _performElectionHandoff() is only called from inside ReplicationCoordinatorImpl::stepDown(), and my reading of that function is that the global lock is also held when _performElectionHandoff() is called (it is released earlier in stepDown(), but is then reacquired). This code was added in |
| Comments |
| Comment by Kevin Pulo [ 22/Aug/18 ] |
|
Yep, sorry about that. Somehow I misread the getStatus() as "wait and return the remote's response". I did not observe any direct failure around this.
| Comment by Siyuan Zhou [ 21/Aug/18 ] |
|
kevin.pulo, did you observe any failure around this code? _performElectionHandoff() schedules the command to be sent in a fire-and-forget manner and doesn't wait for it to finish, so it's fast and safe to run while holding the mutex or global lock. I'm changing the title and closing this ticket as "Works as designed". |
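
For readers who trip over the same getStatus() call, the distinction can be shown with a small standalone sketch. This is not the server code: ToyExecutor, ScheduleResult, runHandoffDemo() and the command string are all hypothetical names made up for illustration, and a std::promise stands in for the network. The only point is that getStatus() on the scheduling result says whether the request was queued, while the response arrives later on another thread, after the mutex has been released.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for the value returned by a scheduling call.
struct ScheduleResult {
    bool ok;
    // Reports only whether scheduling succeeded; it never waits for a reply.
    bool getStatus() const { return ok; }
};

// Hypothetical toy executor (assumption: one request per executor instance).
class ToyExecutor {
public:
    // Hands the request to a background thread and returns immediately.
    // `reply` simulates the network: the callback runs once it is fulfilled.
    ScheduleResult scheduleRemoteCommand(std::string cmd,
                                         std::shared_future<void> reply,
                                         std::function<void(const std::string&)> onResponse) {
        _worker = std::thread([cmd = std::move(cmd), reply, cb = std::move(onResponse)] {
            reply.wait();      // blocks the worker thread, not the caller
            cb("ok: " + cmd);  // deliver the simulated response
        });
        return ScheduleResult{true};
    }

    void join() {
        if (_worker.joinable()) _worker.join();
    }

    ~ToyExecutor() { join(); }

private:
    std::thread _worker;
};

struct DemoResult {
    bool scheduled;             // did scheduling succeed?
    bool respondedWhileLocked;  // had the reply arrived while the mutex was held?
    std::string response;       // reply observed after the mutex was released
};

DemoResult runHandoffDemo() {
    std::mutex coordMutex;       // plays the role of the coordinator mutex
    std::promise<void> network;  // fulfilled when the simulated remote replies
    std::shared_future<void> reply = network.get_future().share();
    std::atomic<bool> responded{false};
    std::string response;
    ToyExecutor executor;
    DemoResult result{};

    {
        std::lock_guard<std::mutex> lk(coordMutex);
        ScheduleResult sr = executor.scheduleRemoteCommand(
            "hypotheticalStepUp", reply, [&](const std::string& r) {
                response = r;
                responded = true;
            });
        result.scheduled = sr.getStatus();  // returns at once
        result.respondedWhileLocked = responded.load();
    }  // mutex released here; the request is still in flight

    network.set_value();  // the simulated remote now replies
    executor.join();      // wait for the callback to finish before reading `response`
    result.response = response;
    return result;
}
```

Calling runHandoffDemo() from a main() should yield scheduled == true and respondedWhileLocked == false: the critical section only pays for queuing the request, which is the property Siyuan describes above. A design that blocked on the reply inside the lock_guard scope would be the problematic pattern the original report was worried about.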