[SERVER-36779] Election handoff schedules a network request under ReplCoord mutex Created: 21/Aug/18  Updated: 27/Oct/23  Resolved: 21/Aug/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kevin Pulo Assignee: Siyuan Zhou
Resolution: Works as Designed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-35623 Send a replSetStepUp command to an el... Closed
Operating System: ALL
Participants:

 Description   

It looks like ReplicationCoordinatorImpl::_performElectionHandoff() is waiting for a remote command while holding the repl coord mutex. Or have I missed something?

In addition, _performElectionHandoff() is only called from ReplicationCoordinatorImpl::stepDown(), and my reading of that function is that the global lock is also held when _performElectionHandoff() is called (the lock is released earlier in the function but then reacquired).

This code was added in SERVER-35623.



 Comments   
Comment by Kevin Pulo [ 22/Aug/18 ]

Yep, sorry about that; somehow I misread the getStatus() call as "wait and return the remote's response". No failure was directly observed around this.

Comment by Siyuan Zhou [ 21/Aug/18 ]

kevin.pulo, did you observe any failure around this code? _performElectionHandoff() schedules the command to be sent in a fire-and-forget manner and doesn't wait for it to finish, so it's fast and safe to run while holding the mutex or global lock. I'm changing the title and closing this ticket as "Works as designed".
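The fire-and-forget distinction can be illustrated with a toy Python sketch. It is not MongoDB's code: `ToyExecutor`, `schedule_remote_command`, `ToyReplCoord` and `perform_election_handoff` are hypothetical names. The sketch assumes, as described in this thread, that the status returned by scheduling only reports whether the request was enqueued, not the remote node's reply. The mutex is held only while the request is enqueued, and the "response" arrives later on the executor's own thread via a callback.

```python
import queue
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Status:
    ok: bool
    reason: str = ""


class ToyExecutor:
    """Hypothetical task executor: scheduled work runs on a background thread."""

    def __init__(self) -> None:
        self._shutdown = False
        self._tasks: "queue.Queue[Optional[Callable[[], None]]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel: stop the worker
                return
            task()

    def schedule_remote_command(self, target: str, command: str,
                                on_response: Callable[[dict], None]) -> Status:
        # Returns immediately. The Status only says whether scheduling worked;
        # it says nothing about the remote node's eventual reply.
        if self._shutdown:
            return Status(False, "executor shut down")

        def send() -> None:
            time.sleep(0.5)  # simulated network round trip
            on_response({"ok": 1, "target": target, "cmd": command})

        self._tasks.put(send)
        return Status(True)

    def shutdown(self) -> None:
        self._shutdown = True
        self._tasks.put(None)
        self._worker.join()


class ToyReplCoord:
    def __init__(self, executor: ToyExecutor) -> None:
        self._mutex = threading.Lock()
        self._executor = executor
        self.responses: list = []
        self.response_seen = threading.Event()

    def _on_step_up_response(self, response: dict) -> None:
        # Runs later on the executor thread; it can take the mutex because the
        # scheduling caller released it long ago. In this toy, blocking on the
        # reply while still holding the mutex would deadlock right here.
        with self._mutex:
            self.responses.append(response)
        self.response_seen.set()

    def perform_election_handoff(self, target: str) -> Status:
        with self._mutex:
            # Fire-and-forget: the mutex is held only while enqueueing.
            return self._executor.schedule_remote_command(
                target, "replSetStepUp", self._on_step_up_response)


executor = ToyExecutor()
coord = ToyReplCoord(executor)
start = time.monotonic()
status = coord.perform_election_handoff("node2:27017")
elapsed = time.monotonic() - start  # far shorter than the simulated 0.5 s trip
coord.response_seen.wait(timeout=5.0)
print(status.ok, coord.responses[0]["cmd"])
executor.shutdown()
```

Because the caller never waits on the reply, the time spent under the lock is bounded by the cost of enqueueing, which is the property the comment above relies on.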

Generated at Thu Feb 08 04:44:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.