[SERVER-27015] db.shutdownServer doesn't find electable secondaries Created: 14/Nov/16  Updated: 21/Nov/16  Resolved: 21/Nov/16

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Yoni Douek Assignee: Spencer Brody (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-27118 Default shutdown command's 'timeoutSe... Closed
Operating System: ALL
Participants:

 Description   

Ever since we upgraded to 3.2.10, running db.shutdownServer() consistently fails with:

shutdownServer failed: {
	"ok" : 0,
	"errmsg" : "No electable secondaries caught up as of 2016-11-14T06:51:05.073+0000. Please use {force: true} to force node to step down.",
	"code" : 50
}

  • There are 2 very "tight" secondaries with 0-2 seconds of replication lag
  • rs.stepDown() works, so we have to step down first and only then shut down the server


 Comments   
Comment by Yoni Douek [ 19/Nov/16 ]

Sounds great. Thanks!

Comment by Spencer Brody (Inactive) [ 18/Nov/16 ]

Hi Yoni,
This is most likely because, by default, the shutdown command only succeeds on a primary if the secondaries are fully caught up at the exact moment the command is executed. The shutdown command accepts a 'timeoutSecs' argument that gives the secondaries more time to catch up before the command fails. I filed SERVER-27118 to change the default value of 'timeoutSecs' for the shutdown command from 0 to 10, to match the behavior of the replSetStepDown command. In the meantime, you can provide that argument explicitly as a workaround.

-Spencer
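
For reference, a minimal sketch of the workaround described in the comment above, passing 'timeoutSecs' explicitly. The helper name buildShutdownCommand is hypothetical and only illustrates the shape of the command document; the value 10 is taken from the default proposed in SERVER-27118, and the right value for a given deployment is an assumption to be tuned.

```javascript
// Build the command document for the `shutdown` admin command.
// buildShutdownCommand is a hypothetical helper used only in this sketch;
// `timeoutSecs` is the argument named in the comment above.
function buildShutdownCommand(timeoutSecs) {
  if (!Number.isInteger(timeoutSecs) || timeoutSecs < 0) {
    throw new RangeError("timeoutSecs must be a non-negative integer");
  }
  return { shutdown: 1, timeoutSecs: timeoutSecs };
}

// In the mongo shell, connected to the primary, this would be sent as:
//   db.adminCommand(buildShutdownCommand(10))
// or through the shell helper, run against the admin database:
//   db.getSiblingDB("admin").shutdownServer({ timeoutSecs: 10 })
const cmd = buildShutdownCommand(10);
```

With a non-zero timeout, the primary waits up to that many seconds for an electable secondary to catch up instead of failing immediately.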

Comment by Yoni Douek [ 14/Nov/16 ]

As mentioned, they are synced:

source: in.db3m2.xx.com:27017
	syncedTo: Mon Nov 14 2016 13:09:03 GMT+0000 (UTC)
	1 secs (0 hrs) behind the primary 

The logs on the primary and the secondaries don't mention anything related to this. This reproduces every time, on all of our shard replicas.

Comment by Ramon Fernandez Marina [ 14/Nov/16 ]

yonido, when this happens, can you please provide the output of rs.printSlaveReplicationInfo()? Can you also upload the logs for this node from the time you run db.shutdownServer()? In the meantime, I'll try to reproduce on our end.

Thanks,
Ramón.

Generated at Thu Feb 08 04:13:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.