[SERVER-22977] confusing error message when trying to step down primary Created: 06/Mar/16  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Adam Schwartz Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Replication
Participants:

 Description   

The only secondary in a three-member PSA (primary, secondary, arbiter) replica set mistakenly has a priority of zero. The secondary is not lagging and is completely caught up.

When trying to step down the Primary, the command fails as follows:

In 3.0.9

test> rs.stepDown()
{
	"ok" : 0,
	"errmsg" : "No electable secondaries caught up as of 2016-03-06T07:32:04.336+0200",
	"code" : 50
}

In 3.2.3

test> rs.stepDown()
{
	"ok" : 0,
	"errmsg" : "No electable secondaries caught up as of 2016-03-06T09:24:08.753+0200. Please use {force: true} to force node to step down.",
	"code" : 50
}

The "caught up as of" clause in the errmsg is confusing because it leads the user to believe that the primary cannot step down due to a lagging secondary. This causes the user to pointlessly check and recheck rs.printReplicationInfo() and rs.printSlaveReplicationInfo() to verify that there is no lag.

When no secondary is electable at all, the errmsg would be more helpful if it just said "No electable secondaries" and omitted the "caught up as of" clause.
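
The proposed behavior could be sketched roughly as follows. This is only an illustration of the message selection and not the server's actual implementation; the function name stepDownErrmsg and the electable/caughtUp fields are hypothetical.

```javascript
// Hypothetical sketch of the proposed message selection.
// `secondaries` is an array of { electable: boolean, caughtUp: boolean }.
// `asOf` is the timestamp string to report when lag is the real cause.
function stepDownErrmsg(secondaries, asOf) {
  const electable = secondaries.filter((m) => m.electable);

  // No secondary can ever be elected: lag is irrelevant, so do not mention it.
  if (electable.length === 0) {
    return "No electable secondaries";
  }

  // Electable secondaries exist, but none of them is caught up.
  if (!electable.some((m) => m.caughtUp)) {
    return "No electable secondaries caught up as of " + asOf;
  }

  // At least one electable secondary is caught up: no error.
  return null;
}

// The situation from this ticket: one caught-up secondary with priority 0.
console.log(stepDownErrmsg([{ electable: false, caughtUp: true }],
                           "2016-03-06T09:24:08.753+0200"));
```

With this split, the "caught up as of" clause would only appear when lag is actually what blocks the step down.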

The streamlined error message would correctly prod the user to check the rs.conf(), looking for why the secondary is unelectable, instead of pointlessly checking lag times.
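
As an illustration of the rs.conf() check, the sketch below lists members that cannot be elected because of their configuration. The helper name unelectableMembers, the sample config, and its host names are made up for this example; only the priority and arbiterOnly member fields are taken from the replica set configuration document, and other possible causes are not covered.

```javascript
// Hypothetical helper: report members whose configuration prevents election.
// `config` has the shape of the document returned by rs.conf().
function unelectableMembers(config) {
  const result = [];
  for (const m of config.members) {
    if (m.arbiterOnly) {
      // Arbiters vote but hold no data and can never become primary.
      result.push({ host: m.host, reason: "arbiterOnly" });
    } else if (m.priority === 0) {
      // A priority of 0 means the member can never be elected.
      result.push({ host: m.host, reason: "priority is 0" });
    }
  }
  return result;
}

// Sample PSA config resembling the one in this ticket (host names are invented).
const sampleConf = {
  _id: "rs0",
  members: [
    { _id: 0, host: "host-a.example:27017", priority: 1 },
    { _id: 1, host: "host-b.example:27017", priority: 0 },
    { _id: 2, host: "host-c.example:27017", priority: 1, arbiterOnly: true },
  ],
};

console.log(unelectableMembers(sampleConf));
```

In the mongo shell the same function could be called as unelectableMembers(rs.conf()), assuming the function has been defined in that session.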

Thanks



 Comments   
Comment by German Gutierrez [ 16/Feb/18 ]

Does anybody else have this same error?

Generated at Thu Feb 08 04:01:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.