[SERVER-3695] Running shutdown command on Primary while all secondaries are fsync locked and not caught up says that no secondaries were within 10 seconds of primaries optime, even if they were
Created: 26/Aug/11 | Updated: 16/Jan/20 | Resolved: 02/Dec/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Usability |
| Affects Version/s: | 2.0.0-rc0 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Minor - P4 |
| Reporter: | Spencer Brody (Inactive) | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | sync |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Description |
|
If you try to run the shutdownServer command on a primary, it won't step down unless there is a secondary totally caught up. The error message it reports, however, is "shutdownServer failed: no secondaries within 10 seconds of my optime". Either the error message should be updated or the behavior should be changed to match the message. |
| Comments |
| Comment by Spencer Brody (Inactive) [ 02/Dec/14 ] |
|
It has been determined that this is the desired behavior - any stepdown/shutdown that leaves your system without a usable PRIMARY should require 'force' to run. |
| Comment by Spencer Brody (Inactive) [ 01/Dec/14 ] |
|
New message from 2.8-rc2-pre on a 3-node set with 2 secondaries, both fsync-locked and with a pending write waiting for replication: "shutdownServer failed: No electable secondaries caught up as of 2014-12-01T17:57:51.915-0500". With a single-node set, however, this seems to be a problem again. Digging in now. |
| Comment by Eric Milkie [ 02/Sep/14 ] |
|
Parking with Spencer to look at reproducing after the 2.7 refactoring is complete and we've switched off the Legacy coordinator. |
| Comment by Spencer Brody (Inactive) [ 26/Aug/11 ] |
|
I think this actually only happens if the secondaries are fsync locked. If they're just behind, then it will work if they're within 10 seconds of the primary. If the secondaries are all locked, however, then even if they're only a fraction of a second behind, the shutdown will immediately fail. Probably the error message should just be updated for this edge case. |
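
The behavior described in this ticket can be sketched as a small toy model. This is not server code: the `Secondary` class, the `can_shutdown_primary` function, and their parameters are hypothetical names introduced only for illustration. The model assumes that an unlocked secondary counts if it is within the 10-second window, that an fsync-locked secondary counts only if it is already fully caught up (presumably because it cannot apply further writes while locked), and that `force` bypasses the check, as the resolution comment suggests. How the server treats a mix of locked and unlocked secondaries is an assumption here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Secondary:
    """Hypothetical stand-in for a replica set secondary."""
    lag_seconds: float          # how far behind the primary's optime
    fsync_locked: bool = False  # True if the node is fsync locked


def can_shutdown_primary(secondaries: List[Secondary],
                         window_seconds: float = 10.0,
                         force: bool = False) -> bool:
    """Toy model of the shutdown check as observed in this ticket."""
    if force:
        # Assumption from the resolution comment: 'force' skips the check.
        return True
    for s in secondaries:
        if s.fsync_locked:
            # A locked secondary only helps if it is already fully caught up.
            if s.lag_seconds == 0:
                return True
        elif s.lag_seconds <= window_seconds:
            # An unlocked secondary within the window is assumed able to catch up.
            return True
    return False
```

Under this model, `can_shutdown_primary([Secondary(0.5, fsync_locked=True)])` is false even though the secondary is well inside the 10-second window, which is the case where the original error message was misleading.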