[SERVER-25071] Ensure replication batch finishes before shutdown Created: 14/Jul/16 Updated: 26/Jan/18 Resolved: 26/Aug/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.3.0 |
| Fix Version/s: | 3.3.12 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Scott Hernandez (Inactive) | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Repl 2016-08-29 |
| Participants: | |
| Description |
|
Much like There are a few ways to get there:
In addition, we should move setting the minvalid boundaries under the PBWM lock so that minvalid reflects the completed apply batch. |
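To make the intended lock scoping concrete, here is a minimal, self-contained sketch (not MongoDB source; pbwmLock, applyOneBatch() and gMinValid are hypothetical stand-ins) of an applier loop that only observes a shutdown request at batch boundaries and updates the minvalid boundary while still holding the PBWM-style lock, so the recorded boundary always corresponds to a fully applied batch:

```cpp
#include <atomic>
#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

std::mutex pbwmLock;                     // stand-in for the Parallel Batch Writer Mode lock
std::atomic<bool> shutdownRequested{false};
long long gMinValid = 0;                 // stand-in for the minvalid document

// Apply one batch of oplog entries; returns the optime of the last entry.
// Deliberately NOT interruptible: shutdown is only observed between batches.
long long applyOneBatch(const std::vector<long long>& batch) {
    for (long long opTime : batch) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));  // "apply" the op
        (void)opTime;
    }
    return batch.back();
}

void applierThread() {
    long long nextOpTime = 1;
    while (!shutdownRequested.load()) {   // shutdown checked only at batch boundaries
        std::vector<long long> batch{nextOpTime, nextOpTime + 1, nextOpTime + 2};
        nextOpTime += 3;

        std::lock_guard<std::mutex> lk(pbwmLock);    // held for the whole batch
        long long lastApplied = applyOneBatch(batch);
        gMinValid = lastApplied;          // boundary updated under the same lock, so it
                                          // always names a completed apply batch
    }
}

int main() {
    std::thread applier(applierThread);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));

    // Clean shutdown: request it and wait for the applier to reach a batch
    // boundary instead of interrupting it mid-batch.
    shutdownRequested.store(true);
    applier.join();
    std::cout << "shut down at a batch boundary, minvalid=" << gMinValid << "\n";
    return 0;
}
```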
| Comments |
| Comment by Githook User [ 08/Sep/16 ] |
|
Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>. Message: |
| Comment by Mathias Stearn [ 26/Aug/16 ] |
|
It may be worth mentioning that this may cause clean shutdowns to take longer as they will no longer interrupt operations on secondaries. This is mostly an issue with index builds. |
| Comment by Githook User [ 26/Aug/16 ] |
|
Author: Mathias Stearn (RedBeard0531) <mathias@10gen.com>. Message: |
| Comment by Kevin Pulo [ 25/Jul/16 ] |
|
scotthernandez, although it's related, I think the question of setting minvalid while the appropriate locks are held is more of a separate issue. I've filed
Together, However, I think it might be possible to shut down during the "way in" race on |
| Comment by Kevin Pulo [ 25/Jul/16 ] |
More importantly (IMHO), it's not a useful backup for DR purposes. This is a problem because a shut-down mongod is considered the gold standard in terms of being quiesced for the purposes of taking a backup. Users rightly expect that a mongod which has been cleanly shut down has data which is "good" in every way imaginable/possible. For a replset, this includes consistency with the primary. It will be quite shocking for the user who later discovers that the data they saved — from a shut-down mongod — is only useful in the presence of other data which is stored on other hosts (or rather, was stored on other hosts...). The server has worked this way in all previous versions (including 3.2, once 3.2.9 has been released), so this feels more like a regression to me. |
| Comment by Scott Hernandez (Inactive) [ 15/Jul/16 ] |
|
When the member is restarted it cannot get to a consistent data set unless it can contact another consistent member and finish applying the batch. In essence, while it is in this state the node is invalid and has no (usable) data until it can finish the batch; it cannot, for example, transition to secondary, nor become primary. A 3-node set with a single arbiter is a good example of the failure: if the primary is lost while the secondary was shut down in the middle of a batch, the set loses the ability to recover when the secondary and arbiter restart. |
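For illustration, a minimal sketch of the restart behaviour described above (not MongoDB source; ReplState and stateOnStartup() are hypothetical, assuming the usual lastApplied-vs-minvalid comparison): a node shut down mid-batch is behind its minvalid boundary, so it must stay in RECOVERING until another consistent member can supply the rest of the batch, and with only an arbiter left it is stuck:

```cpp
#include <iostream>
#include <string>

// Hypothetical snapshot of the replication metadata a node finds on startup.
struct ReplState {
    long long lastApplied;  // optime of the last applied oplog entry
    long long minValid;     // optime the node must reach before its data is consistent
};

std::string stateOnStartup(const ReplState& s, bool canReachSyncSource) {
    if (s.lastApplied >= s.minValid) {
        return "SECONDARY";  // data is consistent; the node can serve as a secondary
    }
    if (canReachSyncSource) {
        return "RECOVERING (catching up to minvalid from a sync source)";
    }
    // e.g. the primary is gone and only an arbiter is left: nobody can supply
    // the missing oplog entries, so the node can never become SECONDARY.
    return "RECOVERING (stuck: no consistent member to sync from)";
}

int main() {
    ReplState midBatch{/*lastApplied=*/105, /*minValid=*/110};  // shut down mid-batch
    ReplState clean{/*lastApplied=*/110, /*minValid=*/110};     // shut down at a boundary

    std::cout << stateOnStartup(midBatch, /*canReachSyncSource=*/false) << "\n";
    std::cout << stateOnStartup(midBatch, /*canReachSyncSource=*/true) << "\n";
    std::cout << stateOnStartup(clean, /*canReachSyncSource=*/false) << "\n";
    return 0;
}
```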
| Comment by Andy Schwerin [ 15/Jul/16 ] |
|
What is the harm of shutting down mid-batch? |