[SERVER-24933] Clean shutdown of secondaries should occur in between oplog batches, not during
| Created: 06/Jul/16 | Updated: 20/Nov/16 | Resolved: 14/Jul/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.2.5 |
| Fix Version/s: | 3.2.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andy Schwerin | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-and-test |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Repl 17 (07/15/16) |
| Description |
|
Starting in MongoDB version 3.2, replica set secondaries write entries to the oplog concurrently with applying the changes described by those oplog entries to the data files. Because of this change, starting in MongoDB 3.2, the singleton document in local.replset.minvalid contains a "begin" field during periods of oplog application, indicating the newest oplog entry not in the current batch of oplog entries being applied. In order to properly downgrade from 3.2 to 3.0, which ignores the "begin" field, 3.2 secondary nodes should only shut down cleanly at oplog batch boundaries, not in the middle of applying a batch. |
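The restart behaviour described above can be illustrated with a deliberately simplified model. This is a sketch, not server code: optimes are plain integers, the function name `entries_to_reapply` and the `understands_begin` switch are hypothetical, and the only field taken from the ticket is "begin". It assumes that reapplying oplog entries is safe because they are idempotent.

```python
# Simplified model of restart recovery after a mid-batch shutdown.
# Assumption: optimes are integers; real optimes are timestamps with terms.

def entries_to_reapply(oplog, minvalid, understands_begin):
    """Return the oplog entries a restarting node would reapply.

    A node that understands "begin" restarts from that point.
    A node that ignores it trusts the top of the oplog instead.
    """
    if understands_begin and "begin" in minvalid:
        start = minvalid["begin"]
    else:
        start = oplog[-1]  # top of the oplog is assumed fully applied
    return [ts for ts in oplog if ts > start]


# Entries 4..6 form the batch in progress; "begin" is the newest entry
# that is NOT part of that batch.
oplog = [1, 2, 3, 4, 5, 6]
minvalid = {"begin": 3}

# Shutdown happened mid-batch: 5 and 6 are in the oplog but not in the data.
applied = {1, 2, 3, 4}

v32_replay = entries_to_reapply(oplog, minvalid, understands_begin=True)
v30_replay = entries_to_reapply(oplog, minvalid, understands_begin=False)

# Entries still missing from the data files after each kind of restart.
missing_after_v32 = sorted(set(oplog) - applied - set(v32_replay))
missing_after_v30 = sorted(set(oplog) - applied - set(v30_replay))
```

In this model the begin-aware restart replays the whole batch and leaves nothing missing, while the restart that ignores "begin" replays nothing and leaves entries 5 and 6 unapplied. Shutting down only at batch boundaries avoids the second case because there is never a partially applied batch on disk.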
| Comments |
| Comment by Githook User [ 19/Aug/16 ] | ||
|
Author: Mathias Stearn (RedBeard0531, redbeard0531@gmail.com)
Message: This avoids a spurious failure in the rare case where the top of the oplog is | ||
| Comment by Githook User [ 14/Jul/16 ] | ||
|
Author: Mathias Stearn (RedBeard0531, redbeard0531@gmail.com)
Message: | ||
| Comment by Githook User [ 14/Jul/16 ] | ||
|
Author: Mathias Stearn (RedBeard0531, redbeard0531@gmail.com)
Message: | ||
| Comment by Andy Schwerin [ 07/Jul/16 ] | ||
|
This bug may cause a v3.2 node that has been downgraded to v3.0 to fail to apply some oplog entries on restart. If the node is upgraded back to 3.2 soon thereafter, without an intervening primary election, the v3.2 node will reapply the oplog correctly and recover from the data loss; otherwise, the data loss may persist. | ||
| Comment by Scott Hernandez (Inactive) [ 07/Jul/16 ] | ||
|
As a workaround you can ensure that the "begin" field is removed before upgrading (again):
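One possible form of that removal, offered as an unverified sketch rather than the command from this comment: the helper name `clear_begin_field` is hypothetical, and it assumes a PyMongo-style collection handle for local.replset.minvalid. The ticket does not say in what node state this should be run, so check that before using anything like it.

```python
def clear_begin_field(minvalid_collection):
    """Unset "begin" on the singleton minvalid document.

    Assumption: `minvalid_collection` behaves like a PyMongo Collection
    for local.replset.minvalid, so it offers update_one(filter, update).
    """
    # The empty filter matches the single document in the collection;
    # $unset removes the field regardless of the value given for it.
    return minvalid_collection.update_one({}, {"$unset": {"begin": ""}})


# Hypothetical usage with PyMongo (the address is a placeholder):
#   from pymongo import MongoClient
#   client = MongoClient("mongodb://localhost:27017")
#   clear_begin_field(client["local"]["replset.minvalid"])
```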
| ||
| Comment by Andy Schwerin [ 06/Jul/16 ] | ||
|
redbeard0531 suggests that this could be fixed by having clean shutdown acquire the PBWM lock before setting the _inShutdown flag, to ensure that shutdown does not occur mid-batch. This does not need to be fixed on master, because a downgrade from master/v3.4 directly to v3.0 is prohibited, and 3.2 is aware of the "begin" field in the minvalid document. |
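The lock ordering suggested here can be sketched with a toy model. This is not the server implementation: the class `Secondary`, its methods, and `batch_lock` are hypothetical stand-ins, with an ordinary mutex playing the role of the PBWM (parallel batch writer mode) lock and `in_shutdown` playing the role of `_inShutdown`.

```python
import threading


class Secondary:
    """Toy secondary that applies oplog batches under a single lock."""

    def __init__(self):
        self.batch_lock = threading.Lock()  # stand-in for the PBWM lock
        self.in_shutdown = False            # stand-in for _inShutdown
        self.applied = []

    def apply_batch(self, batch):
        """Apply a whole batch while holding the lock.

        Returns False without applying anything if shutdown has begun.
        """
        with self.batch_lock:
            if self.in_shutdown:
                return False
            for entry in batch:
                self.applied.append(entry)
            return True

    def clean_shutdown(self):
        """Take the batch lock first, then set the shutdown flag.

        Acquiring the lock blocks until any in-progress batch finishes,
        so the flag can only flip at a batch boundary.
        """
        with self.batch_lock:
            self.in_shutdown = True


node = Secondary()
batches = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]


def run_applier():
    for batch in batches:
        if not node.apply_batch(batch):
            break


applier = threading.Thread(target=run_applier)
applier.start()
node.clean_shutdown()  # may land before, between, or after batches
applier.join()

# However the threads interleave, only whole batches were applied.
assert len(node.applied) % 3 == 0
```

The number of batches applied before shutdown depends on thread timing, but the count of applied entries is always a multiple of the batch size, which is the property the suggested fix is after.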