[SERVER-21868] Shutdown may not be handled correctly on secondary nodes Created: 11/Dec/15 Updated: 25/Jan/17 Resolved: 17/Dec/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.2.0 |
| Fix Version/s: | 3.2.1, 3.3.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Siyuan Zhou |
| Resolution: | Done | Votes: | 0 |
| Labels: | code-only | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Completed: | |
| Sprint: | Repl E (01/08/16) |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
Issue Status as of Dec 16, 2015

ISSUE SUMMARY
This problem only applies to a “clean shutdown”, which occurs when the node is shut down via one of the following means:

Notably, this error does not apply to nodes that shut down abnormally. If a mongod process is ended by a hard termination, such as a KILL signal, it is not subject to this bug.

USER IMPACT

WORKAROUNDS
Use a non-clean shutdown method
Inducing a non-clean shutdown avoids the bug. This approach is safe on all deployments using WiredTiger, and on all MMAP deployments with journaling enabled (the default). On a system that supports POSIX signals, send a KILL (9) or QUIT (3) signal to the mongod process to shut it down. On Windows, use “tskill”. The storage engine and replication recovery code will bring the node back into a consistent state upon server restart. This is a temporary workaround for 3.2.0 users; do not use it after upgrading to 3.2.1 or newer.

Remove the node from the replica set
Removing the node from its replica set configuration before shutting it down ensures that the node is not processing replicated writes at shutdown time. Remove the node via the replSetReconfig command or the rs.reconfig shell helper, then wait for the node to enter the REMOVED state before shutting it down.

AFFECTED VERSIONS

FIX VERSION

Original description
In sync_tail.cpp, multiApply() assumes the application always succeeds, then sets minValid to acknowledge that.
multiApply() delegates the work to applyOps(), which simply schedules the work to worker threads:
However, schedule() may return an error to indicate that shutdown is already in progress. sync_tail.cpp ignores the error and continues to mark the operation as finished. If shutdown happens after the operations are scheduled, the secondary runs into another fassert, which is also unexpected, and a restart cannot repair the inconsistent state either. This has also been observed in repeated runs of backup_restore.js. As a result, any kind of operation may be marked as executed by mistake when shutting down the secondary, including commands and database operations, leading to a state inconsistent with the primary and to potentially missing or stale documents on secondaries. To fix this issue, after the on_block_exit of the join call we need to check whether shutdown has happened and, if so, return the empty optime to indicate that the batch is not complete. |
| Comments |
| Comment by Githook User [ 17/Dec/15 ] |
|
Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>
Message: (cherry picked from commit ac70c5eb4d987702535ad6c00ab980de5873cdf4) |
| Comment by Githook User [ 17/Dec/15 ] |
|
Author: Siyuan Zhou (visualzhou) <siyuan.zhou@mongodb.com>
Message: |