[SERVER-23944] Failure to commit chunk migration due to shutdown should not fassert Created: 27/Apr/16 Updated: 29/Aug/18 Resolved: 23/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.2.5, 3.3.5 |
| Fix Version/s: | 3.3.14 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Kaloian Manassiev |
| Resolution: | Done | Votes: | 0 |
| Labels: | bkp, neweng |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 16 (06/24/16), Sharding 2016-10-10 |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 18 |
| Description |
|
If the commit chunk migration code fails to apply the metadata change transaction to the config server, it makes a best-effort attempt to determine whether the operation was actually applied. If this check fails for any reason, we currently terminate the server in order to avoid data corruption or loss. Before terminating the server, we should check whether it is shutting down; if so, we can avoid raising a fatal assertion.
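
The decision flow described above can be sketched as follows. This is only an illustrative model in Python, not the server's C++ implementation: `resolve_commit`, `CommitOutcome`, `FatalAssertion`, and the three callables are hypothetical names introduced here, and `FatalAssertion` merely stands in for an fassert, which would terminate the process.

```python
from enum import Enum, auto


class CommitOutcome(Enum):
    COMMITTED = auto()
    NOT_COMMITTED = auto()
    ABORTED_FOR_SHUTDOWN = auto()


class FatalAssertion(Exception):
    """Hypothetical stand-in for an fassert (which would terminate the server)."""


def resolve_commit(apply_commit, verify_commit, is_shutting_down):
    """Decide how a chunk migration commit ended.

    All three arguments are hypothetical callables (assumptions for this sketch):
      apply_commit()     raises if the metadata change could not be applied
      verify_commit()    returns True/False, raises if the check itself fails
      is_shutting_down() returns True once shutdown has begun
    """
    try:
        apply_commit()
        return CommitOutcome.COMMITTED
    except Exception:
        pass  # Outcome unknown: the change may or may not have been applied.

    try:
        # Best-effort check of whether the change actually landed.
        applied = verify_commit()
    except Exception as check_error:
        if is_shutting_down():
            # The check failed only because the server is going away,
            # so there is no need for a fatal assertion.
            return CommitOutcome.ABORTED_FOR_SHUTDOWN
        raise FatalAssertion("cannot determine commit outcome") from check_error

    return CommitOutcome.COMMITTED if applied else CommitOutcome.NOT_COMMITTED
```

The point of the sketch is the ordering: the shutdown check happens only after the verification itself has failed, so a verifiable outcome is never masked by a concurrent shutdown.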
|
| Comments |
| Comment by Githook User [ 23/Sep/16 ] |
|
Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com> Message: |
| Comment by Kaloian Manassiev [ 19/Sep/16 ] |
|
Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com> Message: |
| Comment by Dianna Hohensee (Inactive) [ 19/Sep/16 ] |
|
Can we backport this too? It's a v3.2 problem, too. https://jira.mongodb.org/browse/BF-1936 |
| Comment by Kaloian Manassiev [ 19/Sep/16 ] |
|
Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com> Message: |
| Comment by Dianna Hohensee (Inactive) [ 14/Sep/16 ] |
|
After discussion, we are leaning toward staying in the critical section if the log write gets a shutdown error, either by retrying the log write in an infinite loop with a sleep, or by keeping the critical flag set somehow and then returning. Shutdown cannot be halted once it has commenced. If a shutdown error is received on the refresh command, we can just clear the metadata, because the shard will have acquired the latest optime from the remote log command, and any other process that needs the metadata will then reload it correctly. If the optime were stale, the reload would potentially be stale too, so we can't let anything proceed with a stale optime. |
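
A rough Python model of the first option (retrying the log write with a sleep) and of the refresh handling might look like this. Every name here (`ShutdownError`, `write_log_with_retry`, `refresh_or_clear`, and the callables passed in) is hypothetical, and the `max_attempts` parameter is an addition for testability that the discussion does not mention; with the default of `None` the loop is unbounded, as proposed.

```python
import time


class ShutdownError(Exception):
    """Hypothetical error signalling that an operation failed due to shutdown."""


def write_log_with_retry(write_log, sleep=time.sleep,
                         retry_interval_secs=1.0, max_attempts=None):
    """Keep retrying the log write while it fails with a shutdown error.

    The caller is assumed to still hold the critical section, so nothing
    else can observe the in-between state while this loops.
    """
    attempts = 0
    while True:
        try:
            return write_log()
        except ShutdownError:
            attempts += 1
            if max_attempts is not None and attempts >= max_attempts:
                raise
            sleep(retry_interval_secs)


def refresh_or_clear(refresh, clear_metadata):
    """Refresh the metadata; on a shutdown error, clear it instead.

    Clearing is assumed safe here because the latest optime was already
    acquired from the preceding log write, so a later reload is not stale.
    """
    try:
        refresh()
        return "refreshed"
    except ShutdownError:
        clear_metadata()
        return "cleared"
```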
| Comment by Dianna Hohensee (Inactive) [ 01/Jul/16 ] |
|
kaloian.manassiev What do we want the shard to do if it gets interrupted by a shutdown in the refresh logic? |
| Comment by Dianna Hohensee (Inactive) [ 15/Jun/16 ] |
|
The patch for this fix should also include a JS test to make sure the fix works: set up a moveChunk, set some failpoints, and shut down the servers without hitting an fassert.