[SERVER-25497] Fix sharded query path to handle shutdown of the mongos process Created: 08/Aug/16 Updated: 16/Oct/20 Resolved: 15/Oct/20
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.4.0-rc0 |
| Fix Version/s: | 4.9.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Davenson Lombard | Assignee: | Ruoxin Xu |
| Resolution: | Done | Votes: | 0 |
| Labels: | mongos-drain-mode-fallout, qexec-team, query-44-grooming |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Sharding 2016-10-10, Query 2018-02-26, Query 2020-05-04, Query 2020-10-05, Query 2020-10-19 |
| Participants: | |
| Case: | (copied to CRM) |
| Linked BF Score: | 15 |
| Description |
| Comments |
| Comment by Githook User [ 15/Oct/20 ] |
|
Author: {'name': 'Ruoxin Xu', 'email': 'ruoxin.xu@mongodb.com', 'username': 'RuoxinXu'}
Message: |
| Comment by Ian Boros [ 27/Feb/18 ] |
|
Also! Note that there's another subtle bug with how the ARM deals with shutdown: when it checks for shutdown and returns an invalid event, the caller will likely free the ARM immediately. This is not safe because there may be outstanding callbacks which, when run, will try to access the ARM. Both of the solutions we came up with above (changing the shutdown order and using futures) should solve this problem as well. |
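To make the hazard concrete, here is a minimal, self-contained C++ sketch (the FakeMerger class and its methods are invented for illustration, not MongoDB's AsyncResultsMerger) of an object being destroyed while an asynchronous callback that still holds a raw pointer to it is outstanding, along with one way to drain outstanding callbacks before destruction:

{code:cpp}
// Minimal sketch (hypothetical names, not MongoDB code) of the lifetime hazard:
// an object is destroyed while an asynchronous callback that captured a raw
// pointer to it is still outstanding. The safe teardown below waits for the
// outstanding-callback count to reach zero before the object is deleted.
#include <condition_variable>
#include <mutex>
#include <thread>

class FakeMerger {
public:
    // Called on the executor thread when a simulated remote response arrives.
    void onResponse() {
        std::lock_guard<std::mutex> lk(_mutex);
        --_outstandingCallbacks;
        _cv.notify_all();
    }

    // Records that a callback referencing 'this' has been scheduled.
    void scheduleRemoteWork() {
        std::lock_guard<std::mutex> lk(_mutex);
        ++_outstandingCallbacks;
    }

    // Safe teardown: block until no scheduled callback can still touch 'this'.
    void waitForCallbacksToDrain() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [&] { return _outstandingCallbacks == 0; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    int _outstandingCallbacks = 0;
};

int main() {
    auto* merger = new FakeMerger();
    merger->scheduleRemoteWork();

    // Simulated executor callback that still holds a raw pointer to 'merger'.
    std::thread executorThread([merger] { merger->onResponse(); });

    // Deleting 'merger' right here would be the bug described above: the
    // callback may not have run yet. Draining first makes the delete safe.
    merger->waitForCallbacksToDrain();
    executorThread.join();
    delete merger;
    return 0;
}
{code}

Either of the directions mentioned in the comment (reordering shutdown, or switching to futures) would similarly guarantee that nothing can run against the ARM after it has been freed.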
| Comment by Ian Boros [ 27/Feb/18 ] |
|
We looked into fixing this as part of the sharded kill project. The fact that an event is double-signaled is a larger problem that's outside the scope of this ticket. I've filed a ticket about this particular problem here:

As far as fixing the problem with the ARM: after more discussion with david.storch, we've agreed that doing what he suggests in the comment above could be too costly to performance on point-finds. We could add partition-locking to the ClusterCursorManager to mitigate this, but it could still affect performance.

Another way to fix this problem would be to change the order of shutdown so that the TaskExecutor shuts down after all client threads have terminated. However, we only join on client threads when running under ASAN, and even then, we do so with a timeout. Changing the mongos to join with all client threads would probably be a significant amount of work.

Yet another alternative would be to use "Events" which are separate from the TaskExecutor and still work even when shutdown is happening. After talking with redbeard0531 in person, it seems like "Futures" will provide exactly this. Therefore I think that modifying the ARM to use Futures rather than TaskExecutor Events should fix the problems around the ARM and shutdown. Since this work of "futurizing" the ARM is already planned, I think we should just wait until we have futures to work on this. |
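As a rough illustration of the futures-based direction, the sketch below uses std::promise/std::future as a stand-in for MongoDB's own Future/Promise types; the killCompletePromise and cleanupThread names are invented for the example. The point is that the waiter blocks on a handle whose validity does not depend on the TaskExecutor still being able to create events:

{code:cpp}
// Sketch of the futures-based signaling approach, using std::promise and
// std::future as stand-ins for MongoDB's own Future/Promise types. The waiter
// blocks on a handle that remains valid regardless of executor shutdown.
#include <future>
#include <iostream>
#include <thread>

int main() {
    std::promise<void> killCompletePromise;
    std::future<void> killCompleteFuture = killCompletePromise.get_future();

    // Simulated "clean up remote cursors" work; whether it finishes normally
    // or bails out because the process is shutting down, it fulfills the
    // promise so the waiter is always woken.
    std::thread cleanupThread([&killCompletePromise] {
        killCompletePromise.set_value();
    });

    // The caller (analogous to RouterStageMerge::kill) waits on the future.
    // Unlike an executor event handle, this wait is always well defined.
    killCompleteFuture.wait();
    std::cout << "cleanup signaled\n";

    cleanupThread.join();
    return 0;
}
{code}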
| Comment by David Storch [ 19/Dec/16 ] |
|
After further investigation by the MongoDB Query Team, we have decided to mitigate the problem by fixing the initially reported crash under a related ticket. In order to fix the problem in a complete way, we would have to make the following changes:

1. Ensure that no new mongos cursors can be created or registered once ClusterCursorManager::shutdown() has begun.
2. Ensure that all outstanding mongos cursors, including pinned (in-use) cursors, are killed before shutdown proceeds.

We already shut down the mongos cursor manager before shutting down the TaskExecutor, and we already prevent registration of new cursors once the ClusterCursorManager begins to shut down. Therefore, in order to achieve #1, all we need to do is ensure that mongos cursors come into existence registered. This would prevent the creation of any new cursors once ClusterCursorManager::shutdown() begins. It is an easy code change to make, but could come with a performance cost. In fact, the implementation used to work this way, but was changed in this commit in order to improve performance.

One way to achieve #2 would be to mark all outstanding cursors as kill pending and wait for them to die before proceeding with shutdown. This is an unacceptable approach because it could delay shutdown for large amounts of time while the shards complete query execution work. Therefore, we need a way to preemptively kill an active mongos cursor by terminating any cursors which are in use on the shards, i.e. allowing pinned cursors to be killed.

The complexity of these changes makes it unlikely that we will deliver the complete fix in 3.2 or 3.4, which is why we have chosen to work on the smaller, mitigating fix under the related ticket mentioned above.

CC schwerin. |
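As a hypothetical sketch of change #1 (the MiniCursorManager class below is invented and is not the real ClusterCursorManager API), cursors could be registered at the moment they are created, and registration refused once shutdown has begun, so no cursor can come into existence unregistered:

{code:cpp}
// Hypothetical mini cursor manager (not the real ClusterCursorManager API)
// illustrating change #1: cursors are registered at creation time, and
// registration is refused once shutdown has begun.
#include <map>
#include <memory>
#include <mutex>
#include <optional>

struct Cursor {};

class MiniCursorManager {
public:
    // Returns the new cursor id, or std::nullopt if shutdown has already begun.
    std::optional<long long> registerNewCursor(std::unique_ptr<Cursor> cursor) {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_inShutdown) {
            return std::nullopt;  // caller must fail the request, not proceed
        }
        const long long id = _nextId++;
        _cursors.emplace(id, std::move(cursor));
        return id;
    }

    void shutdown() {
        std::lock_guard<std::mutex> lk(_mutex);
        _inShutdown = true;
        _cursors.clear();  // the real fix also needs a way to kill pinned cursors
    }

private:
    std::mutex _mutex;
    bool _inShutdown = false;
    long long _nextId = 1;
    std::map<long long, std::unique_ptr<Cursor>> _cursors;
};

int main() {
    MiniCursorManager manager;
    auto accepted = manager.registerNewCursor(std::make_unique<Cursor>());  // gets an id
    manager.shutdown();
    auto rejected = manager.registerNewCursor(std::make_unique<Cursor>());  // std::nullopt
    return (accepted.has_value() && !rejected.has_value()) ? 0 : 1;
}
{code}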
| Comment by David Storch [ 28/Oct/16 ] |
|
Our testing has revealed a similar race which can cause mongos to crash during shutdown with the following invariant failure:

This crash occurs due to the following sequence of events:
|
| Comment by David Storch [ 07/Oct/16 ] |
|
The cause behind the invariant failure described by schwerin is not yet clear to me, though I can see from code inspection how the originally reported invariant failure can occur.

The original invariant failure happens in RouterStageMerge::kill(), which asks the AsyncResultsMerger for an event and then waits on it: https://github.com/mongodb/mongo/blob/r3.4.0-rc0/src/mongo/s/query/router_stage_merge.cpp#L59-L60

If mongos shutdown is in progress, however, we will have returned an invalid Event: https://github.com/mongodb/mongo/blob/r3.4.0-rc0/src/mongo/s/query/async_results_merger.cpp#L635

We trip the invariant when attempting to wait on this invalid Event. The bug should only manifest when a mongos cursor gets killed (or kills itself) concurrently with mongos shutdown. The stack trace provided by Andy seems to be a separate bug, probably with a very similar root cause. |
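For illustration only, the defensive pattern implied by this analysis looks roughly like the following; the EventHandle and FakeExecutor types are simplified stand-ins, not the real TaskExecutor interface:

{code:cpp}
// Simplified stand-ins (not the real TaskExecutor interface) showing the
// defensive pattern: check the event handle's validity before waiting on it,
// since an executor that is shutting down may hand back an invalid handle.
#include <cassert>
#include <iostream>

struct EventHandle {
    bool valid = false;
    bool isValid() const { return valid; }
};

struct FakeExecutor {
    bool inShutdown = true;

    // Mirrors the problematic behavior: during shutdown, the caller receives
    // an invalid handle rather than an error status.
    EventHandle makeEvent() const {
        EventHandle handle;
        handle.valid = !inShutdown;
        return handle;
    }

    void waitForEvent(const EventHandle& e) const {
        assert(e.isValid());  // analogous to the invariant that fires in this bug
    }
};

void killStage(FakeExecutor& executor) {
    EventHandle event = executor.makeEvent();
    if (!event.isValid()) {
        // Shutdown is in progress; there is nothing left to wait for.
        std::cout << "skipping wait: executor is shutting down\n";
        return;
    }
    executor.waitForEvent(event);
}

int main() {
    FakeExecutor executor;
    killStage(executor);  // does not trip the assert, even during shutdown
    return 0;
}
{code}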
| Comment by Andy Schwerin [ 07/Oct/16 ] |
|
The stack trace from the repro instructions is different from the one in the original bug report. In the repro, the invalid event is returned from AsyncResultsMerger::nextEvent, and in the original report it comes from AsyncResultsMerger::kill. Maybe two similar bugs, maybe one root cause. Here's the stack trace from the repro, as of 3.4.0-rc0:
|
| Comment by Andy Schwerin [ 07/Oct/16 ] |
|
This appears to be a race between the AsyncResultsMerger's execution of nextEvent() and the shutdown of the task executor, which causes nextEvent() to return Status::OK() along with an invalid event. The RouterStageMerge stage then calls wait on that invalid event, violating the invariant. Assigning to the query team for analysis and triage. I'm pretty sure this only happens during mongos shutdown. |
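A producer-side variant of the same idea is sketched below: a nextEvent()-style function propagates a shutdown error instead of returning Status::OK() together with an invalid event, so callers never see an OK status paired with an unusable handle. The types are simplified stand-ins, not the real TaskExecutor or AsyncResultsMerger interfaces:

{code:cpp}
// Simplified stand-ins (not the real TaskExecutor/AsyncResultsMerger
// interfaces) sketching the producer-side idea: a nextEvent()-style function
// returns an error during executor shutdown rather than an OK status paired
// with an invalid event handle.
#include <iostream>
#include <string>
#include <variant>

struct EventHandle {
    bool valid = false;
};

struct Error {
    std::string reason;
};

// Either a usable event handle or an error describing why none was produced.
using StatusWithEvent = std::variant<EventHandle, Error>;

StatusWithEvent nextEvent(bool executorInShutdown) {
    if (executorInShutdown) {
        // Surface the shutdown instead of handing back an invalid handle.
        return Error{"ShutdownInProgress: task executor is shutting down"};
    }
    EventHandle handle;
    handle.valid = true;
    return handle;
}

int main() {
    StatusWithEvent result = nextEvent(/*executorInShutdown=*/true);
    if (auto* err = std::get_if<Error>(&result)) {
        // The caller unwinds cleanly instead of waiting on an invalid event.
        std::cout << "nextEvent failed: " << err->reason << "\n";
        return 0;
    }
    // ... otherwise, wait on the returned event ...
    return 0;
}
{code}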