[SERVER-29098] ShardingTaskExecutor::scheduleRemoteCommand should not run passed callback before processing metadata Created: 05/May/17 Updated: 30/Oct/23 Resolved: 24/May/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 3.5.8 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: |
Sharding
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
Also add comment in task_executor.h that implementations for scheduleWork/scheduleWorkAt/scheduleRemoteCommand should ensure that the passed callback is the last thing that gets called (no post processing directly related to the callback request).
|
| Comments |
| Comment by Githook User [ 24/May/17 ] |
|
Author: {u'username': u'renctan', u'name': u'Randolph Tan', u'email': u'randolph@10gen.com'}Message: |
| Comment by Esha Maharishi (Inactive) [ 15/May/17 ] |
|
If we don't fix this at the ShardingTaskExecutor layer by moving the passed-in callback to after metadata updates, we'll have to make sure every use of ShardingTaskExecutor::scheduleRemoteCommand waits for the "wrapper" callback to run. I think this is more difficult to do and maintain, since it's different than the way we use the TaskExecutor interface, but it's often not immediately obvious at a call site whether TaskExecutor or ShardingTaskExecutor is being used, even in sharding code. |
| Comment by Esha Maharishi (Inactive) [ 15/May/17 ] |
|
Hm, another way of looking at this: ShardingTaskExecutor's scheduleRemoteCommand method is a thin wrapper around TaskExecutor::scheduleRemoteCommand. The wrapper's job is to perform metadata updates related to the operation (e.g., update clusterTime, update getLastError stats). However, the wrapper calls the passed-in callback before doing these metadata updates. Most callers assume that the callback they pass in to scheduleRemoteCommand is the last thing that the TaskExecutor will run. Thus, the caller may return to the client before the metadata updates, which will be still executing in parallel inside the TaskExecutor, have run! This is problematic, because the metadata updates rely on there being an OperationContext, but if the response has been returned to the client, the associated OperationContext is gone or invalid! So, I think a good fix would be for ShardingTaskExecutor::scheduleRemoteCommand to call the passed-in callback after doing the metadata updates. This ensures that metadata updates will not "lose" the OperationContext, and we still guarantee to callers that once their callback has completed, the TaskExecutor is not executing any more work, including metadata updates, for their operation. |