[SERVER-28318] make sharded getLastError work with paths that go over ASIO, or eliminate it Created: 15/Mar/17 Updated: 06/Dec/17 Resolved: 06/Apr/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.5.4 |
| Fix Version/s: | 3.5.6 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Esha Maharishi (Inactive) | Assignee: | Esha Maharishi (Inactive) |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Sprint: | Sharding 2017-04-17 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
As we replace uses of DBClient with tasks scheduled over the TaskExecutor/ASIO, any paths on mongos that used DBClient to save the lastError info through the ShardingEgressMetadataHook will no longer function. This is because the hook checks haveClient(), and the ASIO threads do not have Clients. (The original thread that scheduled tasks on behalf of a request to mongos had the Client). It is not easy to pass the Client for the original thread to the hook, since the Client is obtained from the OperationContext, and it is unsafe to use the OperationContext from a multi-threaded context. This may possibly be addressed by work done in the near future on using the OperationContext from tasks on the TaskExecutor by schwerin and redbeard0531. |
| Comments |
| Comment by Githook User [ 06/Apr/17 ] | ||||||||||||||||||||||||||||||||||||
|
Author: {u'username': u'EshaMaharishi', u'name': u'Esha Maharishi', u'email': u'esha.maharishi@mongodb.com'}Message: | ||||||||||||||||||||||||||||||||||||
| Comment by Andrew Morrow (Inactive) [ 20/Mar/17 ] | ||||||||||||||||||||||||||||||||||||
|
I'd think that would also let us drop support for pseudo-commands in the server: https://jira.mongodb.org/browse/SERVER-17981 | ||||||||||||||||||||||||||||||||||||
| Comment by A. Jesse Jiryu Davis [ 20/Mar/17 ] | ||||||||||||||||||||||||||||||||||||
|
Removing getLastError would require bumping the server's reported minimum wire version for the first time. It would also let us drop OP_INSERT, OP_UPDATE, OP_DELETE. The drivers team proposes you leave GLE in the server through version 3.6, deprecated, then drop it and bump the min wire version in 3.8. The Server Discovery and Monitoring Spec has had, for several years, rules for drivers checking that they support the server's minimum wire version. So far that version has always been 0, but we're future-proof. | ||||||||||||||||||||||||||||||||||||
| Comment by Esha Maharishi (Inactive) [ 15/Mar/17 ] | ||||||||||||||||||||||||||||||||||||
|
In the case we can't eliminate getLastError, we agreed to do it this way, with the understanding that a mutex will have to be added around accessing the ClusterLastErrorInfo, since this callback can still be run from multiple threads. | ||||||||||||||||||||||||||||||||||||
| Comment by Esha Maharishi (Inactive) [ 15/Mar/17 ] | ||||||||||||||||||||||||||||||||||||
|
kaloian.manassiev Hmm. So:
I don't totally get why it's safe to use the OperationContext here (isn't this still running in an ASIO thread?), but I imagine we would make the change like this: before
after
| ||||||||||||||||||||||||||||||||||||
| Comment by Kaloian Manassiev [ 15/Mar/17 ] | ||||||||||||||||||||||||||||||||||||
|
I don't think we can get rid of getLastError as long as we support fire-and-forget writes. When OP_MSG becomes available, we might be able to deprecate it. That being said - this is exactly the same problem as aggregating the latest clusterTime, which misha.tyulenev is solving in | ||||||||||||||||||||||||||||||||||||
| Comment by Esha Maharishi (Inactive) [ 15/Mar/17 ] | ||||||||||||||||||||||||||||||||||||
|
It should be noted that the "multi-threaded"ness of the hooks is not just sharing the OperationContext/Client between the thread running the hook and the original thread, but possibly multiple threads running the hook. This is because the original thread may have scheduled multiple requests to shards in parallel to satisfy the client's request (e.g., the write path). |