[SERVER-28318] make sharded getLastError work with paths that go over ASIO, or eliminate it Created: 15/Mar/17  Updated: 06/Dec/17  Resolved: 06/Apr/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.5.4
Fix Version/s: 3.5.6

Type: Task Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Esha Maharishi (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-28591 rename ShardingTaskExecutor to ShardT... Closed
depends on DRIVERS-360 Could the server remove support for g... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2017-04-17
Participants:

 Description   

As we replace uses of DBClient with tasks scheduled over the TaskExecutor/ASIO, any paths on mongos that used DBClient to save the lastError info through the ShardingEgressMetadataHook will no longer function.

This is because the hook checks haveClient(), and the ASIO threads do not have Clients. (The original thread that scheduled tasks on behalf of a request to mongos had the Client).

It is not easy to pass the Client for the original thread to the hook, since the Client is obtained from the OperationContext, and it is unsafe to use the OperationContext from a multi-threaded context.

This may be addressed by upcoming work by schwerin and redbeard0531 on using the OperationContext from tasks scheduled on the TaskExecutor.
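
For illustration, a minimal sketch of the guard described above (simplified; the exact shape of the real hook may differ):

void ShardingEgressMetadataHookForMongos::_saveGLEStats(const BSONObj& metadata,
                                                        StringData hostString) {
    if (!haveClient()) {
        // ASIO worker threads have no Client attached, so there is nothing to
        // hang the ClusterLastErrorInfo off of; the lastError info is dropped.
        return;
    }
    // ... otherwise parse the metadata and hostString and update the
    // ClusterLastErrorInfo, as in the snippets in the comments below.
}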



 Comments   
Comment by Githook User [ 06/Apr/17 ]

Author: Esha Maharishi (EshaMaharishi) <esha.maharishi@mongodb.com>

Message: SERVER-28318 make sharded getLastError work with paths that go over ASIO
Branch: master
https://github.com/mongodb/mongo/commit/366b39a143ec17a65e8ccc08f1883e4c92dd669e

Comment by Andrew Morrow (Inactive) [ 20/Mar/17 ]

I'd think that would also let us drop support for pseudo-commands in the server: https://jira.mongodb.org/browse/SERVER-17981

Comment by A. Jesse Jiryu Davis [ 20/Mar/17 ]

Removing getLastError would require bumping the server's reported minimum wire version for the first time. It would also let us drop OP_INSERT, OP_UPDATE, and OP_DELETE. The drivers team proposes leaving GLE in the server through version 3.6, deprecated, then dropping it and bumping the min wire version in 3.8.

The Server Discovery and Monitoring Spec has had, for several years, rules for drivers checking that they support the server's minimum wire version. So far that version has always been 0, but we're future-proof.
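
The rule itself is simple; roughly (a sketch, not any particular driver's code, and the struct below is made up for illustration):

// A driver and a server are compatible if their supported wire-version ranges overlap.
// Bumping the server's minWireVersion above a driver's maxWireVersion is what would
// break that driver.
struct WireVersionRange {
    int minWireVersion;
    int maxWireVersion;
};

bool isCompatible(const WireVersionRange& driver, const WireVersionRange& server) {
    return server.minWireVersion <= driver.maxWireVersion &&
           server.maxWireVersion >= driver.minWireVersion;
}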

Comment by Esha Maharishi (Inactive) [ 15/Mar/17 ]

If we can't eliminate getLastError, we agreed to do it this way, with the understanding that a mutex will have to be added around access to the ClusterLastErrorInfo, since this callback can still run on multiple threads.
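
Roughly something like this (a sketch only; the _mutex and _hostOpTimes members are assumed here, not the final implementation):

void ClusterLastErrorInfo::addHostOpTime(ConnectionString connStr, HostOpTime stat) {
    // Hypothetical mutex guarding the map, since this can now be called
    // concurrently from multiple TaskExecutor callback threads.
    stdx::lock_guard<stdx::mutex> lk(_mutex);
    _hostOpTimes[std::move(connStr)] = std::move(stat);
}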

Comment by Esha Maharishi (Inactive) [ 15/Mar/17 ]

kaloian.manassiev Hmm. So:

  • instead of calling cc(), the Client will be obtained from the captured OperationContext,
  • the metadata will be obtained from the RemoteCommandResponse, and
  • the hostString will be obtained from the RemoteCommandRequest?

I don't totally get why it's safe to use the OperationContext here (isn't this still running in an ASIO thread?), but I imagine we would make the change like this:

before

void ShardingEgressMetadataHookForMongos::_saveGLEStats(const BSONObj& metadata,
                                                        StringData hostString) {
    // parse the metadata
    auto swShardingMetadata = rpc::ShardingMetadata::readFromMetadata(metadata);
    ...
    auto shardingMetadata = std::move(swShardingMetadata.getValue());
 
    // parse the hostString
    auto shardConn = ConnectionString::parse(hostString.toString());
    ...
 
    // update the lastError info with the metadata and hostString
    auto& clientInfo = cc();
    ClusterLastErrorInfo::get(clientInfo)
        .addHostOpTime(
            shardConn.getValue(),
            HostOpTime(shardingMetadata.getLastOpTime(), shardingMetadata.getLastElectionId()));
}

after

StatusWith<TaskExecutor::CallbackHandle> ShardingTaskExecutor::scheduleRemoteCommand(
    const RemoteCommandRequest& request, const RemoteCommandCallbackFn& cb) {
    auto shardingCb = [opCtx = request.opCtx, cb](const executor::TaskExecutor::RemoteCommandCallbackArgs& args) {
        // parse the metadata
        auto swShardingMetadata = rpc::ShardingMetadata::readFromMetadata(args.response.metadata);
        ...
        auto shardingMetadata = std::move(swShardingMetadata.getValue());

        // parse the hostString
        auto shardConn = ConnectionString::parse(args.request.target.toString());
        ...

        // update the lastError info with the metadata and hostString
        auto clientInfo = opCtx->getClient();
        ClusterLastErrorInfo::get(clientInfo)
            .addHostOpTime(
                shardConn.getValue(),
                HostOpTime(shardingMetadata.getLastOpTime(), shardingMetadata.getLastElectionId()));

        // forward the response to the original callback
        cb(args);
    };
    return _executor->scheduleRemoteCommand(request, shardingCb);
}

Comment by Kaloian Manassiev [ 15/Mar/17 ]

I don't think we can get rid of getLastError as long as we support fire-and-forget writes. When OP_MSG becomes available, we might be able to deprecate it.

That being said - this is exactly the same problem as aggregating the latest clusterTime, which misha.tyulenev is solving in SERVER-28107. Can we use the same solution and maybe even hook at the same place in the TaskExecutor?

Comment by Esha Maharishi (Inactive) [ 15/Mar/17 ]

It should be noted that the multi-threaded nature of the hooks is not just a matter of sharing the OperationContext/Client between the thread running the hook and the original thread; multiple threads may be running the hook at the same time.

This is because the original thread may have scheduled multiple requests to shards in parallel to satisfy the client's request (e.g., the write path).
