While the operationTime is returned by the mongod, it is lost in translation in mongos. Several passthrough sharding tests fail because of this. So far the failures involve writeOps, but other command results may be affected as well.
The test suites that are excluding files are:
sharding_jscore_passthrough
sharded_collections_jscore_passthrough
For 3.6, mongos will be executing Strategy::commandOp and Strategy::writeOp with the new causal clients, hence the responses need to include the latest operationTime received from the mongods.
The commands in mongos will be processed via ASIO, so a random thread will be handling the response. Hence we need to augment the txn with a decoration that keeps the current operation's time.
Hence the task can be split into 4 parts:
1. Augment the txn with decoration that keeps the current operation's time
    class OperationTimeTracker {
    public:
        // Decorate OperationContext with OperationTimeTracker instance.
        static OperationTimeTracker* get(OperationContext* opCtx);

        LogicalTime getMaxOperationTime() const;

        // updates maxOperationTime with the max(new, current)
        void updateOperationTime(LogicalTime);

    private:
        // protects _maxOperationTime; mutable so the const getter can lock it
        mutable stdx::mutex _mutex;
        LogicalTime _maxOperationTime;
    };
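A minimal, self-contained sketch of the max-tracking behaviour described in part 1. It uses a plain 64-bit counter as a stand-in for LogicalTime and leaves out the OperationContext decoration. Both are simplifications for illustration, and the `...Sketch` names are hypothetical, not the server's types.

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>

// Stand-in for LogicalTime: a plain 64-bit counter (assumption for this sketch).
using LogicalTimeSketch = std::uint64_t;

class OperationTimeTrackerSketch {
public:
    LogicalTimeSketch getMaxOperationTime() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _maxOperationTime;
    }

    // Keeps the larger of the stored time and the new time, so responses
    // arriving out of order on different threads never move the value back.
    void updateOperationTime(LogicalTimeSketch newTime) {
        std::lock_guard<std::mutex> lk(_mutex);
        _maxOperationTime = std::max(_maxOperationTime, newTime);
    }

private:
    mutable std::mutex _mutex;  // protects _maxOperationTime
    LogicalTimeSketch _maxOperationTime = 0;
};
```

Because the update is a max under a lock, the order in which shard responses are handled does not change the final value.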
2. Accepted design:
Subclass TaskExecutor as ShardingTaskExecutor. Forward all commands to the _executor member, and override scheduleRemoteCommand as:
    ShardingTaskExecutor(unique_ptr<ThreadPoolTaskExecutor>)

    StatusWith<TaskExecutor::CallbackHandle> ShardingTaskExecutor::scheduleRemoteCommand(
        const RemoteCommandRequest& request, const RemoteCommandCallbackFn& cb) {
        auto shardingCb = [opCtx = request.opCtx, cb](
                              const executor::TaskExecutor::RemoteCommandCallbackArgs& args) {
            // run the caller's callback first
            cb(args);
            if (!args.response.isOK()) {
                return;  // nothing to extract from a failed response
            }
            // extract operationTime from args.response
            // update the operationTime in the decoration
            OperationTimeTracker::get(opCtx)->updateOperationTime(operationTime);
        };
        return _executor->scheduleRemoteCommand(request, shardingCb);
    }

    private:
        unique_ptr<ThreadPoolTaskExecutor> _executor;

Create an instance of ShardingTaskExecutor in the makeTaskExecutor method in sharding_initialization.cpp.
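The wrapping pattern in the accepted design can be sketched in isolation as follows. Every type here is a hypothetical stand-in (the real TaskExecutor interface has many more methods to forward), and the inline executor exists only so the sketch can run without a thread pool.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-ins for the response, operation context and request.
struct ResponseSketch {
    bool ok = false;
    bool hasOperationTime = false;  // false if the shard sent no operationTime
    std::uint64_t operationTime = 0;
};
struct OpContextSketch {
    std::uint64_t maxOperationTime = 0;  // plays the role of the decoration
};
struct RequestSketch {
    OpContextSketch* opCtx = nullptr;
};
using CallbackSketch = std::function<void(const ResponseSketch&)>;

class ExecutorSketch {
public:
    virtual ~ExecutorSketch() = default;
    virtual void scheduleRemoteCommand(const RequestSketch& request, CallbackSketch cb) = 0;
};

// Invokes the callback immediately with a canned response.
class InlineExecutorSketch : public ExecutorSketch {
public:
    explicit InlineExecutorSketch(ResponseSketch canned) : _canned(canned) {}
    void scheduleRemoteCommand(const RequestSketch&, CallbackSketch cb) override {
        cb(_canned);
    }

private:
    ResponseSketch _canned;
};

// Forwards to the wrapped executor, but first wraps the callback so the
// operationTime of each successful response is recorded on the context.
class ShardingExecutorSketch : public ExecutorSketch {
public:
    explicit ShardingExecutorSketch(std::unique_ptr<ExecutorSketch> inner)
        : _executor(std::move(inner)) {}

    void scheduleRemoteCommand(const RequestSketch& request, CallbackSketch cb) override {
        auto shardingCb = [opCtx = request.opCtx, cb = std::move(cb)](
                              const ResponseSketch& response) {
            cb(response);
            if (!response.ok || !response.hasOperationTime || opCtx == nullptr) {
                return;
            }
            opCtx->maxOperationTime =
                std::max(opCtx->maxOperationTime, response.operationTime);
        };
        _executor->scheduleRemoteCommand(request, std::move(shardingCb));
    }

private:
    std::unique_ptr<ExecutorSketch> _executor;
};
```

The caller's callback is untouched. Only the executor knows that operationTime is being harvested, which is what keeps the change local to the sharding initialization code.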
Alternative design (rejected):
Refactor EgressMetadataHook to process command reply as well as metadata.
- Add method readCommandReply(OperationContext* txn, const BSONObj& commandReply) to the EgressHook API
- Implement readCommandReply in ShardingEgressMetadataHookForMongos; it should be an invariant(false) in all other hook implementations:
a) parse the operationTime out of the commandReply
b) get the tracker, i.e. OperationTimeTracker::get(txn)
c) call updateOperationTime(operationTime)
3. Access the tracker when the results from the shards arrive at mongos:
a) In AsyncCommand::response, extract the OperationContext from op->request(). Make sure it is not null, as it potentially can be.
b) Pass the OperationContext to decodeRPC, so that decodeRPC has both the commandReply and the txn. The txn must exist if metadata is attached. (Question to resolve: should it use an invariant to enforce this, return an error status, or log an error and skip processing the commandReply?)
c) Call the EgressMetadataHook::readCommandReply method.
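As a rough illustration of steps a) to c), the sketch below takes the "skip processing" option from the open question. That choice is made only to keep the example short and is not a recommendation. The reply is modelled as a string-to-integer map and `TxnSketch` is a hypothetical stand-in for the OperationContext with its decoration.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for OperationContext plus its tracker decoration.
struct TxnSketch {
    std::uint64_t maxOperationTime = 0;
};

// Hypothetical stand-in for a parsed command reply.
using CommandReplySketch = std::map<std::string, std::uint64_t>;

// Returns true if an operationTime was recorded, false if the reply was
// skipped because there is no txn or the reply carries no operationTime.
bool readCommandReplySketch(TxnSketch* txn, const CommandReplySketch& commandReply) {
    if (txn == nullptr) {
        return false;  // the request had no operation context attached
    }
    auto it = commandReply.find("operationTime");
    if (it == commandReply.end()) {
        return false;
    }
    txn->maxOperationTime = std::max(txn->maxOperationTime, it->second);
    return true;
}
```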
4. The commandOp results are finalized in mongo::execCommandClient, hence in that function the current value of operationTime stored in the decoration should be used in the reply to the client.

    auto commandOperationTime = OperationTimeTracker::get(txn)->getMaxOperationTime();
    Command::appendCommandStatus(result, commandOperationTime);
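To make the final step concrete, here is a small stand-alone sketch that folds the times seen from several shards into one maximum and writes it into the client reply. The map-based reply and the function names are hypothetical, and a real reply builder would carry typed BSON values rather than integers.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the reply being built for the client.
using ClientReplySketch = std::map<std::string, std::uint64_t>;

// Largest operationTime among the shard responses, or 0 if there were none.
std::uint64_t maxOperationTimeSketch(const std::vector<std::uint64_t>& shardTimes) {
    std::uint64_t maxTime = 0;
    for (std::uint64_t t : shardTimes) {
        maxTime = std::max(maxTime, t);
    }
    return maxTime;
}

// Appends the command status together with the tracked operationTime.
void appendCommandStatusSketch(ClientReplySketch& result, bool ok, std::uint64_t operationTime) {
    result["ok"] = ok ? 1 : 0;
    result["operationTime"] = operationTime;
}
```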