[SERVER-29700] findAndModify returns operationTime before the modification took place Created: 16/Jun/17  Updated: 08/Jan/24  Resolved: 19/Jul/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.5.9
Fix Version/s: 3.5.11

Type: Bug Priority: Major - P3
Reporter: Misha Tyulenev Assignee: Misha Tyulenev
Resolution: Fixed Votes: 0
Labels: PM-221
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-29455 Mongos egress operations over DBClien... Closed
Related
is related to SERVER-29629 Add causally consistent passthrough w... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2017-07-10, Sharding 2017-07-31
Participants:

 Description   

Filed to investigate why, and if its expected update the docs to warn users that follow up reads will not be causally consistent with the findAndModify unless it uses w:majority.
the issue logs:
https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_rhel_62_64_bit_causally_consistent_jscore_passthrough_auth_WT_c7aa40633696a49fea2e75750b30a8339117d072_17_06_16_16_23_00

After the root cause is determined here is the proposal to address the issue:

1. Add OperationContext argument to interface EgressMetadataHook::readReplyMetadata

virtual Status readReplyMetadata(OperationContext* opCtx, StringData replySource, const BSONObj& metadataObj) = 0;

2. In ShardingConnectionHook::onCreate add the opCtx for replyMetadataReader:

    if (_shardedConnections) {
        conn->setReplyMetadataReader([this](OperationContext* opCtx, const BSONObj& metadataObj, StringData target) {
            return _egressHook->readReplyMetadata(opCtx, target, metadataObj);
        });
    }

3. in DBClientWithCommands::runCommandWithTarget pass the operationContext to the _metadataReader call

Reading OperationTime
4. add OperationTimeHook class

Status OperationTimeHook::readReplyMetadata(OperationContext* opCtx, StringData replySource,
                                                  const BSONObj& metadataObj) {
        std::shared_ptr<OperationTimeTracker> timeTracker = OperationTimeTracker::get(opCtx);
 
       auto operationTime =metadataObj[kOperationTimeField];
        if (!operationTime.eoo()) {
            invariant(operationTime.type() == BSONType::bsonTimestamp);
            timeTracker->updateOperationTime(LogicalTime(operationTime.timestamp()));
        }
 
}

5. install OperationTimeHook to the hook list in runMongosServer

 unshardedHookList->addHook(stdx::make_unique<rpc::OperationTimeHook>());
 shardedHookList->addHook(stdx::make_unique<rpc::OperationTimeHook>());



 Comments   
Comment by Githook User [ 19/Jul/17 ]

Author:

{u'username': u'mikety', u'name': u'Misha Tyulenev', u'email': u'misha@mongodb.com'}

Message: SERVER-29700 add operationTime in all connections used by mongos
Branch: master
https://github.com/mongodb/mongo/commit/e034bdc4be7de0326d34e982febf00505cf0d993

Comment by Spencer Brody (Inactive) [ 27/Jun/17 ]

Looks like the logic for tracking the operation time is baked in the ShardingTaskExecutor. That same logic likely needs to be duplicated or shared with the sharding_connection_hook and sharding_network_connection_hook.

Comment by Spencer Brody (Inactive) [ 27/Jun/17 ]

Spend some time looking at this on Friday, and I believe the last op on the client is being set correctly. Spoke with Misha this morning and we think the issue is that mongos only tracks operation times for requests sent through the sharding task executor, while findAndModify uses a ShardConnection to talk to the shard. We will need to update mongos to track operation times sent/received from any of the various connection pools.

Comment by Misha Tyulenev [ 19/Jun/17 ]

The operationTime returned in this case is computed by computeOperationTime https://github.com/mongodb/mongo/blob/master/src/mongo/db/service_entry_point_mongod.cpp#L345-L371, which calls getClientOperatinTime https://github.com/mongodb/mongo/blob/master/src/mongo/db/service_entry_point_mongod.cpp#L345-L371 which returns

       operationTime = LogicalTime(
            repl::ReplClientInfo::forClient(opCtx->getClient()).getLastOp().getTimestamp());

findAndModify can not have readConcern so its implicitly created with level=local. (https://github.com/mongodb/mongo/blob/r3.5.8/src/mongo/db/repl/read_concern_args.cpp#L84-L86)

So in this case runCommandImpl (https://github.com/mongodb/mongo/blob/master/src/mongo/db/service_entry_point_mongod.cpp#L373) will use either startOperationTime that is initialized to getClientOperationTime before the command started or with getClientOperationTime after the command finished whatever is greater.
if the value returned by getClientOperationTime did not change it will use

LogicalTime(replCoord->getMyLastAppliedOpTime().getTimestamp());

So apparently either the getClient()).getLastOp() is not properly updated when findAndModify is run or the getMyLastAppliedOpTime() is not properly computed the latter can be related to SERVER-29693
spencer wdyt?

Comment by Max Hirschhorn [ 19/Jun/17 ]

I applied the following patch to add some tracing for the operationTime and logicalTime being returned by the sharded cluster. The operationTime of Timestamp(1497894743, 3) for the "findAndModify" command response is from before the operationTime of Timestamp(1497894743, 5) for the "insert" command that the server responded with immediately prior.

diff --git a/src/mongo/shell/mongo.js b/src/mongo/shell/mongo.js
index 471a0e3..115b080 100644
--- a/src/mongo/shell/mongo.js
+++ b/src/mongo/shell/mongo.js
@@ -21,6 +21,7 @@ if (!Mongo.prototype) {
         const origNext = res.next;
         res.next = function next() {
             const ret = origNext.call(this);
+            print("!!! In next(): " + tojson(ret));
             self._setLogicalTimeFromReply(ret);
             return ret;
         };
@@ -202,6 +203,7 @@ Mongo.prototype._setLogicalTimeFromReply = function(res) {
             cmdObj = this._gossipLogicalTime(cmdObj);
         }
         const res = original.call(this, dbName, cmdObj, options);
+        print("!!! Response from " + tojson(cmdObj) + ": " + tojson(res));
         this._setLogicalTimeFromReply(res);
         return res;
     };

$ python buildscripts/resmoke.py --executor=causally_consistent_jscore_passthrough_auth jstests/core/find_and_modify_server6993.js --repeat=10
...
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.350-0400 Starting JSTest jstests/core/find_and_modify_server6993.js under executor causally_consistent_jscore_passthrough_auth...
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.362-0400 JSTest jstests/core/find_and_modify_server6993.js started with pid 663.
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.372-0400 MongoDB shell version v3.5.8-183-gb13164e
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.419-0400 connecting to: mongodb://127.0.0.1:20005/
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.425-0400 MongoDB server version: 3.5.8-183-gb13164e
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.456-0400 Authenticating as user __system with mechanism SCRAM-SHA-1 on connection: connection to 127.0.0.1:20005
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.504-0400 true
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.515-0400 !!! Response from {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.516-0400 	"drop" : "find_and_modify_server6993",
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.516-0400 	"$logicalTime" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.516-0400 		"clusterTime" : Timestamp(1497894742, 12),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.516-0400 		"signature" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.517-0400 			"hash" : BinData(0,"DlgOVPgh/EsGtmYpuyAjJXqbOL8="),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.517-0400 			"keyId" : NumberLong("6433408861020880916")
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.517-0400 		}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.518-0400 	}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.518-0400 }: {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.518-0400 	"ok" : 1,
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.518-0400 	"$logicalTime" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.519-0400 		"clusterTime" : Timestamp(1497894743, 3),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.519-0400 		"signature" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.519-0400 			"hash" : BinData(0,"/4iUJL9Nzw5hdik7nhEb/+aDhNI="),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.519-0400 			"keyId" : NumberLong("6433408861020880916")
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.520-0400 		}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.520-0400 	},
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.520-0400 	"operationTime" : Timestamp(1497894743, 3)
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.520-0400 }
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.527-0400 !!! In next(): {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.527-0400 	"ok" : 1,
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.528-0400 	"n" : 1,
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.528-0400 	"$logicalTime" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.528-0400 		"clusterTime" : Timestamp(1497894743, 5),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.528-0400 		"signature" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.528-0400 			"hash" : BinData(0,"/4iUJL9Nzw5hdik7nhEb/+aDhNI="),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.529-0400 			"keyId" : NumberLong("6433408861020880916")
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.529-0400 		}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.529-0400 	},
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.529-0400 	"operationTime" : Timestamp(1497894743, 5)
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.530-0400 }
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.530-0400 !!! Response from {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.530-0400 	"findandmodify" : "find_and_modify_server6993",
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.531-0400 	"query" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.531-0400 		"a" : 1
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.531-0400 	},
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.531-0400 	"update" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.531-0400 		"$set" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.532-0400 			"a.$" : 5
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.532-0400 		}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.532-0400 	},
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.532-0400 	"$logicalTime" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.532-0400 		"clusterTime" : Timestamp(1497894743, 5),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.533-0400 		"signature" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.533-0400 			"hash" : BinData(0,"/4iUJL9Nzw5hdik7nhEb/+aDhNI="),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.533-0400 			"keyId" : NumberLong("6433408861020880916")
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.533-0400 		}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.533-0400 	}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.534-0400 }: {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.534-0400 	"lastErrorObject" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.534-0400 		"updatedExisting" : true,
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.534-0400 		"n" : 1
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.534-0400 	},
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.534-0400 	"value" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.535-0400 		"_id" : ObjectId("59480f5702a091c2eb009da9"),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.535-0400 		"a" : [
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.535-0400 			1,
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.535-0400 			2
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.535-0400 		]
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.536-0400 	},
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.536-0400 	"ok" : 1,
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.536-0400 	"$logicalTime" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.536-0400 		"clusterTime" : Timestamp(1497894743, 6),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.536-0400 		"signature" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.536-0400 			"hash" : BinData(0,"/4iUJL9Nzw5hdik7nhEb/+aDhNI="),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.537-0400 			"keyId" : NumberLong("6433408861020880916")
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.537-0400 		}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.537-0400 	},
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.537-0400 	"operationTime" : Timestamp(1497894743, 3)
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.537-0400 }
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.539-0400 !!! Response from {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.539-0400 	"find" : "find_and_modify_server6993",
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.540-0400 	"filter" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.540-0400
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.540-0400 	},
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.541-0400 	"limit" : 1,
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.541-0400 	"singleBatch" : true,
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.541-0400 	"readConcern" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.542-0400 		"afterClusterTime" : Timestamp(1497894743, 5),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.542-0400 		"level" : "majority"
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.542-0400 	},
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.543-0400 	"$logicalTime" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.543-0400 		"clusterTime" : Timestamp(1497894743, 6),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.543-0400 		"signature" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.544-0400 			"hash" : BinData(0,"/4iUJL9Nzw5hdik7nhEb/+aDhNI="),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.544-0400 			"keyId" : NumberLong("6433408861020880916")
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.544-0400 		}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.544-0400 	}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.545-0400 }: {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.545-0400 	"cursor" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.545-0400 		"firstBatch" : [
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.546-0400 			{
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.546-0400 				"_id" : ObjectId("59480f5702a091c2eb009da9"),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.546-0400 				"a" : [
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.546-0400 					1,
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.547-0400 					2
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.547-0400 				]
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.547-0400 			}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.548-0400 		],
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.548-0400 		"id" : NumberLong(0),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.548-0400 		"ns" : "test.find_and_modify_server6993"
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.548-0400 	},
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.549-0400 	"ok" : 1,
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.549-0400 	"$logicalTime" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.549-0400 		"clusterTime" : Timestamp(1497894743, 6),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.550-0400 		"signature" : {
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.550-0400 			"hash" : BinData(0,"/4iUJL9Nzw5hdik7nhEb/+aDhNI="),
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.550-0400 			"keyId" : NumberLong("6433408861020880916")
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.551-0400 		}
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.551-0400 	},
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.551-0400 	"operationTime" : Timestamp(1497894743, 5)
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.551-0400 }
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.551-0400 assert: [5] != [1] are not equal : undefined
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.552-0400 doassert@src/mongo/shell/assert.js:18:14
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.552-0400 assert.eq@src/mongo/shell/assert.js:54:5
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.552-0400 @jstests/core/find_and_modify_server6993.js:13:1
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.552-0400
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.553-0400 2017-06-19T13:52:23.540-0400 E QUERY    [thread1] Error: [5] != [1] are not equal : undefined :
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.553-0400 doassert@src/mongo/shell/assert.js:18:14
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.554-0400 assert.eq@src/mongo/shell/assert.js:54:5
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.554-0400 @jstests/core/find_and_modify_server6993.js:13:1
[js_test:find_and_modify_server6993] 2017-06-19T13:52:23.554-0400 failed to load: jstests/core/find_and_modify_server6993.js

Comment by Andy Schwerin [ 17/Jun/17 ]

Can you provide more detail on the behavior you're seeing?

Generated at Thu Feb 08 04:21:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.