[SERVER-38938] Include useful metadata whenever we log a command response in the shell Created: 10/Jan/19  Updated: 29/Oct/23  Resolved: 19/Jan/21

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: 4.3 Desired, 4.9.0

Type: Improvement Priority: Major - P3
Reporter: Judah Schvimer Assignee: Robert Guo (Inactive)
Resolution: Fixed Votes: 1
Labels: quick-win, tig-assertjs, tig-qwin-eligible
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Sprint: STM 2019-10-03, STM 2020-01-23, STM 2020-11-02, STM 2020-11-30, STM 2020-12-14, STM 2021-01-11, STM 2021-01-25
Participants:
Story Points: 2

 Description   

It would be very helpful if the command request, the node the request was made to, and the connection it was made on were logged whenever we log a command response. This is ideally both when a command failure causes a test to fail, but also when that response is caught, logged, and retried.



 Comments   
Comment by Githook User [ 19/Jan/21 ]

Author:

{'name': 'Robert Guo', 'email': 'robert.guo@mongodb.com'}

Message: SERVER-38938 include metadata in command response
Branch: master
https://github.com/mongodb/mongo/commit/f3b3164a196fa930b17ac2f703db433d8fc79f36

Comment by Max Hirschhorn [ 16/Oct/20 ]

robert.guo, judah.schvimer, one approach I had imagined for this ticket would be to stash the original command request in a hidden property on the command response object similar to what we do with the "_mongo" connection object. The assertion functions would then inspect the command response object for this property and log it.

diff --git a/src/mongo/scripting/mozjs/internedstring.defs b/src/mongo/scripting/mozjs/internedstring.defs
index fe3d5e6aa4..2fc2a8e5ef 100644
--- a/src/mongo/scripting/mozjs/internedstring.defs
+++ b/src/mongo/scripting/mozjs/internedstring.defs
@@ -12,6 +12,7 @@ MONGO_MOZJS_INTERNED_STRING(_bson, "_bson")
 MONGO_MOZJS_INTERNED_STRING(code, "code")
 MONGO_MOZJS_INTERNED_STRING(_collection, "_collection")
 MONGO_MOZJS_INTERNED_STRING(columnNumber, "columnNumber")
+MONGO_MOZJS_INTERNED_STRING(_commandObj, "_commandObj")
 MONGO_MOZJS_INTERNED_STRING(constructor, "constructor")
 MONGO_MOZJS_INTERNED_STRING(_cursor, "_cursor")
 MONGO_MOZJS_INTERNED_STRING(database, "database")
diff --git a/src/mongo/scripting/mozjs/mongo.cpp b/src/mongo/scripting/mozjs/mongo.cpp
index 6849e63639..792f190140 100644
--- a/src/mongo/scripting/mozjs/mongo.cpp
+++ b/src/mongo/scripting/mozjs/mongo.cpp
@@ -284,6 +284,12 @@ void MongoBase::Functions::runCommand::call(JSContext* cx, JS::CallArgs args) {
     // Also, we make a copy here because we want a copy after we dump cmdRes
     ValueReader(cx, args.rval()).fromBSON(cmdRes.getOwned(), nullptr, false /* read only */);
     setHiddenMongo(cx, std::get<1>(resTuple), conn.get(), args);
+
+    ObjectWrapper o(cx, args.rval());
+    if (!o.hasField(InternedString::_commandObj)) {
+        o.defineProperty(
+            InternedString::_commandObj, args.get(1), JSPROP_READONLY | JSPROP_PERMANENT);
+    }
 }
 
 void MongoBase::Functions::runCommandWithMetadata::call(JSContext* cx, JS::CallArgs args) {
diff --git a/src/mongo/shell/assert.js b/src/mongo/shell/assert.js
index 54f5403b53..e25e3ecf43 100644
--- a/src/mongo/shell/assert.js
+++ b/src/mongo/shell/assert.js
@@ -688,7 +688,11 @@ assert = (function() {
 
         // Keep this as a function so we don't call tojson if not necessary.
         const makeFailMsg = () => {
-            return _buildAssertionMessage(msg, "command failed: " + tojson(res));
+            let prefix = "command failed: " + tojson(res);
+            if (typeof res._commandObj === "object" && res._commandObj !== null) {
+                prefix += " with original command request: " + tojson(res._commandObj);
+            }
+            return _buildAssertionMessage(msg, prefix);
         };
 
         if (_isWriteResultType(res)) {

Comment by Judah Schvimer [ 16/Oct/20 ]

I worry about logging every command in a test. Will that explode the size of the logs, especially for commands that insert large documents, or tests that do thousands of inserts/updates in a loop? Part of the desire of this ticket also is non-FSM tests that have changing topologies, and so the exact mongod, mongos, and/or session that is being connected to for any given command may not be clear.

Comment by Robert Guo (Inactive) [ 16/Oct/20 ]

I think there's more complexity involved here than I originally imagined, in particular, there's a many-to-many relationship between commands and shell assertions; this makes it non-trivial to build a generic tool, without modifying the assertion call itself, to track which command should be printed. FSM workloads add in another wrinkle where we don't trigger all assertions in all workloads, since any test can run under different FSM modes.

For this ticket, I'm inclined to just modify the assertions for the FSM tests to explicitly capture the metadata, of which there are 78.

For arbitrary assertions, I think it might be easier to add a logging mode to the Server to log all commands. It should be easier than modifying all assert.* calls in all JS tests. And since most JS tests don't have any parallelism, it should be easier to correlate the response to with the request. I'd be happy to file a ticket for this feature.

Comment by Brooke Miller [ 22/Sep/20 ]

We discussed as a team in triaging that we need to do additional investigation on this before we can determine what all needs to be done here. We're putting this back into the Tracking Sprint to do the investigation. Once we have more insight into this, then we'll estimate the complexity as a team and schedule it into a sprint.

Comment by Robert Guo (Inactive) [ 09/Jul/20 ]

judah.schvimer It hasn't been on the radar lately. We're going to try to prioritize it soon after the release.

Comment by Judah Schvimer [ 01/Jul/20 ]

What's the timeline on this? I would have benefited from this again significantly when a patch failure was reading from a secondary, when I was unaware. As we have more and more suites, it's hard to know what each suite is doing, even when reading the yaml file.

Comment by Ian Whalen (Inactive) [ 20/Feb/20 ]

Yup I can put this in the following sprint, but can't guarantee it won't still get bumped. Have to rely on priority decision from Robert on that.

Comment by Judah Schvimer [ 20/Feb/20 ]

ian.whalen, could this be next sprint? I'm worried about this getting lost in the backlog.

Comment by Ian Whalen (Inactive) [ 20/Feb/20 ]

Sorry, we thought there would be more time to do this in this sprint. Bumping back to backlog for now pending free time in the future.

Comment by Daniel Gottlieb (Inactive) [ 15/Nov/19 ]

I don't think so. The most recent patch failure I'm chasing is an FSM failure that includes the "tid" that had an error. In this case the logs do include:

[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:55.352+0000 [tid:7, conn:conn1150] setting random seed: 974053906

But that seems fragile and can be hard to find if one doesn't know to look for the exact string tid:7. My understanding of this ticket is that all the information needed to map an error to a node + connection would be presented alongside the (FSM) error report:

[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000 2019-11-14T22:16:57.553+0000 E  QUERY    [js] uncaught exception: Error: 1 thread with tids [7] threw
[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000 
[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000         Foreground jstests/concurrency/fsm_workloads/findAndModify_update_queue_unindexed.js
[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000         Error: write failed with error: {
[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000         	"nMatched" : 0,
[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000         	"nUpserted" : 0,
[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000         	"nModified" : 0,
[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000         	"writeError" : {
[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000         		"code" : 48,
[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000         		"errmsg" : "collection already exists test41_fsmdb0findAndModify_update_queue_unindexed.fsmcoll0"
[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000         	}
[fsm_workload_test:findAndModify_update_queue_unindexed] 2019-11-14T22:16:57.553+0000         }

Comment by Judah Schvimer [ 15/Nov/19 ]

daniel.gottlieb, is there any specific metadata you're interested in or that I missed in the description and above comment?

Comment by Daniel Gottlieb (Inactive) [ 15/Nov/19 ]

Is there something I can do to help this ticket get pushed through? It would be very beneficial, particularly in cases where there's difficulty reproducing failures locally.

Comment by Judah Schvimer [ 17/May/19 ]

Another piece of metadata that would be useful, which may be implicit in the command request, is the session the command was run on, and the transaction number if it was part of a transaction.

Generated at Thu Feb 08 04:50:31 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.