[SERVER-38938] Include useful metadata whenever we log a command response in the shell Created: 10/Jan/19 Updated: 29/Oct/23 Resolved: 19/Jan/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Testing Infrastructure |
| Affects Version/s: | None |
| Fix Version/s: | 4.3 Desired, 4.9.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Robert Guo (Inactive) |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | quick-win, tig-assertjs, tig-qwin-eligible | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Sprint: | STM 2019-10-03, STM 2020-01-23, STM 2020-11-02, STM 2020-11-30, STM 2020-12-14, STM 2021-01-11, STM 2021-01-25 |
| Participants: | |
| Story Points: | 2 |
| Description |
|
It would be very helpful if the command request, the node the request was made to, and the connection it was made on were logged whenever we log a command response. This is ideally both when a command failure causes a test to fail, but also when that response is caught, logged, and retried. |
| Comments |
| Comment by Githook User [ 19/Jan/21 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Robert Guo', 'email': 'robert.guo@mongodb.com'}Message: | ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Max Hirschhorn [ 16/Oct/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
robert.guo, judah.schvimer, one approach I had imagined for this ticket would be to stash the original command request in a hidden property on the command response object similar to what we do with the "_mongo" connection object. The assertion functions would then inspect the command response object for this property and log it.
| ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Judah Schvimer [ 16/Oct/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
I worry about logging every command in a test. Will that explode the size of the logs, especially for commands that insert large documents, or tests that do thousands of inserts/updates in a loop? Part of the desire of this ticket also is non-FSM tests that have changing topologies, and so the exact mongod, mongos, and/or session that is being connected to for any given command may not be clear. | ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Robert Guo (Inactive) [ 16/Oct/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
I think there's more complexity involved here than I originally imagined, in particular, there's a many-to-many relationship between commands and shell assertions; this makes it non-trivial to build a generic tool, without modifying the assertion call itself, to track which command should be printed. FSM workloads add in another wrinkle where we don't trigger all assertions in all workloads, since any test can run under different FSM modes. For this ticket, I'm inclined to just modify the assertions for the FSM tests to explicitly capture the metadata, of which there are 78. For arbitrary assertions, I think it might be easier to add a logging mode to the Server to log all commands. It should be easier than modifying all assert.* calls in all JS tests. And since most JS tests don't have any parallelism, it should be easier to correlate the response to with the request. I'd be happy to file a ticket for this feature. | ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Brooke Miller [ 22/Sep/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
We discussed as a team in triaging that we need to do additional investigation on this before we can determine what all needs to be done here. We're putting this back into the Tracking Sprint to do the investigation. Once we have more insight into this, then we'll estimate the complexity as a team and schedule it into a sprint. | ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Robert Guo (Inactive) [ 09/Jul/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
judah.schvimer It hasn't been on the radar lately. We're going to try to prioritize it soon after the release. | ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Judah Schvimer [ 01/Jul/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
What's the timeline on this? I would have benefited from this again significantly when a patch failure was reading from a secondary, when I was unaware. As we have more and more suites, it's hard to know what each suite is doing, even when reading the yaml file. | ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ian Whalen (Inactive) [ 20/Feb/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
Yup I can put this in the following sprint, but can't guarantee it won't still get bumped. Have to rely on priority decision from Robert on that. | ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Judah Schvimer [ 20/Feb/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
ian.whalen, could this be next sprint? I'm worried about this getting lost in the backlog. | ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ian Whalen (Inactive) [ 20/Feb/20 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
Sorry, we thought there would be more time to do this in this sprint. Bumping back to backlog for now pending free time in the future. | ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Gottlieb (Inactive) [ 15/Nov/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
I don't think so. The most recent patch failure I'm chasing is an FSM failure that includes the "tid" that had an error. In this case the logs do include:
But that seems fragile and can be hard to find if one doesn't know to look for the exact string tid:7. My understanding of this ticket is that all the information needed to map an error to a node + connection would be presented alongside the (FSM) error report:
| ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Judah Schvimer [ 15/Nov/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
daniel.gottlieb, is there any specific metadata you're interested in or that I missed in the description and above comment? | ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Daniel Gottlieb (Inactive) [ 15/Nov/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
Is there something I can do to help this ticket get pushed through? It would be very beneficial, particularly in cases where there's difficulty reproducing failures locally. | ||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Judah Schvimer [ 17/May/19 ] | ||||||||||||||||||||||||||||||||||||||||||||||
|
Another piece of metadata that would be useful, which may be implicit in the command request, is the session the command was run on, and the transaction number if it was part of a transaction. |