[SERVER-65201] De-emphasize errors which are ignored by the ReshardingTest fixture when another error has already occurred
Created: 01/Apr/22  Updated: 29/Oct/23  Resolved: 13/Jun/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.1.0-rc0

Type: Improvement
Priority: Major - P3
Reporter: Max Hirschhorn
Assignee: Max Hirschhorn
Resolution: Fixed
Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-73916 Improve ReshardingTest fixture error ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2022-06-27
Participants:
Linked BF Score: 166
Story Points: 2

 Description   

To avoid hanging or crashing the mongo shell process, the ReshardingTest fixture goes to some lengths to interrupt the reshardCollection command on mongos and to join the background thread in the mongo shell that was running the command. The error from the background thread is still logged in case there is a bug in the ReshardingTest fixture and the message happens to be useful. However, the logged error has led to confusion about what truly caused the test to fail.

We should find a way to elide the self-induced interruption error or separate it more from the assertion failure error message which immediately follows it.



 Comments   
Comment by Githook User [ 13/Jun/22 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-65201 Log compact form of ignored errors in ReshardingTest.

Moves the check for whether the resharding operation run by the
background thread as part of withReshardingInBackground() succeeded or
failed to the main thread. This enables the main thread to selectively
choose how to format the command response when the resharding operation
is intentionally interrupted due to a separate JavaScript error.
Branch: master
https://github.com/mongodb/mongo/commit/1a1408f0baef3356fcf65c38f735a0f13a770f65
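
The idea in this commit message can be sketched in plain JavaScript. This is not the code from the commit: `formatReshardFailure` is a hypothetical helper, `JSON.stringify` stands in for the mongo shell's formatting functions, and the response object is invented for illustration.

```javascript
// Hypothetical helper, run on the main thread after joining the background thread.
// `res` is the reshardCollection response handed back by the background thread;
// `mainThreadError` is the separate JavaScript error, if any, that led the fixture
// to interrupt the operation.
function formatReshardFailure(res, mainThreadError) {
    if (res.ok === 1) {
        return null;  // the operation succeeded, nothing to report
    }
    if (mainThreadError !== undefined) {
        // The failure was self-induced, so log it on a single line to keep the
        // real error and its stack trace prominent.
        return "Ignoring reshardCollection error: " + JSON.stringify(res);
    }
    // No other error occurred, so this failure is the interesting one.
    return "reshardCollection failed: " + JSON.stringify(res, null, 4);
}

const res = {ok: 0, code: 11601, codeName: "Interrupted"};  // invented response
const compact = formatReshardFailure(res, new Error("assertion failed in test body"));
const verbose = formatReshardFailure(res, undefined);
console.log(compact);
console.log(verbose);
```

Because the decision is made where both the command response and the other error are visible, no error code has to be special-cased in the background thread.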

Comment by Max Hirschhorn [ 04/Apr/22 ]

After some discussion within the team, we would rather not special-case the Interrupted error code. Having a "compact error reporting" mode for assert.commandWorked() was proposed by Brett and may be the preferred approach here.

Comment by Max Hirschhorn [ 01/Apr/22 ]

My initial thought was to use tojsononeline() instead of tojson() in the error messages being ignored, so that they take up less visual space and the JavaScript error and stack trace which immediately follow are understood as the true reason for the failure. However, tojson() is actually called by assert.commandWorked() in the background thread running the reshardCollection command, so it won't be possible to condense the output with that approach.
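
The difference in visual space can be illustrated outside the mongo shell. The sketch below uses plain JavaScript, with `JSON.stringify` as a stand-in for the shell's tojson() and tojsononeline() helpers; the response object is a made-up example rather than real server output.

```javascript
// Hypothetical command response, invented for illustration only.
const res = {
    ok: 0,
    errmsg: "operation was interrupted",
    code: 11601,
    codeName: "Interrupted",
};

// Stand-in for tojson(): indented, one field per line.
const multiLine = JSON.stringify(res, null, 4);

// Stand-in for tojsononeline(): the same content on a single line.
const oneLine = JSON.stringify(res);

console.log(multiLine.split("\n").length);  // number of lines in the indented form
console.log(oneLine.split("\n").length);    // number of lines in the compact form
```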

My new thought is to use a CountDownLatch to signal to the background thread running the reshardCollection command that the main thread has issued a killOp command, and then to have that background thread use assert.commandWorkedOrFailedWithCode(res, [ErrorCodes.Interrupted]).
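
A minimal sketch of the latch idea in plain JavaScript, under several assumptions: `SimpleLatch` is a simplified single-threaded stand-in rather than the mongo shell's CountDownLatch, `checkReshardResult` is a hypothetical substitute for assert.commandWorkedOrFailedWithCode(), and the numeric code and response shape are illustrative.

```javascript
const INTERRUPTED = 11601;  // assumed numeric value of ErrorCodes.Interrupted

// Simplified stand-in for a latch shared between the two threads.
class SimpleLatch {
    constructor(count) { this.count = count; }
    countDown() { if (this.count > 0) this.count--; }
    getCount() { return this.count; }
}

// Hypothetical check run by the background thread once the command returns.
// Interrupted is tolerated only if the main thread has signalled, by counting
// the latch down to zero, that it issued killOp.
function checkReshardResult(res, killOpLatch) {
    const allowedCodes = killOpLatch.getCount() === 0 ? [INTERRUPTED] : [];
    if (res.ok === 1 || allowedCodes.includes(res.code)) {
        return res;
    }
    throw new Error("command failed: " + JSON.stringify(res));
}

const latch = new SimpleLatch(1);
const interrupted = {ok: 0, code: INTERRUPTED, codeName: "Interrupted"};

// Before the signal, an Interrupted response is treated as a real failure.
let threwBeforeSignal = false;
try {
    checkReshardResult(interrupted, latch);
} catch (e) {
    threwBeforeSignal = true;
}

latch.countDown();  // main thread: "killOp has been issued"
const tolerated = checkReshardResult(interrupted, latch);
```

The latch keeps an unexpected interruption, one the fixture did not cause, from being silently accepted.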

Generated at Thu Feb 08 06:02:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.