[SERVER-59686] Investigate failing parallel shell in jstests/sharding/txn_two_phase_commit_coordinator_shutdown_and_restart.js Created: 31/Aug/21  Updated: 12/Dec/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Richard Samuels (Inactive) Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt Dependency
has to be done after SERVER-58200 Asserting clauses do not cause a jste... Closed
Assigned Teams:
Cluster Scalability
Operating System: ALL
Steps To Reproduce:

Remove checkExitSuccess: false from jstests/sharding/txn_two_phase_commit_coordinator_shutdown_and_restart.js:159 and run it.

Participants:

Description

SERVER-58200 modified startParallelShell so that whenever a parallel shell is started, the test must call the cleanup function returned by startParallelShell for that shell, or the whole test will fail.

As part of that ticket (SERVER-58200), we modified jstests/sharding/txn_two_phase_commit_coordinator_shutdown_and_restart.js to run the cleanup function, which checks the exit code of the parallel shell and throws an error if it is not 0.

The test then began to fail with the following assertion:

[js_test:txn_two_phase_commit_coordinator_shutdown_and_restart] uncaught exception: Error: [0] != [252] are not equal : encountered an error in the parallel shell :
[js_test:txn_two_phase_commit_coordinator_shutdown_and_restart] doassert@src/mongo/shell/assert.js:20:14
[js_test:txn_two_phase_commit_coordinator_shutdown_and_restart] assert.eq@src/mongo/shell/assert.js:179:9
[js_test:txn_two_phase_commit_coordinator_shutdown_and_restart] startParallelShell/<@src/mongo/shell/servers_misc.js:182:13
[js_test:txn_two_phase_commit_coordinator_shutdown_and_restart] @jstests/sharding/txn_two_phase_commit_coordinator_shutdown_and_restart.js:157:1
[js_test:txn_two_phase_commit_coordinator_shutdown_and_restart] @jstests/sharding/txn_two_phase_commit_coordinator_shutdown_and_restart.js:18:2

The parallel shell is launched at jstests/sharding/txn_two_phase_commit_coordinator_shutdown_and_restart.js:157 (as of commit e502f2d3965ac4147d303e956a582b7c4eef8232). Here is the full stack trace from when the parallel shell is launched:

[js_test:txn_two_phase_commit_coordinator_shutdown_and_restart] startParallelShell/<@src/mongo/shell/servers_misc.js:182:13
[js_test:txn_two_phase_commit_coordinator_shutdown_and_restart] @jstests/sharding/txn_two_phase_commit_coordinator_shutdown_and_restart.js:157:1
[js_test:txn_two_phase_commit_coordinator_shutdown_and_restart] @jstests/sharding/txn_two_phase_commit_coordinator_shutdown_and_restart.js:18:2

STM has set the checkExitSuccess flag to false on the cleanup function of the parallel shell, both to prevent the error from turning these tests red and to preserve the existing semantics. We'd like someone to investigate whether the parallel shell failure is expected (in which case checkExitSuccess should remain false) or unexpected (in which case the test needs to be modified).
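
For anyone picking this up, the difference between the current workaround and the reproduction step can be sketched in plain JavaScript. This is a simplified, hypothetical model of the cleanup function's behavior as described in this ticket, not the code in src/mongo/shell/servers_misc.js: makeCleanup is an invented stand-in for startParallelShell, and the exit code is passed in directly rather than obtained by waiting on a child process.

```javascript
// Hypothetical stand-in for startParallelShell: returns a cleanup function
// for a "parallel shell" that has already exited with the given code.
// Assumption: the real cleanup function checks the exit code by default and
// skips the check only when called with {checkExitSuccess: false}.
function makeCleanup(exitCode) {
    return function cleanup(options) {
        const check = options === undefined || options.checkExitSuccess;
        if (check && exitCode !== 0) {
            throw new Error(
                `[0] != [${exitCode}] are not equal : encountered an error in the parallel shell`);
        }
        return exitCode;
    };
}

// 252 is the exit code reported in the assertion in this ticket.
const awaitShell = makeCleanup(252);

// Current workaround: the non-zero exit code is returned, not asserted on.
const code = awaitShell({checkExitSuccess: false});
console.log("exit code with checkExitSuccess: false ->", code);

// Reproduction: without the option, the same exit code raises an error.
let threw = false;
try {
    awaitShell();
} catch (e) {
    threw = true;
    console.log("error without the option ->", e.message);
}
```

If the non-zero exit turns out to be an expected consequence of the test, the first call pattern is the one to keep; otherwise the second pattern is what the test should be able to pass once fixed.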


Generated at Thu Feb 08 05:47:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.