[SERVER-46304] txn_two_phase_commit_coordinator_shutdown_and_restart.js should force coordinator nodes to shut down even if there are no electable secondaries Created: 21/Feb/20  Updated: 28/Feb/20  Resolved: 28/Feb/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: Tess Avitabile (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-45009 Transaction coordinator tasks should ... Closed
Operating System: ALL
Sprint: Repl 2020-03-09
Participants:
Linked BF Score: 22

Description

Currently, the test uses ReplSetTest.stopSet, but that can fail to shut down the primary if there are no electable secondaries:

d20020| 2020-02-14T21:12:06.608+0000 I  STORAGE  [SignalHandler] Failed to stepDown in non-command initiated shutdown path ExceededTimeLimit: No electable secondaries caught up as of 2020-02-14T21:12:06.608+0000. Please use the replSetStepDown command with the argument {force: true} to force node to step down.

This causes the test to hang at that line, because the shell is blocked in waitpid, waiting for the primary process to exit:

Thread 4: "js" (Thread 0x7fac52de0700 (LWP 10571))                                                  
#0  0x00007fac597e9f7b in waitpid () from /lib/x86_64-linux-gnu/libpthread.so.0                     
#1  0x0000564b096c92c7 in mongo::shell_utils::wait_for_pid (pid=..., block=<optimized out>, exit_code=0x7fac52dde8d8) at src/mongo/platform/process_id.h:74
#2  0x0000564b096c98b0 in mongo::shell_utils::WaitMongoProgram (a=owned BSONObj 16 bytes @ 0x7fac4c752d68, data=data@entry=0x0) at src/mongo/shell/shell_utils_launcher.cpp:804
#3  0x0000564b0989c8ce in mongo::mozjs::NativeFunctionInfo::call (cx=cx@entry=0x7fac5407b020, args=...) at src/third_party/mozjs-60/include/js/RootingAPI.h:1128
#4  0x0000564b0988b12b in mongo::mozjs::smUtils::call<mongo::mozjs::NativeFunctionInfo> (cx=0x7fac5407b020, argc=1, vp=<optimized out>) at src/third_party/mozjs-60/include/js/CallArgs.h:317
<snipped>
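
The log message above suggests forcing the stepdown with replSetStepDown and {force: true}. A minimal sketch of what that could look like from the test follows. The helper names buildForcedStepDownCmd and forceStepDown, and the 60-second default, are hypothetical and not taken from the test; conn is assumed to be any object exposing adminCommand(cmdObj), as mongo shell connections do.

```javascript
// Hypothetical helper: builds the command document for a forced stepdown.
// replSetStepDown takes the step-down period in seconds; force: true tells
// the primary to step down even if no electable secondary has caught up.
function buildForcedStepDownCmd(stepDownSecs) {
    return {replSetStepDown: stepDownSecs, force: true};
}

// Hypothetical helper: sends the forced stepdown to the given connection and
// returns the server's response so that the caller can check `ok`.
// The 60-second default is an illustrative assumption.
function forceStepDown(conn, stepDownSecs = 60) {
    return conn.adminCommand(buildForcedStepDownCmd(stepDownSecs));
}
```

In the test this might be called as forceStepDown(rst.getPrimary()) before rst.stopSet(), where rst is the test's ReplSetTest instance. Note that the comments on this ticket attribute the hang to a different point in the shutdown, so this is only an illustration of the command the log message refers to.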



Comments
Comment by Esha Maharishi (Inactive) [ 28/Feb/20 ]

Ah yes, I forgot about that. SERVER-45009 will make it so the coordinator does finish shutting down. Unless the replication team would like to do a quick fix for the test separately from SERVER-45009, feel free to close this as a dupe.

Comment by Tess Avitabile (Inactive) [ 28/Feb/20 ]

In a non-command-initiated shutdown, if the stepdown fails, we log the exception and proceed with the shutdown as primary.

It looks like the shutdown hung here while shutting down the sharding TaskExecutor pool:

#6  0x000055680b2d6ed5 in mongo::executor::ThreadPoolTaskExecutor::join (this=0x7f60950ca360) at /opt/mongodbtoolchain/revisions/23805b43dd027076b4ae48533ab385e20c61e0cf/stow/gcc-v3.2WP/include/c++/8.2.0/bits/std_mutex.h:259
#7  0x000055680a4e28ef in mongo::executor::TaskExecutorPool::shutdownAndJoin (this=0x7f609520bec0) at /opt/mongodbtoolchain/revisions/23805b43dd027076b4ae48533ab385e20c61e0cf/stow/gcc-v3.2WP/include/c++/8.2.0/bits/shared_ptr_base.h:996
#8  0x000055680a20667a in mongo::ShardingInitializationMongoD::shutDown (this=0x7f609c8bdfe8, opCtx=0x7f609760d1a0) at /opt/mongodbtoolchain/revisions/23805b43dd027076b4ae48533ab385e20c61e0cf/stow/gcc-v3.2WP/include/c++/8.2.0/bits/unique_ptr.h:342
#9  0x0000556809c9ad4e in mongo::(anonymous namespace)::shutdownTask (shutdownArgs=...) at src/mongo/db/db.cpp:1089
