[SERVER-20959] drain.js hangs in shutdown command in Enterprise Windows under replicasets_WT_ese Created: 02/Oct/15  Updated: 19/Nov/15  Resolved: 19/Nov/15

Status: Closed
Project: Core Server
Component/s: Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: J Rassi Assignee: Mark Benvenuto
Resolution: Cannot Reproduce Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-20755 Config server hang during SIGTERM-ini... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Platform C (11/20/15)
Participants:

 Description   

One failure of drain.js observed in Enterprise Windows, under the replicasets_WT_ese suite. The test is killed after running for two hours.

Task: https://evergreen.mongodb.com/task/mongodb_mongo_master_enterprise_windows_64_replicasets_WT_ese_466f05960f115b0bb5e565d88cedbed9d7b47328_15_10_02_13_13_40
Logs (complete logs for job, scroll to bottom): https://logkeeper.mongodb.org/build/560e9d6890413021b1b4bc6b/all



 Comments   
Comment by Mark Benvenuto [ 02/Nov/15 ]

I need another repro with the full logs from the hang. I believe it is a loader lock hang. Here is translated stack trace.

0:000> u MSVCR120!Concurrency::details::Log2+0x146
MSVCR120!Concurrency::details::DeleteAsyncTimerAndUnloadLibrary+0x42 [f:\dd\vctools\crt\crtw32\concrt\utils.cpp @ 707]:
0:000> u MSVCR120!Concurrency::details::GetSharedTimerQueue+0x1ba
MSVCR120!Concurrency::details::TimedSingleWaitBlock::destroyTimer+0x22 [f:\dd\vctools\crt\crtw32\concrt\event.cpp @ 497] [inlined in MSVCR120!Concurrency::details::TimedSingleWaitBlock::Satisfy+0x56 [f:\dd\vctools\crt\crtw32\concrt\event.cpp @ 497]]:

Comment by Benety Goh [ 15/Oct/15 ]

Seems the primary is stuck shutting down with the following stack trace:

    // Kill primary; secondary will enter drain mode to catch up
    primary.getDB("admin").shutdownServer({force:true});

 [2015/10/02 13:14:56.500] Child-SP          RetAddr           Call Site
 [2015/10/02 13:14:56.500] 00000000`09cdd518 00000000`775490a8 ntdll!ZwWaitForSingleObject+0xa
 [2015/10/02 13:14:56.500] 00000000`09cdd520 00000000`77548fa4 ntdll!RtlDeNormalizeProcessParams+0x5a8
 [2015/10/02 13:14:56.500] 00000000`09cdd5d0 00000000`7752231b ntdll!RtlDeNormalizeProcessParams+0x4a4
 [2015/10/02 13:14:56.500] 00000000`09cdd600 000007fe`fd5eb0f5 ntdll!LdrUnloadDll+0x2b
 [2015/10/02 13:14:56.500] 00000000`09cdd630 000007fe`f18ef382 KERNELBASE!FreeLibrary+0x25
 [2015/10/02 13:14:56.500] 00000000`09cdd660 000007fe`f18d4d96 MSVCR120!Concurrency::details::Log2+0x146
 [2015/10/02 13:14:56.500] 00000000`09cdd690 000007fe`f18d51e3 MSVCR120!Concurrency::details::GetSharedTimerQueue+0x1ba
 [2015/10/02 13:14:56.500] 00000000`09cdd6c0 000007fe`f1e6c989 MSVCR120!Concurrency::details::_Condition_variable::notify_one+0x53
 [2015/10/02 13:14:56.500] 00000000`09cdd730 00000001`3fda828a MSVCP120!Cnd_signal+0xd
 [2015/10/02 13:14:56.502] 00000000`09cdd760 00000001`3fc7e048 mongod!mongo::executor::NetworkInterfaceASIO::signalWorkAvailable+0x5a
 [2015/10/02 13:14:56.502] 00000000`09cdd7b0 00000001`3fc59710 mongod!mongo::repl::ReplicationExecutor::scheduleWork+0x58
 [2015/10/02 13:14:56.502] 00000000`09cdd7f0 00000001`3fc54896 mongod!<lambda_d0c8b516c3ceb462ea69ef312169de9f>::operator()+0x20
 [2015/10/02 13:14:56.502] 00000000`09cdd830 00000001`3fc5b23a mongod!std::_Callable_obj<<lambda_d0c8b516c3ceb462ea69ef312169de9f>,0>::_ApplyX<mongo::StatusWith<mongo::executor::TaskExecutor::CallbackHandle>,std::function<void __cdecl(mongo::executor::TaskExecutor::CallbackArgs const & __ptr64)> const & __ptr64>+0x16
 [2015/10/02 13:14:56.506] 00000000`09cdd870 00000001`3fb70767 mongod!std::_Func_impl<std::_Callable_obj<<lambda_d0c8b516c3ceb462ea69ef312169de9f>,0>,std::allocator<std::_Func_class<mongo::StatusWith<mongo::executor::TaskExecutor::CallbackHandle>,std::function<void __cdecl(mongo::executor::TaskExecutor::CallbackArgs const & __ptr64)> const & __ptr64> >,mongo::StatusWith<mongo::executor::TaskExecutor::CallbackHandle>,std::function<void __cdecl(mongo::executor::TaskExecutor::CallbackArgs const & __ptr64)> const & __ptr64>::_Do_call+0x1a
 [2015/10/02 13:14:56.506] 00000000`09cdd8b0 00000001`3fc63f0f mongod!std::_Func_class<boost::intrusive_ptr<mongo::Expression>,mongo::BSONElement,mongo::VariablesParseState const & __ptr64>::operator()+0x27
 [2015/10/02 13:14:56.506] 00000000`09cdd8f0 00000001`3fc600d2 mongod!mongo::repl::ReplicationCoordinatorImpl::_wrapAndScheduleWork+0xef
 [2015/10/02 13:14:56.506] 00000000`09cdd9e0 00000001`3fc66d13 mongod!mongo::repl::ReplicationCoordinatorImpl::_scheduleWork+0x42
 [2015/10/02 13:14:56.506] 00000000`09cdda50 00000001`3fce75ae mongod!mongo::repl::ReplicationCoordinatorImpl::interruptAll+0x163
 [2015/10/02 13:14:56.511] 00000000`09cddaf0 00000001`3fa991cb mongod!mongo::ServiceContextMongoD::setKillAllOperations+0x5e
 [2015/10/02 13:14:56.511] 00000000`09cddb30 00000001`3f9d4f34 mongod!mongo::exitCleanly+0xbb
 [2015/10/02 13:14:56.511] 00000000`09cddd20 00000001`3f9cffeb mongod!mongo::CmdShutdown::shutdownHelper+0x194
 [2015/10/02 13:14:56.511] 00000000`09cdde30 00000001`3f9d132a mongod!mongo::CmdShutdownMongoD::run+0x18b
 [2015/10/02 13:14:56.511] 00000000`09cddf70 00000001`3f9cc836 mongod!mongo::Command::run+0x69a
 [2015/10/02 13:14:56.511] 00000000`09cde3f0 00000001`3f94ad2a mongod!mongo::Command::execCommand+0x8c6
 [2015/10/02 13:14:56.511] 00000000`09cde830 00000001`3fa9e168 mongod!mongo::runCommands+0x25a
 [2015/10/02 13:14:56.511] 00000000`09cdeb00 00000001`3fa979de mongod!mongo::`anonymous namespace'::receivedRpc+0x1e8
 [2015/10/02 13:14:56.511] 00000000`09cdecd0 00000001`403ebb42 mongod!mongo::assembleResponse+0x7de
 [2015/10/02 13:14:56.511] 00000000`09cdf3c0 00000001`3feb77d1 mongod!mongo::MyMessageHandler::process+0xa2
 [2015/10/02 13:14:56.511] 00000000`09cdf550 00000001`3fe7199d mongod!mongo::PortMessageServer::handleIncomingMsg+0x491
 [2015/10/02 13:14:56.511] 00000000`09cdf920 000007fe`f1e6d24c mongod!std::_LaunchPad<std::_Bind<1,void * __ptr64,void * __ptr64 (__cdecl*const)(void * __ptr64),nspr::Thread * __ptr64> >::_Go+0x1d
 [2015/10/02 13:14:56.512] 00000000`09cdf950 000007fe`f18f4f7f MSVCP120!std::_Pad::_Release+0x6c
 [2015/10/02 13:14:56.513] 00000000`09cdf990 000007fe`f18f5126 MSVCR120!beginthreadex+0x107
 [2015/10/02 13:14:56.513] 00000000`09cdf9c0 00000000`772f59dd MSVCR120!endthreadex+0x192
 [2015/10/02 13:14:56.513] 00000000`09cdf9f0 00000000`7752a651 kernel32!BaseThreadInitThunk+0xd
 [2015/10/02 13:14:56.513] 00000000`09cdfa20 00000000`00000000 ntdll!RtlUserThreadStart+0x21
 [2015/10/02 13:14:56.513]   27  Id: 48c.cec Suspend: 1 Teb: 000007ff`fff82000 Unfrozen

Generated at Thu Feb 08 03:55:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.