[SERVER-19418] Mongod termination while executing concurrent, sharded aggregation pipelines Created: 15/Jul/15  Updated: 25/Jan/17  Resolved: 30/Sep/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.1.6
Fix Version/s: 3.1.9

Type: Bug Priority: Major - P3
Reporter: Jonathan Abrahams Assignee: Andy Schwerin
Resolution: Done Votes: 0
Labels: 32qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Operating System: Windows
Sprint: Sharding A (10/09/15)
Participants:

 Description   

An aggregation pipeline caused mongod to terminate on Windows. The failure was discovered while running the concurrency sharded replication tests on Windows.



 Comments   
Comment by Githook User [ 30/Sep/15 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-19418 Work around MSVC2013 function static initializer bug in ShardedConnectionInfo::addHook.

MSVC2013 doesn't provide thread-safe initialization of function
statics, so this patch converts ShardedConnectionInfo::addHook's function-static
mutex to a file-static mutex, which will be initialized before main().
Branch: master
https://github.com/mongodb/mongo/commit/257845475d4fb578a98e428260d889f0876f7e35
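
The workaround described in the commit message can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: std::mutex stands in for stdx::mutex, and addHookOnce, hooksInstalled, and callFromManyThreads are hypothetical names introduced only for this example.

```cpp
#include <mutex>
#include <thread>
#include <vector>

namespace {
// File-scope statics: the mutex is constructed during static initialization,
// before main() can start any worker threads, so no thread can ever observe
// it half-constructed.
std::mutex addHookMutex;   // stand-in for stdx::mutex (assumption)
bool addHookDone = false;  // guarded by addHookMutex
int hooksInstalled = 0;    // hypothetical counter replacing the real hook registration
}  // namespace

// Hypothetical stand-in for ShardedConnectionInfo::addHook().
void addHookOnce() {
    std::lock_guard<std::mutex> lk(addHookMutex);
    if (!addHookDone) {
        ++hooksInstalled;  // the real function registers the connection hooks here
        addHookDone = true;
    }
}

// Calls addHookOnce() from several threads at once and reports how many
// times the one-time work actually ran.
int callFromManyThreads(int numThreads) {
    std::vector<std::thread> threads;
    for (int i = 0; i < numThreads; ++i) {
        threads.emplace_back(addHookOnce);
    }
    for (auto& t : threads) {
        t.join();
    }
    std::lock_guard<std::mutex> lk(addHookMutex);
    return hooksInstalled;
}
```

Because the mutex no longer depends on lazy, per-call initialization, the one-time work should run exactly once no matter how many connection threads arrive together.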

Comment by Mark Benvenuto [ 26/Aug/15 ]

VS 2013 does not support thread-safe local static initialization. VS 2015 supports thread-safe C++11 local static initialization.

The following line will behave erratically when several threads reach it at the same time, because stdx::mutex has a non-trivial constructor:

static stdx::mutex lock;
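
To make the hazard concrete, here is a hand-written conceptual model of what an unsynchronized function-static amounts to. It is not the code MSVC2013 actually generates; Widget and unsafeLocalStatic are hypothetical names, and Widget merely stands in for a type with a non-trivial constructor such as stdx::mutex.

```cpp
#include <new>

// Hypothetical type with a non-trivial constructor (stand-in for stdx::mutex).
struct Widget {
    static int constructions;
    Widget() { ++constructions; }
};
int Widget::constructions = 0;

// Conceptual model of `static Widget w;` inside a function when the compiler
// does NOT synchronize the first-use initialization.
Widget& unsafeLocalStatic() {
    alignas(Widget) static unsigned char storage[sizeof(Widget)];
    static Widget* instance = nullptr;  // doubles as the "already initialized" guard

    if (instance == nullptr) {             // two threads can both see nullptr here...
        instance = new (storage) Widget(); // ...and both run the constructor
    }
    return *instance;
}
```

Called from a single thread, the model constructs the object once and reuses it afterwards. With two threads entering at the same moment, nothing stops both from running the constructor on the same storage, or one from using the object while the other is still constructing it, which is consistent with the erratic behavior described above.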

Comment by Max Hirschhorn [ 26/Aug/15 ]

IIRC, this was suspected to have been caused by an issue with static local variables on Windows, i.e. they are not initialized in a thread-safe manner according to the C++11 standard.

void ShardedConnectionInfo::addHook() {
    static stdx::mutex lock;
    static bool done = false;
 
    stdx::lock_guard<stdx::mutex> lk(lock);
    if (!done) {
        log() << "first cluster operation detected, adding sharding hook to enable versioning "
                 "and authentication to remote servers";
 
        globalConnPool.addHook(new ShardingConnectionHook(false));
        shardConnectionPool.addHook(new ShardingConnectionHook(true));
 
        done = true;
    }
}

https://github.com/mongodb/mongo/blob/r3.1.7/src/mongo/db/s/sharded_connection_info.cpp#L88-L89

CC mark.benvenuto to confirm.

Comment by Jonathan Abrahams [ 15/Jul/15 ]

The termination occurred during the jstests/concurrency/fsm_workloads/agg_base.js test:

[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.446+0000  m31101| 2015-07-14T22:20:56.446+0000 I CONTROL  [conn12] MSVCP120.dll                                                                                   Thrd_yield+0xb3
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.446+0000  m31101| 2015-07-14T22:20:56.446+0000 I CONTROL  [conn12] mongod.exe    ...\src\mongo\db\s\sharded_connection_info.cpp(91)                               mongo::ShardedConnectionInfo::addHook+0x7a
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.463+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] mongod.exe    ...\src\mongo\db\pipeline\pipeline_d.cpp(123)                                    mongo::PipelineD::prepareCursorSource+0x320
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.464+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] mongod.exe    ...\src\mongo\db\commands\pipeline_command.cpp(239)                              mongo::PipelineCommand::run+0x40f
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.464+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] mongod.exe    ...\src\mongo\db\dbcommands.cpp(1307)                                            mongo::Command::run+0x3b0
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.464+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] mongod.exe    ...\src\mongo\db\dbcommands.cpp(1261)                                            mongo::Command::execCommand+0x93b
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.464+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] mongod.exe    ...\src\mongo\db\commands.cpp(16707566)                                          mongo::runCommands+0x257
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.464+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] mongod.exe    ...\src\mongo\db\instance.cpp(291)                                               mongo::`anonymous namespace'::receivedRpc+0x1e8
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.464+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] mongod.exe    ...\src\mongo\db\instance.cpp(507)                                               mongo::assembleResponse+0x7de
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.464+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] mongod.exe    ...\src\mongo\db\db.cpp(167)                                                     mongo::MyMessageHandler::process+0xa2
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.464+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] mongod.exe    ...\src\mongo\util\net\message_server_port.cpp(231)                              mongo::PortMessageServer::handleIncomingMsg+0x487
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.464+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] mongod.exe    c:\program files (x86)\microsoft visual studio 12.0\vc\include\thr\xthread(187)  std::_LaunchPad<std::_Bind<0,void,boost::_bi::bind_t<void,void (__cdecl*)(mongo::FileAllocator * __ptr64),boost::_bi::list1<boost::_bi::value<mongo::FileAllocator * __ptr64> > > > >::_Go+0x2d
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.464+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] MSVCP120.dll                                                                                   std::_Pad::_Release+0x6c
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.466+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] MSVCR120.dll                                                                                   beginthreadex+0x107
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.466+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] MSVCR120.dll                                                                                   endthreadex+0x192
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.466+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] kernel32.dll                                                                                   BaseThreadInitThunk+0xd
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.466+0000  m31101| 2015-07-14T22:20:56.464+0000 I -        [conn12]
[js_test:fsm_all_sharded_replication] 2015-07-14T22:20:56.466+0000  m31101| 2015-07-14T22:20:56.464+0000 I CONTROL  [conn12] writing minidump diagnostic file C:\data\mci\src\mongod.2015-07-14T22-20-56.mdmp
