[SERVER-8707] dbclient_rs_test threading issue Created: 25/Feb/13  Updated: 11/Jul/16  Resolved: 20/Aug/13

Status: Closed
Project: Core Server
Component/s: Concurrency, Internal Client
Affects Version/s: 2.4.6
Fix Version/s: 2.5.2

Type: Bug Priority: Critical - P2
Reporter: Eric Milkie Assignee: Randolph Tan
Resolution: Done Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-5773 mongo::PeriodicTask::Runner cleanup Closed
depends on SERVER-5520 Shutdown rewrite (tracking ticket) Closed
Duplicate
is duplicated by SERVER-10107 smokeCppUnittests:dbclient_rs_test fa... Closed
Related
related to SERVER-10372 ReplicaSetMonitor creates a thread th... Closed
Operating System: ALL
Participants:

 Description   

http://buildbot-special.10gen.com/builders/Nightly%20Linux%2064-bit%20Subscription%20RHEL%205.7/builds/289/steps/test_2/logs/stdio

When I run the test with Valgrind, after a couple of runs it flags an issue at exit handler time when we try to destruct a global mongo::mutex (which has a boost::timed_mutex member).



 Comments   
Comment by Erich Siedler [ 21/Aug/13 ]

Randolph, I think the patch "Allow BackgroundJob to be restartable" have a race condition.

I've submitted this same feature a year ago in SERVER-6217 (BackgroundJobs cannot be run multiple times).

In my pull request you will find 2 test cases that clearly explain my point. They both fail with your patch.

Best regards.

Comment by auto [ 20/Aug/13 ]

Author:

{u'username': u'renctan', u'name': u'Randolph Tan', u'email': u'randolph@10gen.com'}

Message: SERVER-8707 dbclient_rs_test threading issue

Thoroughly cleanup all replica set connection related artifacts after every test.
Branch: master
https://github.com/mongodb/mongo/commit/07c221086ff5c20d4d62ba69837d90404459c335

Comment by auto [ 20/Aug/13 ]

Author:

{u'username': u'renctan', u'name': u'Randolph Tan', u'email': u'randolph@10gen.com'}

Message: SERVER-8707 dbclient_rs_test threading issue

Allow BackgroundJob to be restartable.
Branch: master
https://github.com/mongodb/mongo/commit/712619f82fec8290eb1a1ecd1dd0185dfcb92a16

Comment by Matt Kangas [ 06/Aug/13 ]

V2.4 Linux 64-bit Subscription RHEL 6.2 Build #167 (Tue Aug 6)
http://buildlogs.mongodb.org/V2.4%20Linux%2064-bit%20Subscription%20RHEL%206.2/builds/167/test/core/dbclient_rs_test

test output ends with:

Tue Aug  6 02:21:31.652 ReplicaSetMonitor::_checkConnection: $test1:27017 { setName: "test", ismaster: false, secondary: true, hosts: [ "$test0:27017", "$test1:27017" ], primary: "$test0:27017", me: "$test1:27017", ok: true }
Tue Aug  6 02:21:31.652 replacing connection to $test1:27017 with $test1:27017
Tue Aug  6 02:21:31.652 replacing connection to $test0:27017 with (empty)
Tue Aug  6 02:21:32.653 warning: No primary detected for set test
Tue Aug  6 02:21:32.653 replica set monitor for replica set test started, address is test/$test0:27017,$test1:27017
Tue Aug  6 02:21:32.653 replacing connection to $test1:27017 with $test1:27017
Tue Aug  6 02:21:32.653 [ReplicaSetMonitorWatcher] deleting replica set monitor for: test/$test0:27017,$test1:27017
Tue Aug  6 02:21:32.653 deleting replica set monitor for: test/$test0:27017,$test1:27017
Tue Aug  6 02:21:32.653 	 going to run test: Nearest

Comment by Tad Marshall [ 06/Aug/13 ]

I ran the test in a loop in the debugger in Windows, testing a debug build. Here's the crash info:

First-chance exception at 0x0000000140174b10 in dbclient_rs_test.exe: 0xC0000005: Access violation reading location 0x0000000101ec4e68.
Unhandled exception at 0x0000000140174b10 in dbclient_rs_test.exe: 0xC0000005: Access violation reading location 0x0000000101ec4e68.

>Debug.ListCallStack
 Index  Function
--------------------------------------------------------------------------------
 1      dbclient_rs_test.exe!memcpy() 
*2      dbclient_rs_test.exe!std::char_traits<char>::copy(char * _First1=0x0000000140ee0070, const char * _First2=0x0000000001ec4e70, unsigned __int64 _Count=0x0000000100000000) 
 3      dbclient_rs_test.exe!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::assign(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right="°Mì", unsigned __int64 _Roff=0x0000000000000000, unsigned __int64 _Count=0xffffffffffffffff) 
 4      dbclient_rs_test.exe!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::basic_string<char,std::char_traits<char>,std::allocator<char> >(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & _Right="°Mì") 
 5      dbclient_rs_test.exe!mongo::MockRemoteDBServer::getServerAddress() 
 6      dbclient_rs_test.exe!mongo::MockDBClientConnection::getServerAddress() 
 7      dbclient_rs_test.exe!mongo::ReplicaSetMonitor::_checkConnMatch_inlock(mongo::DBClientConnection * conn=0x0000000001d8b010, unsigned __int64 nodeOffset=0x0000000000000001) 
 8      dbclient_rs_test.exe!mongo::ReplicaSetMonitor::_checkConnection(mongo::DBClientConnection * conn=0x0000000001d8b010, std::basic_string<char,std::char_traits<char>,std::allocator<char> > & maybePrimary="", bool verbose=true, int nodesOffset=0x00000001) 
 9      dbclient_rs_test.exe!mongo::ReplicaSetMonitor::_check() 
 10     dbclient_rs_test.exe!mongo::ReplicaSetMonitor::check() 
 11     dbclient_rs_test.exe!mongo::ReplicaSetMonitor::checkAll() 
 12     dbclient_rs_test.exe!mongo::ReplicaSetMonitorWatcher::run() 
 13     dbclient_rs_test.exe!mongo::BackgroundJob::jobBody(boost::shared_ptr<mongo::BackgroundJob::JobStatus> * status=0x000000000275fc40) 
 14     dbclient_rs_test.exe!boost::_mfi::mf1<void,mongo::BackgroundJob,boost::shared_ptr<mongo::BackgroundJob::JobStatus> >::operator()(mongo::BackgroundJob * p=0x0000000140d48d20, boost::shared_ptr<mongo::BackgroundJob::JobStatus> * a1=0x000000000275fca0) 
 15     dbclient_rs_test.exe!boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob * __ptr64>,boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > >::operator()<boost::_mfi::mf1<void,mongo::BackgroundJob,boost::shared_ptr<mongo::BackgroundJob::JobStatus> >,boost::_bi::list0>(boost::_bi::type<void> __formal={...}, boost::_mfi::mf1<void,mongo::BackgroundJob,boost::shared_ptr<mongo::BackgroundJob::JobStatus> > & f={...}, boost::_bi::list0 & a={...}, boost::_bi::type<void> __formal={...}) 
 16     dbclient_rs_test.exe!boost::_bi::bind_t<void,boost::_mfi::mf1<void,mongo::BackgroundJob,boost::shared_ptr<mongo::BackgroundJob::JobStatus> >,boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob * __ptr64>,boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > >::operator()() 
 17     dbclient_rs_test.exe!boost::detail::thread_data<boost::_bi::bind_t<void,boost::_mfi::mf1<void,mongo::BackgroundJob,boost::shared_ptr<mongo::BackgroundJob::JobStatus> >,boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob * __ptr64>,boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >::run() 
 18     dbclient_rs_test.exe!boost::`anonymous namespace'::thread_start_function(void * param=0x00000000003d23c0) 
 19     dbclient_rs_test.exe!_callthreadstartex() 
 20     dbclient_rs_test.exe!_threadstartex(void * ptd=0x0000000001d85820) 
 21     kernel32.dll!BaseThreadInitThunk() 
 22     ntdll.dll!RtlUserThreadStart() 
 
>

>Debug.ListCallStack
 Index  Function
--------------------------------------------------------------------------------
 1      ntdll.dll!RtlpAllocateHeap() 
 2      ntdll.dll!RtlAllocateHeap() 
 3      ntdll.dll!RtlDebugAllocateHeap() 
 4      ntdll.dll!string "Enabling heap debug options\n"() 
 5      ntdll.dll!RtlAllocateHeap() 
*6      dbclient_rs_test.exe!_heap_alloc_base(unsigned __int64 size=0x000000000000003c) 
 7      dbclient_rs_test.exe!_heap_alloc_dbg_impl(unsigned __int64 nSize=0x0000000000000008, int nBlockUse=0x00000002, const char * szFileName=0x00000001404c2710, int nLine=0x00000234, int * errno_tmp=0x00000000001ecf10) 
 8      dbclient_rs_test.exe!_nh_malloc_dbg_impl(unsigned __int64 nSize=0x0000000000000008, int nhFlag=0x00000000, int nBlockUse=0x00000002, const char * szFileName=0x00000001404c2710, int nLine=0x00000234, int * errno_tmp=0x00000000001ecf10) 
 9      dbclient_rs_test.exe!_nh_malloc_dbg(unsigned __int64 nSize=0x0000000000000008, int nhFlag=0x00000000, int nBlockUse=0x00000002, const char * szFileName=0x00000001404c2710, int nLine=0x00000234) 
 10     dbclient_rs_test.exe!_malloc_dbg(unsigned __int64 nSize=0x0000000000000008, int nBlockUse=0x00000002, const char * szFileName=0x00000001404c2710, int nLine=0x00000234) 
 11     dbclient_rs_test.exe!operator new(unsigned __int64 sz=0x0000000000000008, const std::_DebugHeapTag_t & tag={...}, char * file=0x00000001404c2710, int line=0x00000234) 
 12     dbclient_rs_test.exe!std::ios_base::_Init() 
 13     dbclient_rs_test.exe!std::basic_ios<char,std::char_traits<char> >::init(std::basic_streambuf<char,std::char_traits<char> > * _Strbuf=0x00000000001ed178, bool _Isstd=false) 
 14     dbclient_rs_test.exe!std::basic_istream<char,std::char_traits<char> >::basic_istream<char,std::char_traits<char> >(std::basic_streambuf<char,std::char_traits<char> > * _Strbuf=0x00000000001ed178, bool _Isstd=false) 
 15     dbclient_rs_test.exe!std::basic_iostream<char,std::char_traits<char> >::basic_iostream<char,std::char_traits<char> >(std::basic_streambuf<char,std::char_traits<char> > * _Strbuf=0x00000000001ed178) 
 16     dbclient_rs_test.exe!std::basic_stringstream<char,std::char_traits<char>,std::allocator<char> >::basic_stringstream<char,std::char_traits<char>,std::allocator<char> >(int _Mode=0x00000003) 
 17     dbclient_rs_test.exe!mongo::ConnectionString::_finishInit() 
 18     dbclient_rs_test.exe!mongo::ConnectionString::ConnectionString(const mongo::HostAndPort & server={...}) 
 19     dbclient_rs_test.exe!mongo::_getConnWithRefresh(mongo::ReplicaSetMonitor::Node & node={...}) 
 20     dbclient_rs_test.exe!mongo::ReplicaSetMonitor::_check() 
 21     dbclient_rs_test.exe!mongo::ReplicaSetMonitor::_populateHosts_inSetsLock(const std::vector<mongo::HostAndPort,std::allocator<mongo::HostAndPort> > & seedList=[0x00000002]({_host="$test0" _port=0x00006989 },{_host="$test1" _port=0x00006989 })) 
 22     dbclient_rs_test.exe!mongo::ReplicaSetMonitor::ReplicaSetMonitor(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & name="test", const std::vector<mongo::HostAndPort,std::allocator<mongo::HostAndPort> > & servers=[0x00000002]({_host="$test0" _port=0x00006989 },{_host="$test1" _port=0x00006989 })) 
 23     dbclient_rs_test.exe!mongo::ReplicaSetMonitor::createIfNeeded(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & name="test", const std::vector<mongo::HostAndPort,std::allocator<mongo::HostAndPort> > & servers=[0x00000002]({_host="$test0" _port=0x00006989 },{_host="$test1" _port=0x00006989 })) 
 24     dbclient_rs_test.exe!mongo::DBClientReplicaSet::DBClientReplicaSet(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & name="test", const std::vector<mongo::HostAndPort,std::allocator<mongo::HostAndPort> > & servers=[0x00000002]({_host="$test0" _port=0x00006989 },{_host="$test1" _port=0x00006989 }), double so_timeout=0.00000000000000000) 
 25     dbclient_rs_test.exe!`anonymous namespace'::UnitTest__PrimaryDown__SecondaryOnly::_doTest() 
 26     dbclient_rs_test.exe!mongo::unittest::Test::run() 
 27     dbclient_rs_test.exe!mongo::unittest::Suite::runTestObject<`anonymous namespace'::UnitTest__PrimaryDown__SecondaryOnly>() 
 28     dbclient_rs_test.exe!boost::detail::function::void_function_invoker0<void (__cdecl*)(void),void>::invoke(boost::detail::function::function_buffer & function_ptr={...}) 
 29     dbclient_rs_test.exe!boost::function0<void>::operator()() 
 30     dbclient_rs_test.exe!mongo::unittest::TestHolder::run() 
 31     dbclient_rs_test.exe!mongo::unittest::Suite::run(const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & filter="", int runsPerTest=0x00000001) 
 32     dbclient_rs_test.exe!mongo::unittest::Suite::run(const std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > & suites=[0x00000000](), const std::basic_string<char,std::char_traits<char>,std::allocator<char> > & filter="", int runsPerTest=0x00000001) 
 33     dbclient_rs_test.exe!main(int argc=0x00000001, char * * argv=0x00000000003a33f0, char * * envp=0x00000000003a34c0) 
 34     dbclient_rs_test.exe!__tmainCRTStartup() 
 35     dbclient_rs_test.exe!mainCRTStartup() 
 36     kernel32.dll!BaseThreadInitThunk() 
 37     ntdll.dll!RtlUserThreadStart() 
 
>

Comment by Tad Marshall [ 06/Aug/13 ]

Builder Linux RHEL 64-bit Build #1468 – 85e79d7ad2db6fa0607957cac536d20a3738f6a5
http://buildbot.mongodb.org/builders/Linux%20RHEL%2064-bit/builds/1468
http://buildbot.mongodb.org/builders/Linux%20RHEL%2064-bit/builds/1468/steps/test_2/logs/stdio
http://buildlogs.mongodb.org/Linux%20RHEL%2064-bit/builds/1468/test/core/dbclient_rs_test

Tue Aug  6 06:55:27.336 going to run suite: TaggedFiveMemberRS
Tue Aug  6 06:55:27.336 	 going to run test: ConnShouldPinIfSameSettings

Same diagnosis as above.

Comment by Tad Marshall [ 06/Aug/13 ]

buildbot failure in 10gen Special MongoDB on V2.4 Linux 64-bit Subscription RHEL 6.2
http://buildbot-special.10gen.com/builders/V2.4%20Linux%2064-bit%20Subscription%20RHEL%206.2/builds/167/steps/test/logs/stdio
http://buildlogs.mongodb.org/V2.4%20Linux%2064-bit%20Subscription%20RHEL%206.2/builds/167/test/core/dbclient_rs_test

Tue Aug  6 02:21:22.649 going to run suite: PrimaryDown
// ... snip ...
Tue Aug  6 02:21:30.652 	 going to run test: SecondaryPreferred
Tue Aug  6 02:21:30.652 starting new replica set monitor for replica set test with seed of $test0:27017,$test1:27017
Tue Aug  6 02:21:30.652 replacing connection to $test0:27017 with (empty)
Tue Aug  6 02:21:30.652 error connecting to seed $test0:27017, err: cannot connect to $test0:27017
Tue Aug  6 02:21:30.652 replacing connection to $test1:27017 with $test1:27017
Tue Aug  6 02:21:30.652 successfully connected to seed $test1:27017 for replica set test
Tue Aug  6 02:21:30.652 changing hosts to { 0: "$test0:27017", 1: "$test1:27017" } from test/
Tue Aug  6 02:21:30.652 trying to add new host $test0:27017 to replica set test
Tue Aug  6 02:21:30.652 replacing connection to $test0:27017 with (empty)
Tue Aug  6 02:21:30.652 warning: cannot connect to new host $test0:27017 to replica set test, err: cannot connect to $test0:27017
Tue Aug  6 02:21:30.652 trying to add new host $test1:27017 to replica set test
Tue Aug  6 02:21:30.652 replacing connection to $test1:27017 with $test1:27017
Tue Aug  6 02:21:30.652 successfully connected to new host $test1:27017 in replica set test
Tue Aug  6 02:21:30.652 replacing connection to $test1:27017 with $test1:27017
Tue Aug  6 02:21:30.652 replacing connection to $test0:27017 with (empty)
Tue Aug  6 02:21:30.652 replacing connection to $test1:27017 with $test1:27017
Tue Aug  6 02:21:30.652 replacing connection to $test0:27017 with (empty)
Tue Aug  6 02:21:31.652 replacing connection to $test0:27017 with (empty)
Tue Aug  6 02:21:31.652 ReplicaSetMonitor::_checkConnection: $test1:27017 { setName: "test", ismaster: false, secondary: true, hosts: [ "$test0:27017", "$test1:27017" ], primary: "$test0:27017", me: "$test1:27017", ok: true }
Tue Aug  6 02:21:31.652 replacing connection to $test1:27017 with $test1:27017
Tue Aug  6 02:21:31.652 replacing connection to $test0:27017 with (empty)
Tue Aug  6 02:21:32.653 warning: No primary detected for set test
Tue Aug  6 02:21:32.653 replica set monitor for replica set test started, address is test/$test0:27017,$test1:27017
Tue Aug  6 02:21:32.653 replacing connection to $test1:27017 with $test1:27017
Tue Aug  6 02:21:32.653 [ReplicaSetMonitorWatcher] deleting replica set monitor for: test/$test0:27017,$test1:27017
Tue Aug  6 02:21:32.653 deleting replica set monitor for: test/$test0:27017,$test1:27017
Tue Aug  6 02:21:32.653 	 going to run test: Nearest

The test stops here ... probably a segfault or similar issue. We need to finish SERVER-10431 so we get a description of what happened and a stack trace.

Comment by auto [ 04/Aug/13 ]

Author:

{u'username': u'tadmarshall', u'name': u'Tad Marshall', u'email': u'tad@10gen.com'}

Message: SERVER-8707 Sleep 10 seconds before entering ReplicaSetMonitorWatcher's loop

This restores the startup timing that was changed by commit
e3b8349ec305b00bdb7107f2d887549640a15c90, while preserving the
smaller time window within the loop between checking for
termination and calling ReplicaSetMonitor::checkAll().
Branch: v2.4
https://github.com/mongodb/mongo/commit/392b933598668768bf12b1e41ad444aa3548d970

Comment by Matt Kangas [ 04/Aug/13 ]

Again on v2.4, so it doesn't have Tad's latest commit.

V2.4 Linux 64-bit Subscription Ubuntu 1204 Build #152 Aug 4 rev ea62502ee

http://buildbot-special.10gen.com/builders/V2.4%20Linux%2064-bit%20Subscription%20Ubuntu%201204/builds/152/steps/test/logs/stdio
http://buildlogs.mongodb.org/V2.4%20Linux%2064-bit%20Subscription%20Ubuntu%201204/builds/152/test/core/dbclient_rs_test

Sun Aug  4 13:15:17.482 	 going to run test: PrimaryPreferred
Sun Aug  4 13:15:17.482 starting new replica set monitor for replica set test with seed of $test0:27017,$test1:27017
Sun Aug  4 13:15:17.482 replacing connection to $test0:27017 with $test0:27017
Sun Aug  4 13:15:17.482 successfully connected to seed $test0:27017 for replica set test
Sun Aug  4 13:15:17.482 changing hosts to { 0: "$test0:27017", 1: "$test1:27017" } from test/
Sun Aug  4 13:15:17.482 trying to add new host $test0:27017 to replica set test
Sun Aug  4 13:15:17.482 replacing connection to $test0:27017 with $test0:27017
Sun Aug  4 13:15:17.482 successfully connected to new host $test0:27017 in replica set test
Sun Aug  4 13:15:17.482 trying to add new host $test1:27017 to replica set test
Sun Aug  4 13:15:17.482 replacing connection to $test1:27017 with (empty)
Sun Aug  4 13:15:17.482 warning: cannot connect to new host $test1:27017 to replica set test, err: cannot connect to $test1:27017
Sun Aug  4 13:15:17.482 replacing connection to $test0:27017 with $test0:27017
Sun Aug  4 13:15:17.482 replacing connection to $test0:27017 with $test0:27017
Sun Aug  4 13:15:17.482 Primary for replica set test changed to $test0:27017
Sun Aug  4 13:15:17.482 replacing connection to $test0:27017 with $test0:27017
Sun Aug  4 13:15:17.483 replacing connection to $test1:27017 with (empty)
Sun Aug  4 13:15:17.483 replica set monitor for replica set test started, address is test/$test0:27017,$test1:27017
Sun Aug  4 13:15:17.483 replacing connection to $test0:27017 with $test0:27017

Comment by auto [ 04/Aug/13 ]

Author:

{u'username': u'tadmarshall', u'name': u'Tad Marshall', u'email': u'tad@10gen.com'}

Message: SERVER-8707 Sleep 10 seconds before entering ReplicaSetMonitorWatcher's loop

This restores the startup timing that was changed by commit
e3b8349ec305b00bdb7107f2d887549640a15c90, while preserving the
smaller time window within the loop between checking for
termination and calling ReplicaSetMonitor::checkAll().
Branch: master
https://github.com/mongodb/mongo/commit/f141a861efcbefd72101544c8990fdff7ca27741

Comment by Matt Kangas [ 03/Aug/13 ]

More:

buildbot-special V2.4 Linux 64-bit SSL Ubuntu 1204 Build #128 rev 268d3f665
http://buildlogs.mongodb.org/V2.4%20Linux%2064-bit%20SSL%20Ubuntu%201204/builds/128/test/core/dbclient_rs_test

buildbot-special V2.4 Linux 64-bit Subscription Amazon AMI Build #153 rev ea62502e
http://buildlogs.mongodb.org/V2.4%20Linux%2064-bit%20Subscription%20Amazon%20AMI/builds/153/test/core/dbclient_rs_test

Sat Aug  3 06:41:23.604 	 going to run test: PrimaryPreferred
Sat Aug  3 06:41:23.604 [ReplicaSetMonitorWatcher]   Fatal Assertion 16534
0x9a8426 0x970d33 0x8429ea 0x842a76 0x6c9d7c 0x6dee55 0x6e8824 0x6eadd3 0x6eb803 0x6f62a6 0x9743c9 0x975f59 0x9e19cc 0x7f668259e7f1 0x7f668131bccd 
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo15printStackTraceERSo+0x26) [0x9a8426]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo13fassertFailedEi+0x63) [0x970d33]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo16MockConnRegistry7connectERKSs+0x15a) [0x8429ea]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo16MockConnRegistry12MockConnHook7connectERKNS_16ConnectionStringERSsd+0x36) [0x842a76]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZNK5mongo16ConnectionString7connectERSsd+0x18c) [0x6c9d7c]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo19_getConnWithRefreshERNS_17ReplicaSetMonitor4NodeE+0x95) [0x6dee55]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo17ReplicaSetMonitor6_checkEb+0x144) [0x6e8824]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo17ReplicaSetMonitor5checkEb+0x153) [0x6eadd3]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo17ReplicaSetMonitor8checkAllEb+0x193) [0x6eb803]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo24ReplicaSetMonitorWatcher3runEv+0x66) [0x6f62a6]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xc9) [0x9743c9]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x79) [0x975f59]
 /data/buildslaves/Linux_64bit_Subscription_Amazon_AMI_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test() [0x9e19cc]
 /lib64/libpthread.so.0(+0x77f1) [0x7f668259e7f1]
 /lib64/libc.so.6(clone+0x6d) [0x7f668131bccd]
Sat Aug  3 06:41:23.609 [ReplicaSetMonitorWatcher] 

V2.4 Linux 64-bit Subscription RHEL 6.2 Build #163 rev ea62502ee
http://buildlogs.mongodb.org/V2.4%20Linux%2064-bit%20Subscription%20RHEL%206.2/builds/163/test/core/dbclient_rs_test

Sat Aug  3 02:26:50.381 	 going to run test: PrimaryPreferred
Sat Aug  3 02:26:50.381 [ReplicaSetMonitorWatcher]   Fatal Assertion 16534
0x9a83e6 0x970cf3 0x8429aa 0x842a36 0x6c9d3c 0x6dee15 0x6e87e4 0x6ead93 0x6eb7c3 0x6f6266 0x974389 0x975f19 0x9e198c 0x32f3a077f1 0x32f36e570d 
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo15printStackTraceERSo+0x26) [0x9a83e6]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo13fassertFailedEi+0x63) [0x970cf3]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo16MockConnRegistry7connectERKSs+0x15a) [0x8429aa]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo16MockConnRegistry12MockConnHook7connectERKNS_16ConnectionStringERSsd+0x36) [0x842a36]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZNK5mongo16ConnectionString7connectERSsd+0x18c) [0x6c9d3c]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo19_getConnWithRefreshERNS_17ReplicaSetMonitor4NodeE+0x95) [0x6dee15]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo17ReplicaSetMonitor6_checkEb+0x144) [0x6e87e4]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo17ReplicaSetMonitor5checkEb+0x153) [0x6ead93]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo17ReplicaSetMonitor8checkAllEb+0x193) [0x6eb7c3]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo24ReplicaSetMonitorWatcher3runEv+0x66) [0x6f6266]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xc9) [0x974389]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x79) [0x975f19]
 /data/buildslaves/Linux_64bit_Subscription_RHEL_62_V2.4/mongo/build/linux2/release/ssl/mongo/dbclient_rs_test() [0x9e198c]
 /lib64/libpthread.so.0() [0x32f3a077f1]
 /lib64/libc.so.6(clone+0x6d) [0x32f36e570d]
Sat Aug  3 02:26:50.386 [ReplicaSetMonitorWatcher] 

Comment by Matt Kangas [ 02/Aug/13 ]

Ugh, another failure.

OS X 10.5 64-bit Build #5629 August 2 rev 07faf6e

http://buildbot.mongodb.org/builders/OS%20X%2010.5%2064-bit/builds/5629/steps/test_2/logs/stdio
http://buildlogs.mongodb.org/OS%20X%2010.5%2064-bit/builds/5629/test/core/dbclient_rs_test

Last lines in the test output are:

Fri Aug  2 18:47:40.649 warning: cannot connect to new host $test0:27017 to replica set test, err: cannot connect to $test0:27017
Fri Aug  2 18:47:40.649 trying to add new host $test1:27017 to replica set test
Fri Aug  2 18:47:40.649 replacing connection to $test1:27017 with $test1:27017
Fri Aug  2 18:47:40.649 successfully connected to new host $test1:27017 in replica set test
Fri Aug  2 18:47:40.649 replacing connection to $test0:27017 with (empty)
Fri Aug  2 18:47:41.649 [ReplicaSetMonitorWatcher] replacing connection to $test0:27017 with (empty)

Comment by auto [ 02/Aug/13 ]

Author:

{u'username': u'tadmarshall', u'name': u'Tad Marshall', u'email': u'tad@10gen.com'}

Message: SERVER-8707 Stop the ReplicaSetMonitorWatcher loop on process termination

Use StaticObserver to end ReplicaSetMonitorWatcher's loop when a program
using the C++ driver exits. Move the sleep to the bottom of the loop to
reduce the window between checking for termination and the call to
ReplicaSetMonitor::checkAll().
Branch: v2.4
https://github.com/mongodb/mongo/commit/1de7cd5518e2743a5e9e1db7727dccc488be43da

Comment by auto [ 02/Aug/13 ]

Author:

{u'username': u'tadmarshall', u'name': u'Tad Marshall', u'email': u'tad@10gen.com'}

Message: SERVER-8707 Stop the ReplicaSetMonitorWatcher loop on process termination

Use StaticObserver to end ReplicaSetMonitorWatcher's loop when a program
using the C++ driver exits. Move the sleep to the bottom of the loop to
reduce the window between checking for termination and the call to
ReplicaSetMonitor::checkAll().
Branch: master
https://github.com/mongodb/mongo/commit/e3b8349ec305b00bdb7107f2d887549640a15c90

Comment by Tad Marshall [ 31/Jul/13 ]

Failed on Linux RHEL 64-bit Build #1437 – 5246d29:
http://buildbot.mongodb.org/builders/Linux%20RHEL%2064-bit/builds/1437
http://buildbot.mongodb.org/builders/Linux%20RHEL%2064-bit/builds/1437/steps/test_2/logs/stdio
http://buildlogs.mongodb.org/Linux%20RHEL%2064-bit/builds/1437/test/core/dbclient_rs_test

Wed Jul 31 07:21:51.337 going to run suite: TaggedFiveMemberRS
Wed Jul 31 07:21:51.337 	 going to run test: ConnShouldPinIfSameSettings

The log ends at this point.

Comment by Tad Marshall [ 26/Jul/13 ]

Failure in same test; is this a different issue?
http://buildbot.mongodb.org/builders/Nightly%20Linux%2064-bit/builds/1516
http://buildbot.mongodb.org/builders/Nightly%20Linux%2064-bit/builds/1516/steps/test_4/logs/stdio
http://buildlogs.mongodb.org/Nightly%20Linux%2064-bit/builds/1516/test/core/dbclient_rs_test

Fri Jul 26 11:52:07.047 	 DONE running tests
Fri Jul 26 11:52:07.047 **************************************************
Fri Jul 26 11:52:07.047 AllNodesDown         | tests:    5 | fails:    0 | assert calls:      5
Fri Jul 26 11:52:07.047 BasicRS              | tests:    4 | fails:    0 | assert calls:      4
Fri Jul 26 11:52:07.047 PrimaryDown          | tests:    5 | fails:    0 | assert calls:      5
Fri Jul 26 11:52:07.047 SecondaryDown        | tests:    5 | fails:    0 | assert calls:      5
Fri Jul 26 11:52:07.047 TaggedFiveMemberRS   | tests:    4 | fails:    0 | assert calls:      6
Fri Jul 26 11:52:07.047 TOTALS               | tests:   23 | fails:    0 | assert calls:     25
Fri Jul 26 11:52:07.047 [ReplicaSetMonitorWatcher] deleting replica set monitor for: test/$test0:27017,$test1:27017
dbclient_rs_test: src/third_party/boost/boost/thread/pthread/mutex.hpp:154: void boost::timed_mutex::lock(): Assertion `!pthread_mutex_lock(&m)' failed.

Comment by Matt Kangas [ 25/Jul/13 ]

tad - fyi, here is another C++ driver shutdown issue

Comment by Matt Kangas [ 14/Jul/13 ]

Failed in Nightly Linux 64-bit Subscription RHEL 5.7 Build #486 Sun July 14

http://buildbot-special.10gen.com/builders/Nightly%20Linux%2064-bit%20Subscription%20RHEL%205.7/builds/486/steps/test_2/logs/stdio
http://buildlogs.mongodb.org/Nightly%20Linux%2064-bit%20Subscription%20RHEL%205.7/builds/486/test/core/dbclient_rs_test

Sun Jul 14 02:41:35.541 going to run suite: TaggedFiveMemberRS
Sun Jul 14 02:41:35.541 	 going to run test: ConnShouldPinIfSameSettings
<EOF>

Comment by Matt Kangas [ 12/Jul/13 ]

Ditto: V2.4 Linux 64-bit Subscription RHEL 5.7 Build #137

http://buildbot-special.10gen.com/builders/V2.4%20Linux%2064-bit%20Subscription%20RHEL%205.7/builds/137/steps/test/logs/stdio
http://buildlogs.mongodb.org/V2.4%20Linux%2064-bit%20Subscription%20RHEL%205.7/builds/137/test/core/dbclient_rs_test

Last lines in buildlog are

Fri Jul 12 07:29:04.025 going to run suite: TaggedFiveMemberRS
Fri Jul 12 07:29:04.025 	 going to run test: ConnShouldPinIfSameSettings

Comment by Eric Milkie [ 25/Feb/13 ]

I don't see an easy way to solve this – we can add more checks for !inShutdown() but that won't solve the threading problem.
This will continue to fail until we have a way to synchronize threads at shutdown.

Comment by Eric Milkie [ 25/Feb/13 ]

Mon Feb 25 10:14:11.484 [ReplicaSetMonitorWatcher] starting
dbclient_rs_test: src/third_party/boost/boost/thread/pthread/mutex.hpp:135: boost::timed_mutex::~timed_mutex(): Assertion `!pthread_mutex_destroy(&m)' failed.
Mon Feb 25 10:14:11.484 [ReplicaSetMonitorWatcher] starting

Looks like RS Monitor Watcher thread was doing things (and using globals) during global destructor time.

Generated at Thu Feb 08 03:18:10 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.