[SERVER-3620] test (dbtests) fails with pthread assertion Created: 17/Aug/11  Updated: 11/Jul/16  Resolved: 22/May/12

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 2.1.2

Type: Bug Priority: Major - P3
Reporter: Aaron Staple Assignee: Andy Schwerin
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Here is the stack trace, captured at the end of the test run as the program was exiting:

It seems the problem is either in locking the mongo::mutex in NotifyAll or in something inside the boost::condition in NotifyAll. I haven't looked closely.

0xd70133 0xb348a3 0xb34ebd 0xb351a5 0xb35755 0xe50210 0x34dee06407 0x34de2d4b0d
/home/yellow/buildbot/Linux_64bit_v8/mongo/test(_ZN5mongo9NotifyAll9notifyAllEy+0x83) [0xd70133]
/home/yellow/buildbot/Linux_64bit_v8/mongo/test(_ZN5mongo3dur28_groupCommitWithLimitedLocksEv+0x343) [0xb348a3]
/home/yellow/buildbot/Linux_64bit_v8/mongo/test(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0x1d) [0xb34ebd]
/home/yellow/buildbot/Linux_64bit_v8/mongo/test [0xb351a5]
/home/yellow/buildbot/Linux_64bit_v8/mongo/test(_ZN5mongo3dur9durThreadEv+0x85) [0xb35755]
/home/yellow/buildbot/Linux_64bit_v8/mongo/test(thread_proxy+0x80) [0xe50210]
/lib64/libpthread.so.0 [0x34dee06407]
/lib64/libc.so.6(clone+0x6d) [0x34de2d4b0d]

<http://buildbot.mongodb.org:8081/builders/Linux%2064-bit%20v8/builds/2513/steps/test/logs/stdio>
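The frames above are mangled C++ symbols. For example, _ZN5mongo9NotifyAll9notifyAllEy is mongo::NotifyAll::notifyAll(unsigned long long). Piping the trace through c++filt decodes them, or a small helper like the one below can do it in code. This is a sketch that assumes GCC or Clang, because abi::__cxa_demangle from <cxxabi.h> belongs to the Itanium C++ ABI. The name demangle is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Frees the buffer returned by abi::__cxa_demangle, which allocates with malloc.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Turn an Itanium-ABI mangled symbol (as printed in a backtrace) into a
// readable C++ signature; returns the input unchanged if demangling fails.
std::string demangle(const std::string& mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> readable(
        abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status));
    if (status != 0 || !readable) {
        return mangled;
    }
    return std::string(readable.get());
}
```

Applied to the other frames, this gives mongo::dur::_groupCommitWithLimitedLocks() and mongo::dur::durThread(). That means the failure happens on the journaling (durability) background thread.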

@eliot, let me know if you want me to work on a fix.
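The top frame is a NotifyAll-style primitive. This kind of primitive is commonly built from a mutex that guards some progress state plus a condition variable. In that design, notifyAll(when) locks the mutex, records progress and wakes every waiter. The sketch below is a hypothetical stand-in written with std::mutex and std::condition_variable. It is not MongoDB's actual NotifyAll, and the class name NotifyAllSketch and its methods are illustrative. It shows why both suspects named above involve live synchronization state: the explicit lock and the condition variable's notify both touch the object, so the object must not have been destroyed yet.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a NotifyAll-style primitive (NOT MongoDB's code):
// a mutex guards a progress counter, and a condition variable wakes waiters.
class NotifyAllSketch {
public:
    using When = std::uint64_t;

    // Hand out a new token; a caller later waits until that token is reached.
    When nextToken() {
        std::lock_guard<std::mutex> lk(mutex_);
        return ++lastIssued_;
    }

    // Block until notifyAll has been called with a value >= `when`.
    void waitFor(When when) {
        std::unique_lock<std::mutex> lk(mutex_);
        cond_.wait(lk, [&] { return lastDone_ >= when; });
    }

    // Record progress up to `when` and wake every waiter. Both the lock below
    // and cond_.notify_all() touch this object's state, so calling this after
    // the object has been destroyed is undefined behavior.
    void notifyAll(When when) {
        {
            std::lock_guard<std::mutex> lk(mutex_);
            if (when > lastDone_) {
                lastDone_ = when;
            }
        }
        cond_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    When lastIssued_ = 0;
    When lastDone_ = 0;
};
```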



 Comments   
Comment by Andy Schwerin [ 22/May/12 ]

I believe this is fixed by the combination of the following git commits:

http://github.com/mongodb/mongo/commit/345151f79d48a634526c7114f77a30405437e200
http://github.com/mongodb/mongo/commit/1647933b0f953badcdecd6b3941db7aeca2e7aba

Comment by Dwight Merriman [ 20/Aug/11 ]

This may be a race condition during shutdown and dbexit, though I'm not sure. _groupCommitWithLimitedLocks releases dbMutex but still holds other locks while finishing. It could be that

Aug 17 02:49:04 [testsuite] journalCleanup...

does not take the right lock to keep it from racing with this code. Either that, or something is wrong with the shutdown order of global destructors once ::exit is called.
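Here is one way the race described above can happen. ::exit runs the destructors of objects with static storage duration while other threads, such as the durability thread, may still be running. If such a thread then locks a mutex that belongs to an already-destroyed object, the behavior is undefined, and a failing pthread_mutex_lock is one possible symptom. A common remedy is to stop and join background threads before any object they use is torn down. The sketch below illustrates that ordering. BackgroundCommitter, stop() and commits() are hypothetical names, and the loop body only counts iterations in place of a real group commit. This is not necessarily how the commits referenced above fixed the issue.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical background thread that periodically "commits" and must be
// stopped and joined before the state it uses is destroyed.
class BackgroundCommitter {
public:
    BackgroundCommitter() : worker_([this] { run(); }) {}

    // Joining here, before members are destroyed, means the thread can never
    // touch mutex_ or cond_ after they are gone.
    ~BackgroundCommitter() { stop(); }

    // Ask the loop to exit, wake it, and wait for it to finish. Safe to call twice.
    void stop() {
        {
            std::lock_guard<std::mutex> lk(mutex_);
            stopping_ = true;
        }
        cond_.notify_all();
        if (worker_.joinable()) {
            worker_.join();
        }
    }

    int commits() const { return commits_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lk(mutex_);
        while (!stopping_) {
            ++commits_;  // stand-in for one group commit
            cond_.wait_for(lk, std::chrono::milliseconds(10),
                           [this] { return stopping_; });
        }
    }

    std::mutex mutex_;
    std::condition_variable cond_;
    bool stopping_ = false;
    std::atomic<int> commits_{0};
    std::thread worker_;  // declared last: started after the members it uses exist
};
```

In a shutdown path, calling stop() on such an object before ::exit (or before main returns) guarantees the background thread is no longer inside locking or notify code when global destructors run.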

Also, in this log it looks like the shutdown sequence (e.g. "shutdown: going to close listening sockets..." etc.) repeats after an assertion. That is wrong and should also be fixed before this ticket is closed.
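A repeated shutdown sequence like the one mentioned above is what you get when a failure during shutdown re-enters the shutdown routine. Below is a minimal sketch of a guard that makes the sequence run at most once. It assumes the shutdown steps can be wrapped in a callable, and the names ShutdownOnce and run() are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

// Hypothetical guard: runs a shutdown sequence at most once, even if several
// paths (normal exit, a failed assertion, a signal) ask for it.
class ShutdownOnce {
public:
    // Returns true if this call ran `steps`, false if shutdown had already
    // been started (by this thread or another one).
    bool run(const std::function<void()>& steps) {
        if (started_.exchange(true)) {
            return false;  // already shutting down: don't repeat the sequence
        }
        steps();
        return true;
    }

private:
    std::atomic<bool> started_{false};
};
```

The guard uses an atomic exchange rather than std::call_once. With the exchange, a nested call from the same thread returns false immediately instead of waiting on a call that has not finished; such a nested call could come, for example, from an assertion raised while the shutdown steps run. The trade-off is that a concurrent second caller also returns right away instead of waiting for shutdown to complete.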

This is probably too big a change to fix in 2.0 at this point.

Comment by Dwight Merriman [ 20/Aug/11 ]

It looks like this is an issue with shutdown, and possibly shutdown order. I will look a bit more.

Comment by Dwight Merriman [ 18/Aug/11 ]

OK, so it is asserting here:

class pthread_mutex_scoped_lock
{
    pthread_mutex_t* m;
    bool locked;
public:
    explicit pthread_mutex_scoped_lock(pthread_mutex_t* m_):
        m(m_),locked(true)
    {
        BOOST_VERIFY(!pthread_mutex_lock(m)); // <------------------------ asserts here
    }
    // ... (rest of class not quoted)
};
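BOOST_VERIFY(!pthread_mutex_lock(m)) fails whenever pthread_mutex_lock returns a nonzero error code. A shutdown-order race would mean locking a mutex that has already been destroyed, which is undefined behavior and cannot be demonstrated reliably. The sketch below shows a well-defined nonzero return instead. With an error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), POSIX requires EDEADLK when the owning thread locks it a second time. The helper name relockReturnCode is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Returns the result of a second pthread_mutex_lock by the owning thread on
// an error-checking mutex. POSIX specifies EDEADLK here; for a default-type
// mutex the same relock would be undefined behavior instead.
int relockReturnCode() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_t m;
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);  // the initialized mutex keeps its type

    int first = pthread_mutex_lock(&m);   // expected 0 (success)
    int second = pthread_mutex_lock(&m);  // nonzero: this thread already owns m
    if (first == 0) {
        pthread_mutex_unlock(&m);
    }
    pthread_mutex_destroy(&m);
    return second;
}
```

Inside BOOST_VERIFY(!...), a nonzero result like this is exactly what trips the assertion. Logging the returned code, for example with strerror, would also show which error occurred.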
Comment by Dwight Merriman [ 18/Aug/11 ]

What's the assertion? Was it logged?

Comment by Dwight Merriman [ 18/Aug/11 ]

How can this be reproduced?

What Boost version?

Thanks.

Generated at Thu Feb 08 03:03:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.