[SERVER-19309] DBtest fails ThreadedTests::Slack<mongo::SimpleRWLock, mongo::SimpleRWLock::Exclusive> Created: 07/Jul/15  Updated: 25/Aug/15  Resolved: 25/Aug/15

Status: Closed
Project: Core Server
Component/s: Concurrency, Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Marek Skalický Assignee: Sam Kleinman (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

It is build by:

scons install \
        %{?_smp_mflags} \
        --use-system-all \
        --variant-dir=build%{?dist} \
        --nostrip \
        --ssl \
        --disable-warnings-as-errors \
        --prefix=%{buildroot}%{_prefix} \
        --wiredtiger=off \
        CCFLAGS="%{?optflags}" LINKFLAGS="%{?__global_ldflags}"

Where %{?optflags} = "-O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -m64 -mtune=generic" and %{?__global_ldflags} = "-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld".

Because it is not failing in all cases, it is necessary to run dbtest more times...

Script to reproduce:

ret=0
i=0
while [ $ret -eq 0 -a $i -le 100 ]; do
	./dbtest threading
	ret=$?
	i=$(($i+1))
	echo "Program $i ended with error code $ret"
done

Participants:

 Description   

When I run "./dbtest --dbpath ./var/dbtest" it fails with

"dbtest: /usr/include/boost/thread/pthread/condition_variable_fwd.hpp:86: boost::condition_variable::~condition_variable(): Assertion `!ret' failed."

2015-06-09T14:22:54.176+0000 I STORAGE  [testsuite] CMD fsync: sync:1 lock:0
2015-06-09T14:22:54.181+0000 I NETWORK  [testsuite] replacing connection to $hostFooBar:27017 with localhost
2015-06-09T14:22:54.183+0000 I SHARDING [testsuite] ChunkManager: time to load chunks for foo.bar: 1ms sequenceNumber: 24 version: 1|92||5576f6bdcbe8e366972d28f5 based on: (empty)
2015-06-09T14:22:54.183+0000 I NETWORK  [testsuite] replacing connection to $hostFooBar:27017 with localhost
2015-06-09T14:22:54.183+0000 I SHARDING [testsuite] ChunkManager: time to load chunks for foo.bar: 0ms sequenceNumber: 25 version: 2|1||5576f6bdcbe8e366972d28f5 based on: 1|92||5576f6bdcbe8e366972d28f5
2015-06-09T14:22:54.184+0000 I -        [testsuite] 	 going to run test: ShardingTests::ChunkDiffUnitTestNormal
2015-06-09T14:23:04.020+0000 I -        [testsuite] 	 going to run test: ShardingTests::ChunkDiffUnitTestInverse
2015-06-09T14:23:15.637+0000 I -        [testsuite] 	 DONE running tests
2015-06-09T14:23:15.637+0000 I -        [testsuite] going to run suite: sock
2015-06-09T14:23:15.637+0000 I -        [testsuite] 	 going to run test: SockTests::HostByName
2015-06-09T14:23:16.200+0000 W NETWORK  [testsuite] getaddrinfo("asdfasdfasdf_no_such_host") failed: No address associated with hostname
2015-06-09T14:23:16.201+0000 I -        [testsuite] 	 DONE running tests
2015-06-09T14:23:16.201+0000 I -        [testsuite] going to run suite: threading
2015-06-09T14:23:16.201+0000 I -        [testsuite] 	 going to run test: ThreadedTests::WriteLocksAreGreedy
2015-06-09T14:23:16.202+0000 I COMMAND  [utest] 6202 1
2015-06-09T14:23:16.302+0000 I COMMAND  [utest] 6302 2
2015-06-09T14:23:16.402+0000 I COMMAND  [utest] 6402 3
2015-06-09T14:23:16.602+0000 I COMMAND  [utest] 6602 1x
2015-06-09T14:23:16.602+0000 I COMMAND  [utest] 6602 2x
2015-06-09T14:23:16.602+0000 I COMMAND  [utest] 6602 3x
2015-06-09T14:23:16.602+0000 I COMMAND  [utest] 200
2015-06-09T14:23:16.602+0000 I -        [testsuite] 	 going to run test: ThreadedTests::Slack<mongo::mutex, mongo::mutex::scoped_lock>
Boost version : 105700
N5mongo5mutexE Slack useful work fraction: 0.845048 locks:125431
2015-06-09T14:23:18.104+0000 I -        [testsuite] 	 going to run test: ThreadedTests::Slack<mongo::SimpleMutex, mongo::SimpleMutex::scoped_lock>
N5mongo11SimpleMutexE Slack useful work fraction: 0.83404 locks:123297
2015-06-09T14:23:19.606+0000 I -        [testsuite] 	 going to run test: ThreadedTests::Slack<mongo::SimpleRWLock, mongo::SimpleRWLock::Exclusive>
N5mongo12SimpleRWLockE Slack useful work fraction: 0.726603 locks:105394
dbtest: /usr/include/boost/thread/pthread/condition_variable_fwd.hpp:86: boost::condition_variable::~condition_variable(): Assertion `!ret' failed.
2015-06-09T14:23:21.117+0000 F -        [testsuite] Got signal: 6 (Aborted).
 0xf72fc1a9 0xf72fba21 0xf72fbedb 0xf5fd4be0 0xf5fd4c10 0xf520dcd6 0xf520f567 0xf52066c7 0xf520674b 0xf66b9b17 0xf6875026 0xf641c2e9 0xf72548b8 0xf7257321 0xf7054d08 0xf648a25d 0xf63df13a 0xf51f7925 0xf641ba6b
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"F5FFB000","o":"13011A9"},{"b":"F5FFB000","o":"1300A21"},{"b":"F5FFB000","o":"1300EDB"},{"b":"F5FD4000","o":"BE0"},{"b":"F5FD4000","o":"C10"},{"b":"F51DF000","o":"2ECD6"},{"b":"F51DF000","o":"30567"},{"b":"F51DF000","o":"276C7"},{"b":"F51DF000","o":"2774B"},{"b":"F5FFB000","o":"6BEB17"},{"b":"F5FFB000","o":"87A026"},{"b":"F5FFB000","o":"4212E9"},{"b":"F5FFB000","o":"12598B8"},{"b":"F5FFB000","o":"125C321"},{"b":"F5FFB000","o":"1059D08"},{"b":"F5FFB000","o":"48F25D"},{"b":"F5FFB000","o":"3E413A"},{"b":"F51DF000","o":"18925"},{"b":"F5FFB000","o":"420A6B"}],"processInfo":{ "mongodbVersion" : "3.0.3", "gitVersion" : "nogitversion", "uname" : { "sysname" : "Linux", "release" : "3.19.5-200.fc21.x86_64", "version" : "#1 SMP Mon Apr 20 19:51:56 UTC 2015", "machine" : "i686" }, "somap" : [ { "b" : "F5FFB000", "elfType" : 3, "buildId" : "932FBCAE7965F7AEF51E8515791D7E396371AEE3" }, { "b" : "F5FD4000", "path" : "linux-gate.so.1", "elfType" : 3, "buildId" : "CFE4EF1700503FD86C66A3B5D0354E569C965202" }, { "b" : "F5F8C000", "path" : "/lib/libstemmer.so.0", "elfType" : 3, "buildId" : "3BFA005A37EEF8E38EFF29FEBD2AA7E59077B55A" }, { "b" : "F5F85000", "path" : "/lib/libsnappy.so.1", "elfType" : 3, "buildId" : "E24EE9C67C5043538F067ACD5668F294BAD7A3E6" }, { "b" : "F5F13000", "path" : "/lib/libpcre.so.1", "elfType" : 3, "buildId" : "C476DFFD40599C9FEFEB173A067A2A564437BB2B" }, { "b" : "F5F08000", "path" : "/lib/libpcrecpp.so.0", "elfType" : 3, "buildId" : "FDE36F1AD7ACAF669F58BDD8992449D567D6B71E" }, { "b" : "F5E99000", "path" : "/lib/libboost_program_options.so.1.57.0", "elfType" : 3, "buildId" : "EFBCCE1DB7A34A855E5CDBCDE3FFE71A2E798FE5" }, { "b" : "F5E83000", "path" : "/lib/libboost_filesystem.so.1.57.0", "elfType" : 3, "buildId" : "8688B5714D25DB33212EDD6B45FD9B58ACFF8963" }, { "b" : "F5E5E000", "path" : "/lib/libboost_thread.so.1.57.0", "elfType" : 3, "buildId" : "72940075269BF89764D84AC1CFB7224FB07D38F6" }, { "b" : "F5E59000", "path" : "/lib/libboost_system.so.1.57.0", "elfType" : 3, "buildId" : "9DC72F96CD87E40E62E59F462A933E17C657C240" }, { "b" : "F5DF2000", "path" : "/lib/libtcmalloc.so.4", "elfType" : 3, "buildId" : "EF452B401365E0CA4EB830EB84ACD45356084319" }, { "b" : "F58AB000", "path" : "/lib/libv8.so.3", "elfType" : 3, "buildId" : "39E3336133B7A3ACA8BE1294954BAEE0E566441B" }, { "b" : "F5829000", "path" : "/lib/libyaml-cpp.so.0.5", "elfType" : 3, "buildId" : "24CB4550A6796BADCFB18336D5489073DE22763A" }, { "b" : "F57B6000", "path" : "/lib/libssl.so.10", "elfType" : 3, "buildId" : "6562F5E4E4BC47DFAD9AD9E6BA93BAA277FF4351" }, { "b" : "F55C8000", "path" : "/lib/libcrypto.so.10", "elfType" : 3, "buildId" : "EBC1D7DCE3F241BA28732EEC92F1D9DE7561E4E3" }, { "b" : "F55BF000", "path" : "/lib/librt.so.1", "elfType" : 3, "buildId" : "90FF057B09FD4C9C39AD56F82D914A97565EDD3A" }, { "b" : "F55BA000", "path" : "/lib/libdl.so.2", "elfType" : 3, "buildId" : "EC8287BAB9B738E6A2C9100563C7EF7A65A4D8A0" }, { "b" : "F543F000", "path" : "/lib/libstdc++.so.6", "elfType" : 3, "buildId" : "6312822B99518F61DC3C97703C6B9144AEB72670" }, { "b" : "F53F1000", "path" : "/lib/libm.so.6", "elfType" : 3, "buildId" : "369850833603EE6844E9D18537EBED321B3EF6D6" }, { "b" : "F53D3000", "path" : "/lib/libgcc_s.so.1", "elfType" : 3, "buildId" : "7A0239F9F4C535715C5D0A06D0D6E1BD3949C535" }, { "b" : "F53B6000", "path" : "/lib/libpthread.so.0", "elfType" : 3, "buildId" : "682912D2E1D0E7DBAB002EFFE63DDD755130ECEA" }, { "b" : "F51DF000", "path" : "/lib/libc.so.6", "elfType" : 3, "buildId" : "E80463C24626C2FD20830E386FF758562468D739" }, { "b" : "F5FD5000", "path" : "/lib/ld-linux.so.2", "elfType" : 3, "buildId" : "B6282FD64643B1B3ECF5D5033588E3BA5AE2B0E0" }, { "b" : "F51C9000", "path" : "/lib/libunwind.so.8", "elfType" : 3, "buildId" : "19F22D6C9CC1EA966D664D11DE79F13D52209A5F" }, { "b" : "F4F64000", "path" : "/lib/libicui18n.so.54", "elfType" : 3, "buildId" : "260EA5D4FCC662C8F8ACBC05FD3243BA35C39788" }, { "b" : "F4DD3000", "path" : "/lib/libicuuc.so.54", "elfType" : 3, "buildId" : "471457014879AA7B03E97906E0D4C8A1DF487D75" }, { "b" : "F35A7000", "path" : "/lib/libicudata.so.54", "elfType" : 3, "buildId" : "A420902092223A1853BD1BB2B79659FD0371308D" }, { "b" : "F355A000", "path" : "/lib/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "8E3C811ACF248F91C4F653827AF1A692EDEFC2F2" }, { "b" : "F3483000", "path" : "/lib/libkrb5.so.3", "elfType" : 3, "buildId" : "2631054252FEFC7B3D35B212BDA779849F959B35" }, { "b" : "F347E000", "path" : "/lib/libcom_err.so.2", "elfType" : 3, "buildId" : "E9BDE6F6CF577212CD60C605A31D79D925CCA8A8" }, { "b" : "F344B000", "path" : "/lib/libk5crypto.so.3", "elfType" : 3, "buildId" : "1674B297F79DB6A20479AEBF8952359898B3BBFB" }, { "b" : "F342E000", "path" : "/lib/libresolv.so.2", "elfType" : 3, "buildId" : "5AB9E6B9A7473B060A1A0AB60DDF8F8F7DA729FB" }, { "b" : "F3416000", "path" : "/lib/libz.so.1", "elfType" : 3, "buildId" : "61BD9738D3A26D072C40A50439A08A4B7DE9FCB1" }, { "b" : "F3406000", "path" : "/lib/libkrb5support.so.0", "elfType" : 3, "buildId" : "D43AE6AEE2DB2EDD10A957D478CC3A0DC952D5A6" }, { "b" : "F3401000", "path" : "/lib/libkeyutils.so.1", "elfType" : 3, "buildId" : "D72A0BB73DDF9B5A7B661A457A99FFED4C6E83D2" }, { "b" : "F33DC000", "path" : "/lib/libselinux.so.1", "elfType" : 3, "buildId" : "F24BBD1B967EAA1A7DC29EA0FEB0D8DC1DEC5439" } ] }}
 dbtest(_ZN5mongo15printStackTraceERSo+0x39) [0xf72fc1a9]
 dbtest(+0x1300A21) [0xf72fba21]
 dbtest(+0x1300EDB) [0xf72fbedb]
 linux-gate.so.1(__kernel_sigreturn+0x0) [0xf5fd4be0]
 linux-gate.so.1(__kernel_vsyscall+0x10) [0xf5fd4c10]
 libc.so.6(gsignal+0x46) [0xf520dcd6]
 libc.so.6(abort+0x157) [0xf520f567]
 libc.so.6(+0x276C7) [0xf52066c7]
 libc.so.6(+0x2774B) [0xf520674b]
 dbtest(_ZN5boost18condition_variableD1Ev+0x67) [0xf66b9b17]
 dbtest(_ZN5mongo8unittest5Suite13runTestObjectIN13ThreadedTests5SlackINS_12SimpleRWLockENS5_9ExclusiveEEEEEvv+0xE6) [0xf6875026]
 dbtest(_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE+0x9) [0xf641c2e9]
 dbtest(_ZN5mongo8unittest5Suite3runERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEi+0x848) [0xf72548b8]
 dbtest(_ZN5mongo8unittest5Suite3runERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS8_EERKS8_i+0x861) [0xf7257321]
 dbtest(_ZN5mongo7dbtests10runDbTestsEiPPc+0xB8) [0xf7054d08]
 dbtest(_Z11dbtestsMainiPPcS0_+0x18D) [0xf648a25d]
 dbtest(main+0x2A) [0xf63df13a]
 libc.so.6(__libc_start_main+0xF5) [0xf51f7925]
 dbtest(+0x420A6B) [0xf641ba6b]
-----  END BACKTRACE  -----

See this bug - https://jira.mongodb.org/browse/SERVER-18906

I have a wrong assumption that this is caused by --dbpath handling.

It could be some race condition problem. It is not failing in every run. All tests in threading suite pass fine, only in some cases it fails.



 Comments   
Comment by Sam Kleinman (Inactive) [ 09/Jul/15 ]

Thanks for this update. Can you also provide more information about the C++ standard library and version?

At the moment I'm most interested in reproducing recent master versions built with using all of the bundled (vendored third_party libraries.) Once we can isolate and reproduce the issue there we can move to fix the issue. If it's only reproducible with the more recent boost we can test with the --use-system-boost to try and isolate that variable.

The dbTest suites run as part of the compile step, which is the first step (box) in each column in the evergreen display, and you can use that display to find commits that have passed on every other platform to test on Fedora Rawhide.

I will leave this issue open while we wait for you to get back online. Take care!

Cheers,
sam

Comment by Marek Skalický [ 09/Jul/15 ]

boost - 1.57
glibc - 2.21.90
-> list of all versions for example there: https://kojipkgs.fedoraproject.org//work/tasks/7498/10327498/root.log
This is updated actual Fedora rawhide. I tested this only on x86_64. If you would like to test it, there are some builds:
https://koji.fedoraproject.org/koji/taskinfo?taskID=10323018 - version 3.0.4, for building details see build.log of your architecture. You have to download and install all mongodb, mongodb-server and mongodb-test. Dbtest binary is stored in /usr/share/mongodb-test.
https://koji.fedoraproject.org/koji/taskinfo?taskID=10327398 - version 3.1.5 with bundled libraries.
https://koji.fedoraproject.org/koji/taskinfo?taskID=10327497 - version 3.1.5 with system libraries
https://koji.fedoraproject.org/koji/taskinfo?taskID=10327409 - version 3.1.6-pre (your commit) with bundled libs (it is not possible to build with system libraries in fedora - old version of asio lib)

-> I tried to reproduce. And I was successful with 3.1.5 and also 3.1.6 (for example for bundled libs I have to run dbtest 200 times ~ the script 2 times)

Also some tests failed with

N5mongo12SimpleRWLockE Slack useful work fraction: 0.878031 locks:115871
2015-07-09T11:11:47.950+0200 I -        [testsuite] 	 going to run test: ThreadedTests::CondSlack
CondSlack useful work fraction: 0.853867 locks:115678
dbtest: /usr/include/boost/thread/pthread/condition_variable.hpp:169: boost::condition_variable_any::~condition_variable_any(): Assertion `!pthread_cond_destroy(&cond)' failed.
2015-07-09T11:11:49.458+0200 F -        [testsuite] Got signal: 6 (Aborted).

I will try to provide more test details.
Unfortunately I am going to be offline for 14 day. So I won't be able to give more details about reproducer
After I get back, I plan to continue in testing and giving more details...

Thanks,
Marek

Comment by Sam Kleinman (Inactive) [ 08/Jul/15 ]

Thanks for getting back to me.

I'm particularly interested in the versions of the packages that your build is using rather than using the vendorized packages, which is probably just Boost in this case. The version of libc that you're using may also be useful just as a data point.

I want to try and isolate the cause of this error a bit more, and my own attempts to reproduce the problem using your script have not yet been successful. Could you attempt two different tests to see if you can reproduce the error:

  1. Could you attempt to build and run the test against a very recent version of master this commit has recently passed all unit tests across all platforms we test on, so I would expect it to work on your platform. There've been some changes to theading and concurrency primitives after 3.1.5 (SERVER-19099) that might be useful to disambiguate.
  2. Could you attempt to build and run the test without --use-system-all, deferring to our vendorized libraries just to double check.

Cheers,
sam

Comment by Marek Skalický [ 08/Jul/15 ]

Thank you.

1. Sorry, I was wrong. It was built with wiredtiger=on... I will try to build it with wiredtiger=off to check if it is failing for me.
2. I am using Fedora Rawhide. Version of which packages?
3. First I saw this failure in Fedora build system - Koji. In Koji this test is failing almost everytime (it is really random that it pass). And Koji it also failed with Fedora 22. For a long time a was not able to reproduce it in my virtual machine, so I thought it is a problem of Koji. But when I tried to run dbtest by the script above, sometime it simply failed...
4.

$ gcc --version
gcc (GCC) 5.1.1 20150618 (Red Hat 5.1.1-4)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
$ ld --version
GNU ld version 2.25-11.fc23
Copyright (C) 2014 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.


5. I tried to run dbtest with MongoDB 3.0.4, also 3.1.5 failed with this error. I havent tried earlier versions.

Comment by Sam Kleinman (Inactive) [ 07/Jul/15 ]

I'm attempting to reproduce this right now using the latest master, and I will keep you posted with my progress.

I have a couple of additional questions and clarifications while I work on the reproduction.

  1. I was unable to build this using wiredtiger=off
  2. What operating system and versions of the system packages are you using?
  3. How often do you see this failure?
  4. Can you provide version information about the compiler and linker that you used to build the test?
  5. Which version or versions of MongoDB did you see this error on?

Regards,
sam

Generated at Thu Feb 08 03:50:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.