[SERVER-16688] DurableImpl::commitAndStopDurThread must wait for the durability thread to exit Created: 22/Dec/14  Updated: 29/Jul/16  Resolved: 16/Jan/15

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 3.0.0-rc6

Type: Bug Priority: Major - P3
Reporter: Ian Whalen (Inactive) Assignee: Kaloian Manassiev
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

m31000| 2014-12-21T17:08:52.309+0000 I -        [journal] Assertion failure false src/mongo/db/storage/mmap_v1/dur_recover.cpp 275
 m31000| 2014-12-21T17:08:52.314+0000 I CONTROL  [journal] 
 m31000|  0xf0e309 0xeb8121 0xe9e30e 0xce88dd 0xce896c 0xce8dbb 0xce902c 0xce9b87 0xcee500 0xcde1ae 0xf5a614 0x2afdeed1d83d 0x2afdefba7fcd
 m31000| ----- BEGIN BACKTRACE -----
 m31000| {"backtrace":[{"b":"400000","o":"B0E309"},{"b":"400000","o":"AB8121"},{"b":"400000","o":"A9E30E"},{"b":"400000","o":"8E88DD"},{"b":"400000","o":"8E896C"},{"b":"400000","o":"8E8DBB"},{"b":"400000","o":"8E902C"},{"b":"400000","o":"8E9B87"},{"b":"400000","o":"8EE500"},{"b":"400000","o":"8DE1AE"},{"b":"400000","o":"B5A614"},{"b":"2AFDEED17000","o":"683D"},{"b":"2AFDEFAD3000","o":"D4FCD"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc4-pre-", "gitVersion" : "0ae940b456ceafc3a0d473714c390f95d3a04823", "uname" : { "sysname" : "Linux", "release" : "2.6.18-194.el5xen", "version" : "#1 SMP Tue Mar 16 22:01:26 EDT 2010", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "2AFDEED17000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "2AFDEEF33000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "2AFDEF13C000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "2AFDEF341000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "2AFDEF641000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "2AFDEF8C4000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "2AFDEFAD3000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "2AFDEEAF9000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 m31000|  mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf0e309]
 m31000|  mongod(_ZN5mongo10logContextEPKc+0xE1) [0xeb8121]
 m31000|  mongod(_ZN5mongo12verifyFailedEPKcS1_j+0xCE) [0xe9e30e]
 m31000|  mongod(_ZN5mongo3dur11RecoveryJob4Last8newEntryERKNS0_18ParsedJournalEntryERS1_+0x2AD) [0xce88dd]
 m31000|  mongod(_ZN5mongo3dur11RecoveryJob5writeERNS1_4LastERKNS0_18ParsedJournalEntryE+0x4C) [0xce896c]
 m31000|  mongod(_ZN5mongo3dur11RecoveryJob10applyEntryERNS1_4LastERKNS0_18ParsedJournalEntryEbb+0x33B) [0xce8dbb]
 m31000|  mongod(_ZN5mongo3dur11RecoveryJob12applyEntriesERKSt6vectorINS0_18ParsedJournalEntryESaIS3_EE+0xCC) [0xce902c]
 m31000|  mongod(_ZN5mongo3dur11RecoveryJob14processSectionEPKNS0_11JSectHeaderEPKvjPKNS0_11JSectFooterE+0x607) [0xce9b87]
 m31000|  mongod(_ZN5mongo3dur16WRITETODATAFILESERKNS0_11JSectHeaderERNS_14AlignedBuilderE+0x100) [0xcee500]
 m31000|  mongod(+0x8DE1AE) [0xcde1ae]
 m31000|  mongod(+0xB5A614) [0xf5a614]
 m31000|  libpthread.so.0(+0x683D) [0x2afdeed1d83d]
 m31000|  libc.so.6(clone+0x6D) [0x2afdefba7fcd]
 m31000| -----  END BACKTRACE  -----
 m31000| 2014-12-21T17:08:52.315+0000 I JOURNAL  [journal] dbexception in groupCommit causing immediate shutdown: 0 assertion src/mongo/db/storage/mmap_v1/dur_recover.cpp:275
 m31000| 2014-12-21T17:08:52.315+0000 F COMMAND  [journal] gc1
 m31000| 2014-12-21T17:08:52.318+0000 F -        [journal] Got signal: 6 (Aborted).
 m31000| 
 m31000|  0xf0e309 0xf0d9b2 0xf0dd36 0x2afdefb03030 0x2afdefb02fc5 0x2afdefb04a70 0xa86a6c 0xcde881 0xf5a614 0x2afdeed1d83d 0x2afdefba7fcd
 m31000| ----- BEGIN BACKTRACE -----
 m31000| {"backtrace":[{"b":"400000","o":"B0E309"},{"b":"400000","o":"B0D9B2"},{"b":"400000","o":"B0DD36"},{"b":"2AFDEFAD3000","o":"30030"},{"b":"2AFDEFAD3000","o":"2FFC5"},{"b":"2AFDEFAD3000","o":"31A70"},{"b":"400000","o":"686A6C"},{"b":"400000","o":"8DE881"},{"b":"400000","o":"B5A614"},{"b":"2AFDEED17000","o":"683D"},{"b":"2AFDEFAD3000","o":"D4FCD"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc4-pre-", "gitVersion" : "0ae940b456ceafc3a0d473714c390f95d3a04823", "uname" : { "sysname" : "Linux", "release" : "2.6.18-194.el5xen", "version" : "#1 SMP Tue Mar 16 22:01:26 EDT 2010", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "2AFDEED17000", "path" : "/lib64/libpthread.so.0", "elfType" : 3 }, { "b" : "2AFDEEF33000", "path" : "/lib64/librt.so.1", "elfType" : 3 }, { "b" : "2AFDEF13C000", "path" : "/lib64/libdl.so.2", "elfType" : 3 }, { "b" : "2AFDEF341000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3 }, { "b" : "2AFDEF641000", "path" : "/lib64/libm.so.6", "elfType" : 3 }, { "b" : "2AFDEF8C4000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3 }, { "b" : "2AFDEFAD3000", "path" : "/lib64/libc.so.6", "elfType" : 3 }, { "b" : "2AFDEEAF9000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 m31000|  mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf0e309]
 m31000|  mongod(+0xB0D9B2) [0xf0d9b2]
 m31000|  mongod(+0xB0DD36) [0xf0dd36]
 m31000|  libc.so.6(+0x30030) [0x2afdefb03030]
 m31000|  libc.so.6(gsignal+0x35) [0x2afdefb02fc5]
 m31000|  libc.so.6(abort+0x110) [0x2afdefb04a70]
 m31000|  mongod(_ZN5mongo10mongoAbortEPKc+0x7C) [0xa86a6c]
 m31000|  mongod(+0x8DE881) [0xcde881]
 m31000|  mongod(+0xB5A614) [0xf5a614]
 m31000|  libpthread.so.0(+0x683D) [0x2afdeed1d83d]
 m31000|  libc.so.6(clone+0x6D) [0x2afdefba7fcd]
 m31000| -----  END BACKTRACE  -----
2014-12-21T17:08:53.194+0000 I -        shell: stopped mongo program on port 31000
2014-12-21T17:08:53.196+0000 I -        shell: started program (sh2017):  /data/mci/shell/src/mongod --nohttpinterface --noprealloc --bind_ip 127.0.0.1 --smallfiles --port 31000 --dbpath /data/db/jstests_repl_master1test-master --nopreallocj --keyFile /data/mci/shell/src/jstests/libs/authTestsKey --master --oplogSize 40 --setParameter enableTestCommands=1 --setParameter enableLocalhostAuthBypass=false
2014-12-21T17:08:53.197+0000 W NETWORK  Failed to connect to 127.0.0.1:31000, reason: errno:111 Connection refused
 m31000| note: noprealloc may hurt performance in many applications
 m31000| 2014-12-21T17:08:53.229+0000 I CONTROL  [initandlisten] MongoDB starting : pid=2017 port=31000 dbpath=/data/db/jstests_repl_master1test-master master=1 64-bit host=ip-10-179-175-138
 m31000| 2014-12-21T17:08:53.229+0000 I CONTROL  [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
 m31000| 2014-12-21T17:08:53.229+0000 I CONTROL  [initandlisten] 
 m31000| 2014-12-21T17:08:53.229+0000 I CONTROL  [initandlisten] db version v2.8.0-rc4-pre-
 m31000| 2014-12-21T17:08:53.229+0000 I CONTROL  [initandlisten] git version: 0ae940b456ceafc3a0d473714c390f95d3a04823
 m31000| 2014-12-21T17:08:53.229+0000 I CONTROL  [initandlisten] build info: Linux build13.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
 m31000| 2014-12-21T17:08:53.229+0000 I CONTROL  [initandlisten] allocator: tcmalloc
 m31000| 2014-12-21T17:08:53.229+0000 I CONTROL  [initandlisten] options: { master: true, net: { bindIp: "127.0.0.1", http: { enabled: false }, port: 31000 }, nopreallocj: true, replication: { oplogSizeMB: 40 }, security: { keyFile: "/data/mci/shell/src/jstests/libs/authTestsKey" }, setParameter: { enableLocalhostAuthBypass: "false", enableTestCommands: "1" }, storage: { dbPath: "/data/db/jstests_repl_master1test-master", mmapv1: { preallocDataFiles: false, smallFiles: true } } }
 m31000| 2014-12-21T17:08:53.245+0000 I JOURNAL  [initandlisten] journal dir=/data/db/jstests_repl_master1test-master/journal
 m31000| 2014-12-21T17:08:53.246+0000 I JOURNAL  [initandlisten] recover : no journal files present, no recovery needed
 m31000| 2014-12-21T17:08:53.268+0000 I NETWORK  [initandlisten] waiting for connections on port 31000
Authenticating as internal __system user with mechanism null on connection: connection to 127.0.0.1:31000
 m31000| 2014-12-21T17:08:53.399+0000 I NETWORK  [initandlisten] connection accepted from 127.0.0.1:56190 #1 (1 connection now open)
 m31000| 2014-12-21T17:08:53.413+0000 I ACCESS   [conn1] Successfully authenticated as principal __system on local
assert: [2147483647] != [2] are not equal : undefined
Error: [2147483647] != [2] are not equal : undefined
    at Error (<anonymous>)
    at doassert (src/mongo/shell/assert.js:11:14)
    at Function.assert.eq (src/mongo/shell/assert.js:38:5)
    at /data/mci/shell/src/jstests/repl/master1.js:45:8



 Comments   
Comment by Githook User [ 16/Jan/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-16688 Shutdown should wait for the durability thread to complete
Branch: master
https://github.com/mongodb/mongo/commit/a01d6e0e4fadb79d5628e76558178c6e005f7a35

Comment by Eric Milkie [ 23/Dec/14 ]

Ah, I see. There is a bug in DurableImpl::commitAndStopDurThread(). It's not synchronously waiting for the dur thread to exit.

Comment by Eric Milkie [ 23/Dec/14 ]

Would it help if in exitCleanly(), we take the flush lock? Right now we explicitly take just the global lock, by using lockGlobalBegin(), to avoid taking the flush lock. This leads me to suspect that it won't block the journal thread, and thus the journal thread can crash once the shutdown thread does journalCleanup.

Comment by Eric Milkie [ 23/Dec/14 ]

Same problem here: http://buildlogs.mongodb.org/MCI_linux-32-debug/builds/471202/test/durability_0/manyRestart.js

Can we block out the journal thread once we enter shutdown?

Comment by Eric Milkie [ 22/Dec/14 ]

Looks like the journal thread was still running (and not blocked) during shutdown.

Generated at Thu Feb 08 03:41:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.