Details
-
Bug
-
Resolution: Done
-
Major - P3
-
None
-
2.4.9
-
None
-
ALL
Description
I have a sharded cluster with two replica sets; each replica set has a primary, secondary, and arbiter. During some chunk migrations, the secondary crashed with an invalid memory access:
Thu Mar 20 00:11:15.400 Invalid access at address: 0x7e82cb2c4024 from thread: repl writer worker 1
|
Thu Mar 20 00:11:15.400 Got signal: 7 (Bus error).
|
Thu Mar 20 00:11:15.470 Backtrace:
|
0xde46e1 0x6d06c9 0x6d0c52 0x7f18e429dcb0 0x7feb9b 0x804610 0x80474f 0x80474f 0x80484c 0x8051fd 0x9db2e3 0x9e2944 0x9e45cc 0xacaa1f 0xacb4b2 0xa80bdb 0xa71707 0xc2d253 0xc2c998 0xdb2021
|
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde46e1]
|
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d06c9]
|
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x6d0c52]
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f18e429dcb0]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE4findERKNS_12IndexDetailsERKNS_5KeyV1ERKNS_7DiskLocERKNS_8OrderingERib+0x8b) [0x7feb9b]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x60) [0x804610]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x19f) [0x80474f]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x19f) [0x80474f]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_7BSONObjERKNS_8OrderingERiRbS8_i+0x8c) [0x80484c]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE7unindexENS_7DiskLocERNS_12IndexDetailsERKNS_7BSONObjES3_+0x51d) [0x8051fd]
|
/usr/bin/mongod(_ZNK5mongo18IndexInterfaceImplINS_12BtreeData_V1EE7unindexENS_7DiskLocERNS_12IndexDetailsERKNS_7BSONObjES3_+0x63) [0x9db2e3]
|
/usr/bin/mongod() [0x9e2944]
|
/usr/bin/mongod(_ZN5mongo13unindexRecordEPNS_16NamespaceDetailsEPNS_6RecordERKNS_7DiskLocEb+0x7c) [0x9e45cc]
|
/usr/bin/mongod(_ZN5mongo11DataFileMgr12deleteRecordEPNS_16NamespaceDetailsEPKcPNS_6RecordERKNS_7DiskLocEbbb+0x1bf) [0xacaa1f]
|
/usr/bin/mongod(_ZN5mongo11DataFileMgr12deleteRecordEPKcPNS_6RecordERKNS_7DiskLocEbbb+0x82) [0xacb4b2]
|
/usr/bin/mongod(_ZN5mongo13deleteObjectsEPKcNS_7BSONObjEbbbPNS_11RemoveSaverE+0xd6b) [0xa80bdb]
|
/usr/bin/mongod(_ZN5mongo21applyOperation_inlockERKNS_7BSONObjEbb+0xb37) [0xa71707]
|
/usr/bin/mongod(_ZN5mongo7replset8SyncTail9syncApplyERKNS_7BSONObjEb+0x713) [0xc2d253]
|
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x48) [0xc2c998]
|
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdb2021]
|
I then try to restart the server process, but each time I do, it crashes again with the same error:
***** SERVER RESTARTED *****
|
|
|
|
|
Thu Mar 20 00:16:40.792 [initandlisten] MongoDB starting : pid=12850 port=27017 dbpath=/mnt/ssd0/mongodb 64-bit host=ip-172-31-15-97
|
Thu Mar 20 00:16:40.792 [initandlisten] db version v2.4.9
|
Thu Mar 20 00:16:40.792 [initandlisten] git version: 52fe0d21959e32a5bdbecdc62057db386e4e029c
|
Thu Mar 20 00:16:40.792 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
|
Thu Mar 20 00:16:40.792 [initandlisten] allocator: tcmalloc
|
Thu Mar 20 00:16:40.792 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/mnt/ssd0/mongodb", keyFile: "/var/lib/mongodb/key", logappend: "true", logpath: "/var/log/mongodb/mongodb.log", replSet: "c10r_rs0" }
|
Thu Mar 20 00:16:40.794 [initandlisten] journal dir=/mnt/ssd0/mongodb/journal
|
Thu Mar 20 00:16:40.804 [initandlisten] recover begin
|
Thu Mar 20 00:16:40.804 [initandlisten] recover lsn: 67446000
|
Thu Mar 20 00:16:40.804 [initandlisten] recover /mnt/ssd0/mongodb/journal/j._223
|
Thu Mar 20 00:16:40.805 [initandlisten] recover skipping application of section seq:67267936 < lsn:67446000
|
Thu Mar 20 00:16:40.971 [initandlisten] recover skipping application of section seq:67327460 < lsn:67446000
|
Thu Mar 20 00:16:41.378 [initandlisten] recover skipping application of section seq:67386642 < lsn:67446000
|
Thu Mar 20 00:16:41.614 [initandlisten] recover /mnt/ssd0/mongodb/journal/j._224
|
Thu Mar 20 00:16:42.117 [initandlisten] recover cleaning up
|
Thu Mar 20 00:16:42.117 [initandlisten] removeJournalFiles
|
Thu Mar 20 00:16:42.175 [initandlisten] recover done
|
Thu Mar 20 00:16:42.242 [websvr] admin web console waiting for connections on port 28017
|
Thu Mar 20 00:16:42.242 [initandlisten] waiting for connections on port 27017
|
Thu Mar 20 00:16:42.262 [rsStart] replSet I am ip-172-31-15-97:27017
|
Thu Mar 20 00:16:42.263 [rsStart] replSet STARTUP2
|
Thu Mar 20 00:16:42.264 [rsHealthPoll] replset info ip-172-31-7-105:27017 thinks that we are down
|
Thu Mar 20 00:16:42.264 [rsHealthPoll] replSet member ip-172-31-7-105:27017 is up
|
Thu Mar 20 00:16:42.264 [rsHealthPoll] replSet member ip-172-31-7-105:27017 is now in state PRIMARY
|
Thu Mar 20 00:16:42.267 [initandlisten] connection accepted from 172.31.5.87:58346 #1 (1 connection now open)
|
Thu Mar 20 00:16:42.268 [conn1] authenticate db: local { authenticate: 1, nonce: "abb7f6424d81c71f", user: "__system", key: "dfb68b42dcbb35c1da05e479817db475" }
|
Thu Mar 20 00:16:43.264 [rsSync] replSet SECONDARY
|
Thu Mar 20 00:16:43.672 [initandlisten] connection accepted from 172.31.5.87:58347 #2 (2 connections now open)
|
Thu Mar 20 00:16:43.842 [initandlisten] connection accepted from 172.31.7.105:49017 #3 (3 connections now open)
|
Thu Mar 20 00:16:43.842 [conn3] end connection 172.31.7.105:49017 (2 connections now open)
|
Thu Mar 20 00:16:43.842 [initandlisten] connection accepted from 172.31.7.105:49018 #4 (4 connections now open)
|
Thu Mar 20 00:16:43.843 [conn4] authenticate db: local { authenticate: 1, nonce: "370747b8d626ae6b", user: "__system", key: "1621b74845db8066a8f1146b7d3c4e20" }
|
Thu Mar 20 00:16:43.843 [conn4] authenticate db: local { authenticate: 1, nonce: "a5af772b216170bf", user: "__system", key: "785eadb9503bd9bf9cea035bfad12f6a" }
|
Thu Mar 20 00:16:44.085 [initandlisten] connection accepted from 172.31.0.228:41168 #5 (4 connections now open)
|
Thu Mar 20 00:16:44.086 [conn5] end connection 172.31.0.228:41168 (3 connections now open)
|
Thu Mar 20 00:16:44.086 [initandlisten] connection accepted from 172.31.0.228:41169 #6 (4 connections now open)
|
Thu Mar 20 00:16:44.087 [conn6] authenticate db: local { authenticate: 1, nonce: "3f8ac5c819eeb53f", user: "__system", key: "3101d50913a97466cde4cbed821f5ca8" }
|
Thu Mar 20 00:16:44.087 [conn6] authenticate db: local { authenticate: 1, nonce: "57f591267cfa5d23", user: "__system", key: "a673e04f54b4c46c5c40aeaa94e74452" }
|
Thu Mar 20 00:16:44.265 [rsHealthPoll] replSet member ip-172-31-0-228:27017 is up
|
Thu Mar 20 00:16:44.265 [rsHealthPoll] replSet member ip-172-31-0-228:27017 is now in state ARBITER
|
Thu Mar 20 00:16:45.547 [initandlisten] connection accepted from 172.31.8.61:42866 #7 (5 connections now open)
|
Thu Mar 20 00:16:49.264 [rsBackgroundSync] replSet syncing to: ip-172-31-7-105:27017
|
Thu Mar 20 00:16:49.395 Invalid access at address: 0x7eacf0936024 from thread: repl writer worker 1
|
Thu Mar 20 00:16:49.395 Got signal: 7 (Bus error).
|
Thu Mar 20 00:16:49.402 Backtrace:
|
0xde46e1 0x6d06c9 0x6d0c52 0x7f432b6f6cb0 0x7feb9b 0x804610 0x80474f 0x80474f 0x80484c 0x8051fd 0x9db2e3 0x9e2944 0x9e45cc 0xacaa1f 0xacb4b2 0xa80bdb 0xa71707 0xc2d253 0xc2c998 0xdb2021
|
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde46e1]
|
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d06c9]
|
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x6d0c52]
|
/lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f432b6f6cb0]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE4findERKNS_12IndexDetailsERKNS_5KeyV1ERKNS_7DiskLocERKNS_8OrderingERib+0x8b) [0x7feb9b]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x60) [0x804610]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x19f) [0x80474f]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x19f) [0x80474f]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_7BSONObjERKNS_8OrderingERiRbS8_i+0x8c) [0x80484c]
|
/usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE7unindexENS_7DiskLocERNS_12IndexDetailsERKNS_7BSONObjES3_+0x51d) [0x8051fd]
|
/usr/bin/mongod(_ZNK5mongo18IndexInterfaceImplINS_12BtreeData_V1EE7unindexENS_7DiskLocERNS_12IndexDetailsERKNS_7BSONObjES3_+0x63) [0x9db2e3]
|
/usr/bin/mongod() [0x9e2944]
|
/usr/bin/mongod(_ZN5mongo13unindexRecordEPNS_16NamespaceDetailsEPNS_6RecordERKNS_7DiskLocEb+0x7c) [0x9e45cc]
|
/usr/bin/mongod(_ZN5mongo11DataFileMgr12deleteRecordEPNS_16NamespaceDetailsEPKcPNS_6RecordERKNS_7DiskLocEbbb+0x1bf) [0xacaa1f]
|
/usr/bin/mongod(_ZN5mongo11DataFileMgr12deleteRecordEPKcPNS_6RecordERKNS_7DiskLocEbbb+0x82) [0xacb4b2]
|
/usr/bin/mongod(_ZN5mongo13deleteObjectsEPKcNS_7BSONObjEbbbPNS_11RemoveSaverE+0xd6b) [0xa80bdb]
|
/usr/bin/mongod(_ZN5mongo21applyOperation_inlockERKNS_7BSONObjEbb+0xb37) [0xa71707]
|
/usr/bin/mongod(_ZN5mongo7replset8SyncTail9syncApplyERKNS_7BSONObjEb+0x713) [0xc2d253]
|
/usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x48) [0xc2c998]
|
/usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdb2021]
|
|
|
I then attempted to run the server with --repair, but I got this message:
***** SERVER RESTARTED *****
|
|
|
|
|
Thu Mar 20 00:19:45.045 [initandlisten] MongoDB starting : pid=13064 port=27017 dbpath=/mnt/ssd0/m
|
ongodb 64-bit host=ip-172-31-15-97
|
Thu Mar 20 00:19:45.046 [initandlisten] db version v2.4.9
|
Thu Mar 20 00:19:45.046 [initandlisten] git version: 52fe0d21959e32a5bdbecdc62057db386e4e029c
|
Thu Mar 20 00:19:45.046 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen
|
#1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
|
Thu Mar 20 00:19:45.046 [initandlisten] allocator: tcmalloc
|
Thu Mar 20 00:19:45.046 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/mnt/ssd0
|
/mongodb", keyFile: "/var/lib/mongodb/key", logappend: "true", logpath: "/var/log/mongodb/mongodb.
|
log", repair: true, replSet: "c10r_rs0" }
|
**************
|
You specified --repair but there are dirty journal files. Please
|
restart without --repair to allow the journal files to be replayed.
|
If you wish to repair all databases, please shutdown cleanly and
|
run with --repair again.
|
**************
|
Thu Mar 20 00:19:45.046 [initandlisten] exception in initAndListen: 12596 old lock file, terminating
|
Thu Mar 20 00:19:45.046 dbexit:
|
Thu Mar 20 00:19:45.046 [initandlisten] shutdown: going to close listening sockets...
|
Thu Mar 20 00:19:45.046 [initandlisten] shutdown: going to flush diaglog...
|
Thu Mar 20 00:19:45.046 [initandlisten] shutdown: going to close sockets...
|
Thu Mar 20 00:19:45.046 [initandlisten] shutdown: waiting for fs preallocator...
|
Thu Mar 20 00:19:45.046 [initandlisten] shutdown: closing all files...
|
Thu Mar 20 00:19:45.046 [initandlisten] closeAllFiles() finished
|
Thu Mar 20 00:19:45.046 dbexit: really exiting now
|
I was forced to erase all the data from the data directory and have the secondary do a complete re-sync.