Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-13291

Invalid memory access causes crash and leaves server in unrecoverable state

    XMLWordPrintableJSON

Details

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major - P3 Major - P3
    • None
    • 2.4.9
    • Replication
    • None
    • ALL

    Description

      I have a sharded cluster with two replica sets; each replica set has a primary, secondary, and arbiter. During some chunk migrations, the secondary crashed with an invalid memory access:

      Thu Mar 20 00:11:15.400 Invalid access at address: 0x7e82cb2c4024 from thread: repl writer worker 1
      Thu Mar 20 00:11:15.400 Got signal: 7 (Bus error).
      Thu Mar 20 00:11:15.470 Backtrace:
      0xde46e1 0x6d06c9 0x6d0c52 0x7f18e429dcb0 0x7feb9b 0x804610 0x80474f 0x80474f 0x80484c 0x8051fd 0x9db2e3 0x9e2944 0x9e45cc 0xacaa1f 0xacb4b2 0xa80bdb 0xa71707 0xc2d253 0xc2c998 0xdb2021
       /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde46e1]
       /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d06c9]
       /usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x6d0c52]
       /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f18e429dcb0]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE4findERKNS_12IndexDetailsERKNS_5KeyV1ERKNS_7DiskLocERKNS_8OrderingERib+0x8b) [0x7feb9b]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x60) [0x804610]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x19f) [0x80474f]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x19f) [0x80474f]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_7BSONObjERKNS_8OrderingERiRbS8_i+0x8c) [0x80484c]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE7unindexENS_7DiskLocERNS_12IndexDetailsERKNS_7BSONObjES3_+0x51d) [0x8051fd]
       /usr/bin/mongod(_ZNK5mongo18IndexInterfaceImplINS_12BtreeData_V1EE7unindexENS_7DiskLocERNS_12IndexDetailsERKNS_7BSONObjES3_+0x63) [0x9db2e3]
       /usr/bin/mongod() [0x9e2944]
       /usr/bin/mongod(_ZN5mongo13unindexRecordEPNS_16NamespaceDetailsEPNS_6RecordERKNS_7DiskLocEb+0x7c) [0x9e45cc]
       /usr/bin/mongod(_ZN5mongo11DataFileMgr12deleteRecordEPNS_16NamespaceDetailsEPKcPNS_6RecordERKNS_7DiskLocEbbb+0x1bf) [0xacaa1f]
       /usr/bin/mongod(_ZN5mongo11DataFileMgr12deleteRecordEPKcPNS_6RecordERKNS_7DiskLocEbbb+0x82) [0xacb4b2]
       /usr/bin/mongod(_ZN5mongo13deleteObjectsEPKcNS_7BSONObjEbbbPNS_11RemoveSaverE+0xd6b) [0xa80bdb]
       /usr/bin/mongod(_ZN5mongo21applyOperation_inlockERKNS_7BSONObjEbb+0xb37) [0xa71707]
       /usr/bin/mongod(_ZN5mongo7replset8SyncTail9syncApplyERKNS_7BSONObjEb+0x713) [0xc2d253]
       /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x48) [0xc2c998]
       /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdb2021]

      I then try to restart the server process, but each time I do, it crashes again with the same error:

      ***** SERVER RESTARTED *****
       
       
      Thu Mar 20 00:16:40.792 [initandlisten] MongoDB starting : pid=12850 port=27017 dbpath=/mnt/ssd0/mongodb 64-bit host=ip-172-31-15-97
      Thu Mar 20 00:16:40.792 [initandlisten] db version v2.4.9
      Thu Mar 20 00:16:40.792 [initandlisten] git version: 52fe0d21959e32a5bdbecdc62057db386e4e029c
      Thu Mar 20 00:16:40.792 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
      Thu Mar 20 00:16:40.792 [initandlisten] allocator: tcmalloc
      Thu Mar 20 00:16:40.792 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/mnt/ssd0/mongodb", keyFile: "/var/lib/mongodb/key", logappend: "true", logpath: "/var/log/mongodb/mongodb.log", replSet: "c10r_rs0" }
      Thu Mar 20 00:16:40.794 [initandlisten] journal dir=/mnt/ssd0/mongodb/journal
      Thu Mar 20 00:16:40.804 [initandlisten] recover begin
      Thu Mar 20 00:16:40.804 [initandlisten] recover lsn: 67446000
      Thu Mar 20 00:16:40.804 [initandlisten] recover /mnt/ssd0/mongodb/journal/j._223
      Thu Mar 20 00:16:40.805 [initandlisten] recover skipping application of section seq:67267936 < lsn:67446000
      Thu Mar 20 00:16:40.971 [initandlisten] recover skipping application of section seq:67327460 < lsn:67446000
      Thu Mar 20 00:16:41.378 [initandlisten] recover skipping application of section seq:67386642 < lsn:67446000
      Thu Mar 20 00:16:41.614 [initandlisten] recover /mnt/ssd0/mongodb/journal/j._224
      Thu Mar 20 00:16:42.117 [initandlisten] recover cleaning up
      Thu Mar 20 00:16:42.117 [initandlisten] removeJournalFiles
      Thu Mar 20 00:16:42.175 [initandlisten] recover done
      Thu Mar 20 00:16:42.242 [websvr] admin web console waiting for connections on port 28017
      Thu Mar 20 00:16:42.242 [initandlisten] waiting for connections on port 27017
      Thu Mar 20 00:16:42.262 [rsStart] replSet I am ip-172-31-15-97:27017
      Thu Mar 20 00:16:42.263 [rsStart] replSet STARTUP2
      Thu Mar 20 00:16:42.264 [rsHealthPoll] replset info ip-172-31-7-105:27017 thinks that we are down
      Thu Mar 20 00:16:42.264 [rsHealthPoll] replSet member ip-172-31-7-105:27017 is up
      Thu Mar 20 00:16:42.264 [rsHealthPoll] replSet member ip-172-31-7-105:27017 is now in state PRIMARY
      Thu Mar 20 00:16:42.267 [initandlisten] connection accepted from 172.31.5.87:58346 #1 (1 connection now open)
      Thu Mar 20 00:16:42.268 [conn1]  authenticate db: local { authenticate: 1, nonce: "abb7f6424d81c71f", user: "__system", key: "dfb68b42dcbb35c1da05e479817db475" }
      Thu Mar 20 00:16:43.264 [rsSync] replSet SECONDARY
      Thu Mar 20 00:16:43.672 [initandlisten] connection accepted from 172.31.5.87:58347 #2 (2 connections now open)
      Thu Mar 20 00:16:43.842 [initandlisten] connection accepted from 172.31.7.105:49017 #3 (3 connections now open)
      Thu Mar 20 00:16:43.842 [conn3] end connection 172.31.7.105:49017 (2 connections now open)
      Thu Mar 20 00:16:43.842 [initandlisten] connection accepted from 172.31.7.105:49018 #4 (4 connections now open)
      Thu Mar 20 00:16:43.843 [conn4]  authenticate db: local { authenticate: 1, nonce: "370747b8d626ae6b", user: "__system", key: "1621b74845db8066a8f1146b7d3c4e20" }
      Thu Mar 20 00:16:43.843 [conn4]  authenticate db: local { authenticate: 1, nonce: "a5af772b216170bf", user: "__system", key: "785eadb9503bd9bf9cea035bfad12f6a" }
      Thu Mar 20 00:16:44.085 [initandlisten] connection accepted from 172.31.0.228:41168 #5 (4 connections now open)
      Thu Mar 20 00:16:44.086 [conn5] end connection 172.31.0.228:41168 (3 connections now open)
      Thu Mar 20 00:16:44.086 [initandlisten] connection accepted from 172.31.0.228:41169 #6 (4 connections now open)
      Thu Mar 20 00:16:44.087 [conn6]  authenticate db: local { authenticate: 1, nonce: "3f8ac5c819eeb53f", user: "__system", key: "3101d50913a97466cde4cbed821f5ca8" }
      Thu Mar 20 00:16:44.087 [conn6]  authenticate db: local { authenticate: 1, nonce: "57f591267cfa5d23", user: "__system", key: "a673e04f54b4c46c5c40aeaa94e74452" }
      Thu Mar 20 00:16:44.265 [rsHealthPoll] replSet member ip-172-31-0-228:27017 is up
      Thu Mar 20 00:16:44.265 [rsHealthPoll] replSet member ip-172-31-0-228:27017 is now in state ARBITER
      Thu Mar 20 00:16:45.547 [initandlisten] connection accepted from 172.31.8.61:42866 #7 (5 connections now open)
      Thu Mar 20 00:16:49.264 [rsBackgroundSync] replSet syncing to: ip-172-31-7-105:27017
      Thu Mar 20 00:16:49.395 Invalid access at address: 0x7eacf0936024 from thread: repl writer worker 1
      Thu Mar 20 00:16:49.395 Got signal: 7 (Bus error).
      Thu Mar 20 00:16:49.402 Backtrace:
      0xde46e1 0x6d06c9 0x6d0c52 0x7f432b6f6cb0 0x7feb9b 0x804610 0x80474f 0x80474f 0x80484c 0x8051fd 0x9db2e3 0x9e2944 0x9e45cc 0xacaa1f 0xacb4b2 0xa80bdb 0xa71707 0xc2d253 0xc2c998 0xdb2021
       /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde46e1]
       /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d06c9]
       /usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x6d0c52]
       /lib/x86_64-linux-gnu/libpthread.so.0(+0xfcb0) [0x7f432b6f6cb0]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE4findERKNS_12IndexDetailsERKNS_5KeyV1ERKNS_7DiskLocERKNS_8OrderingERib+0x8b) [0x7feb9b]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x60) [0x804610]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x19f) [0x80474f]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_5KeyV1ERKNS_8OrderingERiRbS8_i+0x19f) [0x80474f]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE6locateERKNS_12IndexDetailsERKNS_7DiskLocERKNS_7BSONObjERKNS_8OrderingERiRbS8_i+0x8c) [0x80484c]
       /usr/bin/mongod(_ZNK5mongo11BtreeBucketINS_12BtreeData_V1EE7unindexENS_7DiskLocERNS_12IndexDetailsERKNS_7BSONObjES3_+0x51d) [0x8051fd]
       /usr/bin/mongod(_ZNK5mongo18IndexInterfaceImplINS_12BtreeData_V1EE7unindexENS_7DiskLocERNS_12IndexDetailsERKNS_7BSONObjES3_+0x63) [0x9db2e3]
       /usr/bin/mongod() [0x9e2944]
       /usr/bin/mongod(_ZN5mongo13unindexRecordEPNS_16NamespaceDetailsEPNS_6RecordERKNS_7DiskLocEb+0x7c) [0x9e45cc]
       /usr/bin/mongod(_ZN5mongo11DataFileMgr12deleteRecordEPNS_16NamespaceDetailsEPKcPNS_6RecordERKNS_7DiskLocEbbb+0x1bf) [0xacaa1f]
       /usr/bin/mongod(_ZN5mongo11DataFileMgr12deleteRecordEPKcPNS_6RecordERKNS_7DiskLocEbbb+0x82) [0xacb4b2]
       /usr/bin/mongod(_ZN5mongo13deleteObjectsEPKcNS_7BSONObjEbbbPNS_11RemoveSaverE+0xd6b) [0xa80bdb]
       /usr/bin/mongod(_ZN5mongo21applyOperation_inlockERKNS_7BSONObjEbb+0xb37) [0xa71707]
       /usr/bin/mongod(_ZN5mongo7replset8SyncTail9syncApplyERKNS_7BSONObjEb+0x713) [0xc2d253]
       /usr/bin/mongod(_ZN5mongo7replset14multiSyncApplyERKSt6vectorINS_7BSONObjESaIS2_EEPNS0_8SyncTailE+0x48) [0xc2c998]
       /usr/bin/mongod(_ZN5mongo10threadpool6Worker4loopEv+0x281) [0xdb2021]
       

      I then attempted to run the server with --repair, but I got this message:

      ***** SERVER RESTARTED *****
       
       
      Thu Mar 20 00:19:45.045 [initandlisten] MongoDB starting : pid=13064 port=27017 dbpath=/mnt/ssd0/m
      ongodb 64-bit host=ip-172-31-15-97
      Thu Mar 20 00:19:45.046 [initandlisten] db version v2.4.9
      Thu Mar 20 00:19:45.046 [initandlisten] git version: 52fe0d21959e32a5bdbecdc62057db386e4e029c
      Thu Mar 20 00:19:45.046 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen
       #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
      Thu Mar 20 00:19:45.046 [initandlisten] allocator: tcmalloc
      Thu Mar 20 00:19:45.046 [initandlisten] options: { config: "/etc/mongodb.conf", dbpath: "/mnt/ssd0
      /mongodb", keyFile: "/var/lib/mongodb/key", logappend: "true", logpath: "/var/log/mongodb/mongodb.
      log", repair: true, replSet: "c10r_rs0" }
      **************
      You specified --repair but there are dirty journal files. Please
      restart without --repair to allow the journal files to be replayed.
      If you wish to repair all databases, please shutdown cleanly and
      run with --repair again.
      **************
      Thu Mar 20 00:19:45.046 [initandlisten] exception in initAndListen: 12596 old lock file, terminating
      Thu Mar 20 00:19:45.046 dbexit:
      Thu Mar 20 00:19:45.046 [initandlisten] shutdown: going to close listening sockets...
      Thu Mar 20 00:19:45.046 [initandlisten] shutdown: going to flush diaglog...
      Thu Mar 20 00:19:45.046 [initandlisten] shutdown: going to close sockets...
      Thu Mar 20 00:19:45.046 [initandlisten] shutdown: waiting for fs preallocator...
      Thu Mar 20 00:19:45.046 [initandlisten] shutdown: closing all files...
      Thu Mar 20 00:19:45.046 [initandlisten] closeAllFiles() finished
      Thu Mar 20 00:19:45.046 dbexit: really exiting now

      I was forced to erase all the data from the data directory and have the secondary do a complete re-sync.

      Attachments

        Activity

          People

            ramon.fernandez@mongodb.com Ramon Fernandez Marina
            farooq@connectifier.com Farooq Mela
            Votes:
            4 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: