Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-18838

DB fails to recover creates and drops after system crash

    • Fully Compatible
    • ALL
    • Hide

      This scenario was using the power cycle test script, along with an FSM client. The FSM client, jstests/concurrency/fsm_all.js, provides more DB operations, besides the standard CRUD ones, like

      • create and drop databases
      • create and drop collections
      • create and drop indexes
      • rename collection

      To reproduce this scenario, perform the following operations:

      1. ./powertest.sh -s <ssh server host access> -l 20
        • i.e., ./powertest.sh -s "-i x.pem user@10.4.1.1" -l 20
        • Note user must have sudo privileges on server
      2. ./fsm.sh <server host>
        • i.e., ./fsm.sh 10.4.1.1
      Show
      This scenario was using the power cycle test script, along with an FSM client. The FSM client, jstests/concurrency/fsm_all.js, provides more DB operations, besides the standard CRUD ones, like create and drop databases create and drop collections create and drop indexes rename collection To reproduce this scenario, perform the following operations: ./powertest.sh -s <ssh server host access> -l 20 i.e., ./powertest.sh -s "-i x.pem user@10.4.1.1" -l 20 Note user must have sudo privileges on server ./fsm.sh <server host> i.e., ./fsm.sh 10.4.1.1

      Issue Status as of Jul 14, 2015

      ISSUE SUMMARY
      If interrupted by an unexpected server termination (e.g. power loss) certain operations, including database, collection, and index creation and deletion (i.e. drop), as well as collection renames may result in an invalid data set, causing MongoDB to crash immediately after starting. These operations include:

      • database creation
      • database dropping
      • collection creation
      • collection dropping
      • index creation
      • index dropping

      USER IMPACT
      Affected MongoDB instances may not be able to start, leading to loss of availability of the affected node.

      WORKAROUNDS
      There are no workarounds for this issue. If a node is affected, users can resync from a healthy member in the replica set.

      Users are urged to consider additional measures to prevent unclean shutdowns.

      AFFECTED VERSIONS
      MongoDB 3.0.0 through 3.0.4.

      FIX VERSION
      The fix is included in the 3.0.5 production release.

      Original description

      This recovery failure was found on both EXT4 & XFS file systems:

      2015-06-05T10:36:26.053-0400 I CONTROL  [thread1] ***** SERVER RESTARTED *****
      2015-06-05T10:36:26.130-0400 W -        [initandlisten] Detected unclean shutdown - /home/jonathan/mongodb-powertest/data/wiredTiger/mongod.lock is not empty.
      2015-06-05T10:36:26.130-0400 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
      2015-06-05T10:36:26.130-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
      2015-06-05T10:36:26.376-0400 E STORAGE  [initandlisten] WiredTiger (2) [1433514986:376579][1528:0x7f5c0368cb80], file:collection-1-8498577547266750689.wt: /home/jonathan/mongodb-powertest/data/wiredTiger/collection-1-8498577547266750689.wt: No such file or directory
      2015-06-05T10:36:26.376-0400 E STORAGE  [initandlisten] WiredTiger (2) [1433514986:376641][1528:0x7f5c0368cb80], file:collection-1-8498577547266750689.wt: Operation failed during recovery: No such file or directory
      2015-06-05T10:36:26.947-0400 E STORAGE  [initandlisten] WiredTiger (2) [1433514986:947150][1528:0x7f5c0368cb80], file:collection-1-8498577547266750689.wt, WT_SESSION.open_cursor: /home/jonathan/mongodb-powertest/data/wiredTiger/collection-1-8498577547266750689.wt: No such file or directory
      2015-06-05T10:36:26.947-0400 E STORAGE  [initandlisten] no cursor for uri: table:collection-1-8498577547266750689
      2015-06-05T10:36:26.947-0400 I -        [initandlisten] Invariant failure c src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp 1047
      2015-06-05T10:36:27.149-0400 I CONTROL  [initandlisten]
       0xfdb0c6 0xf8fb48 0xf76826 0xdd5219 0xdd53cc 0xdd60f3 0xdd640f 0xdcf808 0xd3dab6 0xd42bee 0xdce3fc 0xd06ac6 0x80b3aa 0x811c34 0x7f5c02294ec5 0x809719
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"BDB0C6"},{"b":"400000","o":"B8FB48"},{"b":"400000","o":"B76826"},{"b":"400000","o":"9D5219"},{"b":"400000","o":"9D53CC"},{"b":"400000","o":"9D60F3"},{"b":"400000","o":"9D640F"},{"b":"400000","o":"9CF808"},{"b":"400000","o":"93DAB6"},{"b":"400000","o":"942BEE"},{"b":"400000","o":"9CE3FC"},{"b":"400000","o":"906AC6"},{"b":"400000","o":"40B3AA"},{"b":"400000","o":"411C34"},{"b":"7F5C02273000","o":"21EC5"},{"b":"400000","o":"409719"}],"processInfo":{ "mongodbVersion" : "3.1.4-pre-", "gitVersion" : "ce2b1d11d42de93f7b375f7e6c41fb709f66e969", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.16.0-31-generic", "version" : "#41~14.04.1-Ubuntu SMP Wed Feb 11 19:30:13 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF36DEC000", "elfType" : 3 }, { "b" : "7F5C0327A000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F5C03076000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F5C02D72000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F5C02A6C000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F5C02856000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F5C02638000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F5C02273000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F5C03482000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xfdb0c6]
       mongod(_ZN5mongo10logContextEPKc+0x138) [0xf8fb48]
       mongod(_ZN5mongo15invariantFailedEPKcS1_j+0x96) [0xf76826]
       mongod(_ZN5mongo21WiredTigerRecordStore8Iterator7_locateERKNS_8RecordIdEb+0x1C9) [0xdd5219]
       mongod(_ZN5mongo21WiredTigerRecordStore8IteratorC2ERKS0_PNS_16OperationContextERKNS_8RecordIdERKNS_20CollectionScanParams9DirectionEb+0x9C) [0xdd53cc]
       mongod(_ZNK5mongo21WiredTigerRecordStore11getIteratorEPNS_16OperationContextERKNS_8RecordIdERKNS_20CollectionScanParams9DirectionE+0x43) [0xdd60f3]
       mongod(_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_bllPNS_28CappedDocumentDeleteCallbackEPNS_20WiredTigerSizeStorerE+0x2CF) [0xdd640f]
       mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE+0x158) [0xdcf808]
       mongod(_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKSsb+0x1D6) [0xd3dab6]
       mongod(_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x61E) [0xd42bee]
       mongod(+0x9CE3FC) [0xdce3fc]
       mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x5E6) [0xd06ac6]
       mongod(_ZN5mongo13initAndListenEi+0x16A) [0x80b3aa]
       mongod(main+0x134) [0x811c34]
       libc.so.6(__libc_start_main+0xF5) [0x7f5c02294ec5]
       mongod(+0x409719) [0x809719]
      -----  END BACKTRACE  -----
      2015-06-05T10:36:27.149-0400 I -        [initandlisten]
      
      ***aborting after invariant() failure
      

      A different failure observed from a repeat of the scenario:

      2015-06-05T10:51:00.593-0400 I CONTROL  [thread1] ***** SERVER RESTARTED *****
      2015-06-05T10:51:00.672-0400 W -        [initandlisten] Detected unclean shutdown - /home/jonathan/pr-default-3/data/wiredTiger/mongod.lock is not empty.
      2015-06-05T10:51:00.672-0400 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
      2015-06-05T10:51:00.672-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
      2015-06-05T10:51:00.962-0400 I -        [initandlisten] Fatal assertion 28548 NoSuchKey Unable to find metadata for table:collection-2--3916750969192802253
      2015-06-05T10:51:00.962-0400 I -        [initandlisten]
      
      ***aborting after fassert() failure
      

        1. fsm.sh
          0.2 kB
        2. powertest.sh
          25 kB
        3. mongodb-powertest.tar.gz
          39.37 MB
        4. pr-default-3.tar.gz
          42.43 MB

            Assignee:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Reporter:
            jonathan.abrahams Jonathan Abrahams
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: