Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-18316

Database with WT engine fails to recover after system crash

    Details

    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Completed:
    • Steps To Reproduce:
      Hide
      1. Start mongod with specified options on secret port (recovery mode)
        --storageEngine wiredTiger --wiredTigerJournalCompressor none --wiredTigerCollectionBlockCompressor none --wiredTigerIndexPrefixCompression 0
      2. If startup is successful, kill and restart on start port
      3. Run mongo shell to randomly perform CRUD operations
      4. Crash host system at a random period (40-50 seconds after step 2)
      5. After reboot start from step 1
      Show
      Start mongod with specified options on secret port (recovery mode) --storageEngine wiredTiger --wiredTigerJournalCompressor none --wiredTigerCollectionBlockCompressor none --wiredTigerIndexPrefixCompression 0 If startup is successful, kill and restart on start port Run mongo shell to randomly perform CRUD operations Crash host system at a random period (40-50 seconds after step 2) After reboot start from step 1

      Description

      Issue Status as of Jun 09, 2015

      ISSUE SUMMARY
      If a system running MongoDB with the WiredTiger storage engine crashes or experiences an unclean shutdown, MongoDB may not be able to recover data files on restart if the crash/shutdown interrupted a WiredTiger checkpoint.

      USER IMPACT
      MongoDB cannot automatically recover data files on restart.

      WORKAROUNDS
      None. Restore data from backups or resync from another replica set member.

      AFFECTED VERSIONS
      MongoDB 3.0 versions before 3.0.4.

      FIX VERSION
      The fix is included in the 3.0.4 production release.

      Original description

      DB recovery failed for WT when the following options were specified:
      --wiredTigerJournalCompressor none --wiredTigerCollectionBlockCompressor none --wiredTigerIndexPrefixCompression 0

      2015-05-04T16:28:54.660-0400 W -        [initandlisten] Detected unclean shutdown - /data/wiredTiger/mongod.lock is not empty.
      2015-05-04T16:28:54.660-0400 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
      2015-05-04T16:28:54.661-0400 I STORAGE  [initandlisten] wiredtiger_open config:create,cache_size=1G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log(enabled=true,archive=true,path=journal,compressor=none),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
      2015-05-04T16:29:31.728-0400 E STORAGE  [initandlisten] WiredTiger (-31802) [1430771371:728175][1371:0x7f39f5c34b80], file:collection-0--5195437846436728971.wt, WT_SESSION.open_cursor: collection-0--5195437846436728971.wt read error: failed to read 4096 bytes at offset 40960: WT_ERROR: non-specific WiredTiger error
      2015-05-04T16:29:31.728-0400 I -        [initandlisten] Invariant failure: ret resulted in status UnknownError-31802:WT_ERROR: non-specific WiredTiger error at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 80
      2015-05-04T16:29:31.778-0400 I CONTROL  [initandlisten] 0xf4ba16 0xeee0f8 0xed1ee5 0xd6a66d 0xd65a5b 0xd5f824 0xd5ff13 0xd60216 0xd5a404 0xcdbf86 0xcdffa9 0xd5940c 0xcaa860 0x8291ce 0x7ed989 0x7f39f4205ec5 0x8272f7
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"400000","o":"B4BA16"},{"b":"400000","o":"AEE0F8"},{"b":"400000","o":"AD1EE5"},{"b":"400000","o":"96A66D"},{"b":"400000","o":"965A5B"},{"b":"400000","o":"95F824"},{"b":"400000","o":"95FF13"},{"b":"400000","o":"960216"},{"b":"400000","o":"95A404"},{"b":"400000","o":"8DBF86"},{"b":"400000","o":"8DFFA9"},{"b":"400000","o":"95940C"},{"b":"400000","o":"8AA860"},{"b":"400000","o":"4291CE"},{"b":"400000","o":"3ED989"},{"b":"7F39F41E4000","o":"21EC5"},{"b":"400000","o":"4272F7"}],"processInfo":{ "mongodbVersion" : "3.1.2", "gitVersion" : "aa0066050f0a9db81aa47181d0fbd18c109ae991", "uname" : { "
      sysname" : "Linux", "release" : "3.16.0-31-generic", "version" : "#41~14.04.1-Ubuntu SMP Wed Feb 11 19:30:13 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "5A1AB53E82C3717157B58AE3409311D6B24478D5" }, { "b" : "7FFFFE5DB000", "elfType" : 3, "buildId" : "5552B9335DDE9349419BA10896C1E75C9432A946" }, { "b" : "7F39F57CD000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "FF43D0947510134A8A494063A3C1CF3CEBB27791" }, { "b" : "7F39F53F3000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "379F80D2768BA6A21F52781895EE9F47B34A0A85" }, { "b" : "7F39F51EB000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7F39F4FE7000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7F39F4CE3000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "19EFDDAB11B3BF5C71570078C59F91CF6592CE9E" }, { "b" : "7F39F49DD000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7F39F47C7000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7F39F45A9000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7F39F41E4000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7F39F5A2B000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x26) [0xf4ba16]
       mongod(_ZN5mongo10logContextEPKc+0x138) [0xeee0f8]
       mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xA5) [0xed1ee5]
       mongod(_ZN5mongo17WiredTigerSession9getCursorERKSsmb+0x1FD) [0xd6a66d]
       mongod(_ZN5mongo16WiredTigerCursorC1ERKSsmbPNS_16OperationContextE+0x4B) [0xd65a5b]
       mongod(_ZN5mongo21WiredTigerRecordStore8IteratorC2ERKS0_PNS_16OperationContextERKNS_8RecordIdERKNS_20CollectionScanParams9DirectionEb+0x64) [0xd5f824]
       mongod(_ZNK5mongo21WiredTigerRecordStore11getIteratorEPNS_16OperationContextERKNS_8RecordIdERKNS_20CollectionScanParams9DirectionE+0x43) [0xd5ff13]
       mongod(_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_bllPNS_28CappedDocumentDeleteCallbackEPNS_20WiredTigerSizeStorerE+0x2B6) [0xd60216]
       mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE+0xC4) [0xd5a404]
       mongod(_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKSsb+0x1D6) [0xcdbf86]
       mongod(_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x579) [0xcdffa9]
       mongod(+0x95940C) [0xd5940c]
       mongod(_ZN5mongo20ServiceContextMongoD22setGlobalStorageEngineERKSs+0x350) [0xcaa860]
       mongod(_ZN5mongo13initAndListenEi+0x3EE) [0x8291ce]
       mongod(main+0x139) [0x7ed989]
       libc.so.6(__libc_start_main+0xF5) [0x7f39f4205ec5]
       mongod(+0x4272F7) [0x8272f7]
      -----  END BACKTRACE  -----
      2015-05-04T16:29:31.778-0400 I -        [initandlisten]
       
      ***aborting after invariant() failure
      

      1. crashserver.sh
        0.9 kB
        Jonathan Abrahams
      2. crashserver.sh.gz
        0.5 kB
        Jonathan Abrahams
      3. mongod-wiredTiger.log
        4.70 MB
        Jonathan Abrahams
      4. mongod-wiredTiger.log
        12.96 MB
        Jonathan Abrahams
      5. mongod-wiredTiger.log.gz
        102 kB
        Jonathan Abrahams
      6. mongod-wiredTiger.log.gz
        1.27 MB
        Jonathan Abrahams
      7. mongod-wiredTiger-recovery.log
        5.36 MB
        Jonathan Abrahams
      8. mongod-wiredTiger-recovery.log
        6.96 MB
        Jonathan Abrahams
      9. mongod-wiredTiger-recovery.log.gz
        110 kB
        Jonathan Abrahams
      10. mongod-wiredTiger-recovery.log.gz
        2.65 MB
        Jonathan Abrahams
      11. powertest.sh
        11 kB
        Jonathan Abrahams
      12. powertest2.sh
        11 kB
        Alexander Gorrod
      13. SERVER-18316.diff
        2 kB
        Michael Cahill
      14. startmongo.sh
        2 kB
        Jonathan Abrahams
      15. startmongo.sh.gz
        0.7 kB
        Jonathan Abrahams
      16. wiredTiger.tar.gz
        15.51 MB
        Jonathan Abrahams
      17. wiredTiger.tar.gz
        19.66 MB
        Jonathan Abrahams
      18. wiredTiger.tar.gz
        16.71 MB
        Jonathan Abrahams
      19. wiredTiger-afterrecovery.tar.gz
        4.90 MB
        Jonathan Abrahams
      20. wiredTiger-afterrecovery.tar.gz
        9.45 MB
        Jonathan Abrahams
      21. wiredTiger-afterrecovery.tar.gz
        5.00 MB
        Jonathan Abrahams
      22. wiredTiger-beforerecovery.tar.gz
        15.51 MB
        Jonathan Abrahams
      23. wiredTiger-beforerecovery.tar.gz
        19.65 MB
        Jonathan Abrahams
      24. wiredTiger-beforerecovery.tar.gz
        119.51 MB
        Jonathan Abrahams
      25. wiredTiger-beforerecovery.tar.gz.split1
        95.37 MB
        Jonathan Abrahams
      26. wiredTiger-beforerecovery.tar.gz.split2
        58.64 MB
        Jonathan Abrahams
      27. wiredTiger-failedrecovery.tar.gz
        10.31 MB
        Jonathan Abrahams
      28. wiredTiger-firstrun.tar.gz
        5 kB
        Jonathan Abrahams
      29. wiredTiger-firstrun.tar.gz
        5 kB
        Jonathan Abrahams
      30. wiredTiger-firstrun.tar.gz
        6 kB
        Jonathan Abrahams
      31. wiredTiger-firstrun.tar.gz
        5 kB
        Jonathan Abrahams

        Issue Links

          Activity

          Hide
          jonathan.abrahams Jonathan Abrahams added a comment - - edited

          Tested in the following configurations:

          • --wiredTigerIndexPrefixCompression 0, there was no crash
          • --wiredTigerJournalCompressor none, there was a crash
          • --wiredTigerCollectionBlockCompressor none, there was no crash
          Show
          jonathan.abrahams Jonathan Abrahams added a comment - - edited Tested in the following configurations: --wiredTigerIndexPrefixCompression 0, there was no crash --wiredTigerJournalCompressor none, there was a crash --wiredTigerCollectionBlockCompressor none, there was no crash
          Hide
          jonathan.abrahams Jonathan Abrahams added a comment - - edited

          Attaching another set of logs & DBs from crash:

          • mongod-wiredTiger-recovery.log
          • mongod-wiredTiger.log
          • wiredTiger-afterrecovery.tar.gz
          • wiredTiger-beforerecovery.tar.gz.split1
          • wiredTiger-beforerecovery.tar.gz.split2
          • wiredTiger-firstrun.tar.gz
          • wiredTiger.tar.gz

          cat wiredTiger-beforerecovery.tar.gz.split1 wiredTiger-beforerecovery.tar.gz.split2 > wiredTiger-beforerecovery.tar.gz

          Show
          jonathan.abrahams Jonathan Abrahams added a comment - - edited Attaching another set of logs & DBs from crash: mongod-wiredTiger-recovery.log mongod-wiredTiger.log wiredTiger-afterrecovery.tar.gz wiredTiger-beforerecovery.tar.gz.split1 wiredTiger-beforerecovery.tar.gz.split2 wiredTiger-firstrun.tar.gz wiredTiger.tar.gz cat wiredTiger-beforerecovery.tar.gz.split1 wiredTiger-beforerecovery.tar.gz.split2 > wiredTiger-beforerecovery.tar.gz
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'sueloverso', u'name': u'Susan LoVerso', u'email': u'sue@wiredtiger.com'}

          Message: Sync the metadata file before updating turtle file. SERVER-18316
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/1880ed81fe765aed24f7e328388ec739dd1ee3e3

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'sueloverso', u'name': u'Susan LoVerso', u'email': u'sue@wiredtiger.com'} Message: Sync the metadata file before updating turtle file. SERVER-18316 Branch: develop https://github.com/wiredtiger/wiredtiger/commit/1880ed81fe765aed24f7e328388ec739dd1ee3e3
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

          Message: Merge pull request #1957 from wiredtiger/checkpoint-sync

          Sync the metadata file before updating turtle file. SERVER-18316
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/df5a23d353d141afcbdb6cdced0538c87e99d389

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'} Message: Merge pull request #1957 from wiredtiger/checkpoint-sync Sync the metadata file before updating turtle file. SERVER-18316 Branch: develop https://github.com/wiredtiger/wiredtiger/commit/df5a23d353d141afcbdb6cdced0538c87e99d389
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'}

          Message: Fix a bug in checkpoint where we committed the transaction too early.

          It's not safe to commit the transaction before the files have been
          synced to disk - if we do that the checkpoint can be rolled forward
          in recovery before the checkpoint in the file has been updated.

          Refs SERVER-18316
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/5ebdc72ca013ead51d7eb8e27eb8cb2d8090daf0

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'agorrod', u'name': u'Alex Gorrod', u'email': u'alexg@wiredtiger.com'} Message: Fix a bug in checkpoint where we committed the transaction too early. It's not safe to commit the transaction before the files have been synced to disk - if we do that the checkpoint can be rolled forward in recovery before the checkpoint in the file has been updated. Refs SERVER-18316 Branch: develop https://github.com/wiredtiger/wiredtiger/commit/5ebdc72ca013ead51d7eb8e27eb8cb2d8090daf0
          Hide
          xgen-internal-githook Githook User added a comment -

          Author:

          {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'}

          Message: If we're closing a dirty tree in order to verify it (or any other exclusive operation), make sure the changes are stable on disk in case there is a crash.

          refs SERVER-18316
          Branch: develop
          https://github.com/wiredtiger/wiredtiger/commit/9bf96e1fcb16835c2feb51200bee43e5482c7d5b

          Show
          xgen-internal-githook Githook User added a comment - Author: {u'username': u'michaelcahill', u'name': u'Michael Cahill', u'email': u'michael.cahill@mongodb.com'} Message: If we're closing a dirty tree in order to verify it (or any other exclusive operation), make sure the changes are stable on disk in case there is a crash. refs SERVER-18316 Branch: develop https://github.com/wiredtiger/wiredtiger/commit/9bf96e1fcb16835c2feb51200bee43e5482c7d5b

            People

            • Votes:
              0 Vote for this issue
              Watchers:
              10 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: