Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-57515

DB recovery failing

    XMLWordPrintable

    Details

    • Type: Question
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Done
    • Affects Version/s: 4.4.0
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      Our database crashed and recovery is not working. The recovery keeps cycling through these iterations and continuously generates files in the journal directory. These journal files dont exist before the crash date.

       

      Here's the relevant part of the log when the crash and recovery started:

      {"t":{"$date":"2021-05-26T04:15:58.478+00:00"},"s":"I",  "c":"CONTROL",  "id":20698,   "ctx":"main","msg":"***** SERVER RESTARTED *****"}
      {"t":{"$date":"2021-05-26T04:15:59.111+00:00"},"s":"I",  "c":"CONTROL",  "id":23285,   "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
      {"t":{"$date":"2021-05-26T04:16:01.723+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
      {"t":{"$date":"2021-05-26T04:16:01.723+00:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
      {"t":{"$date":"2021-05-26T04:16:01.748+00:00"},"s":"I",  "c":"STORAGE",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27017,"dbPath":"/data/db","architecture":"64-bit","host":"36bd19ba086f"}}
      {"t":{"$date":"2021-05-26T04:16:04.110+00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.3","gitVersion":"913d6b62acfbb344dde1b116f4161360acd8fd13","openSSLVersion":"OpenSSL 1.1.1  11 Sep 2018","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu1804","distarch":"x86_64","target_arch":"x86_64"}}}}
      {"t":{"$date":"2021-05-26T04:16:04.110+00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"18.04"}}}
      {"t":{"$date":"2021-05-26T04:16:04.110+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/etc/mongod.conf","net":{"bindIp":"*","port":27017},"operationProfiling":{"slowOpThresholdMs":500},"processManagement":{"timeZoneInfo":"/usr/share/zoneinfo"},"storage":{"dbPath":"/data/db","journal":{"enabled":true}},"systemLog":{"destination":"file","logAppend":true,"path":"/var/log/mongod.log"}}}}
      {"t":{"$date":"2021-05-26T04:16:04.206+00:00"},"s":"W",  "c":"STORAGE",  "id":22271,   "ctx":"initandlisten","msg":"Detected unclean shutdown - Lock file is not empty","attr":{"lockFile":"/data/db/mongod.lock"}}
      {"t":{"$date":"2021-05-26T04:16:04.226+00:00"},"s":"I",  "c":"STORAGE",  "id":22270,   "ctx":"initandlisten","msg":"Storage engine to use detected by data files","attr":{"dbpath":"/data/db","storageEngine":"wiredTiger"}}
      {"t":{"$date":"2021-05-26T04:16:04.226+00:00"},"s":"W",  "c":"STORAGE",  "id":22302,   "ctx":"initandlisten","msg":"Recovering data from the last clean checkpoint."}
      {"t":{"$date":"2021-05-26T04:16:04.231+00:00"},"s":"I",  "c":"STORAGE",  "id":22315,   "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=15546M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],"}}
      {"t":{"$date":"2021-05-26T04:16:05.436+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":95,"message":"[1622002565:436269][1:0x7f8bec8e1ac0], connection: __posix_std_fallocate, 58: /data/db/journal/WiredTigerTmplog.0000000001: fallocate:: Operation not supported"}}
      {"t":{"$date":"2021-05-26T04:16:05.436+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":95,"message":"[1622002565:436722][1:0x7f8bec8e1ac0], connection: __posix_sys_fallocate, 75: /data/db/journal/WiredTigerTmplog.0000000001: fallocate:: Operation not supported"}}
      {"t":{"$date":"2021-05-26T04:16:09.091+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1622002569:91573][1:0x7f8bec8e1ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1173 through 1174"}}
      {"t":{"$date":"2021-05-26T04:16:09.229+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1622002569:229763][1:0x7f8bec8e1ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1174 through 1174"}}
      {"t":{"$date":"2021-05-26T04:16:09.320+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1622002569:320929][1:0x7f8bec8e1ac0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Main recovery loop: starting at 1173/103566208 to 1174/256"}}
      {"t":{"$date":"2021-05-26T04:16:09.346+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1622002569:346568][1:0x7f8bec8e1ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1173 through 1174"}}
      {"t":{"$date":"2021-05-26T04:16:15.091+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1622002575:91431][1:0x7f8bec8e1ac0], file:collection-14--1765169284971021277.wt, txn-recover: __posix_file_read, 431: /data/db/collection-14--1765169284971021277.wt: handle-read: pread: failed to read 4096 bytes at offset 1034993664: WT_ERROR: non-specific WiredTiger error"}}
      {"t":{"$date":"2021-05-26T04:16:15.091+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1622002575:91629][1:0x7f8bec8e1ac0], file:collection-14--1765169284971021277.wt, txn-recover: __txn_op_apply, 288: operation apply failed during recovery: operation type 4 at LSN 1173/103571968: WT_ERROR: non-specific WiredTiger error"}}
      {"t":{"$date":"2021-05-26T04:16:15.091+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1622002575:91710][1:0x7f8bec8e1ac0], file:collection-14--1765169284971021277.wt, txn-recover: __wt_txn_recover, 997: Recovery failed: WT_ERROR: non-specific WiredTiger error"}}
      {"t":{"$date":"2021-05-26T04:16:15.097+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1622002575:97064][1:0x7f8bec8e1ac0], connection: __wt_cache_destroy, 364: cache server: exiting with 87 pages in memory and 0 pages evicted"}}
      {"t":{"$date":"2021-05-26T04:16:15.097+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1622002575:97125][1:0x7f8bec8e1ac0], connection: __wt_cache_destroy, 367: cache server: exiting with 874637 image bytes in memory"}}
      {"t":{"$date":"2021-05-26T04:16:15.097+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1622002575:97155][1:0x7f8bec8e1ac0], connection: __wt_cache_destroy, 370: cache server: exiting with 2726005 bytes in memory"}}
      {"t":{"$date":"2021-05-26T04:16:15.097+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1622002575:97203][1:0x7f8bec8e1ac0], connection: __wt_cache_destroy, 376: cache server: exiting with 1583673 bytes dirty and 40 pages dirty"}}
      {"t":{"$date":"2021-05-26T04:16:15.153+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":95,"message":"[1622002575:153583][1:0x7f8bec8e1ac0], connection: __posix_std_fallocate, 58: /data/db/journal/WiredTigerTmplog.0000000001: fallocate:: Operation not supported"}}
      {"t":{"$date":"2021-05-26T04:16:15.153+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":95,"message":"[1622002575:153709][1:0x7f8bec8e1ac0], connection: __posix_sys_fallocate, 75: /data/db/journal/WiredTigerTmplog.0000000001: fallocate:: Operation not supported"}}
      {"t":{"$date":"2021-05-26T04:16:18.633+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1622002578:633341][1:0x7f8bec8e1ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1173 through 1175"}}

       

      In the journal directory these files are continuously generated and is up to 2036 now:

      WiredTigerLog.0000002036

       

      Is this database hosed? or is there any way to recover from this?

        Attachments

          Activity

            People

            Assignee:
            dmitry.agranat Dmitry Agranat
            Reporter:
            bh3@digitalblur.com t b
            Participants:
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: