[SERVER-57515] DB recovery failing Created: 08/Jun/21  Updated: 05/Jul/21  Resolved: 05/Jul/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.0
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: t b Assignee: Dmitry Agranat
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

Our database crashed and recovery is not working. The recovery keeps cycling through the iterations shown below and continuously generates files in the journal directory. These journal files did not exist before the crash date.

 

Here's the relevant part of the log when the crash and recovery started:

{"t":{"$date":"2021-05-26T04:15:58.478+00:00"},"s":"I",  "c":"CONTROL",  "id":20698,   "ctx":"main","msg":"***** SERVER RESTARTED *****"}
{"t":{"$date":"2021-05-26T04:15:59.111+00:00"},"s":"I",  "c":"CONTROL",  "id":23285,   "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"}
{"t":{"$date":"2021-05-26T04:16:01.723+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2021-05-26T04:16:01.723+00:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2021-05-26T04:16:01.748+00:00"},"s":"I",  "c":"STORAGE",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27017,"dbPath":"/data/db","architecture":"64-bit","host":"36bd19ba086f"}}
{"t":{"$date":"2021-05-26T04:16:04.110+00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.3","gitVersion":"913d6b62acfbb344dde1b116f4161360acd8fd13","openSSLVersion":"OpenSSL 1.1.1  11 Sep 2018","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu1804","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2021-05-26T04:16:04.110+00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"18.04"}}}
{"t":{"$date":"2021-05-26T04:16:04.110+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/etc/mongod.conf","net":{"bindIp":"*","port":27017},"operationProfiling":{"slowOpThresholdMs":500},"processManagement":{"timeZoneInfo":"/usr/share/zoneinfo"},"storage":{"dbPath":"/data/db","journal":{"enabled":true}},"systemLog":{"destination":"file","logAppend":true,"path":"/var/log/mongod.log"}}}}
{"t":{"$date":"2021-05-26T04:16:04.206+00:00"},"s":"W",  "c":"STORAGE",  "id":22271,   "ctx":"initandlisten","msg":"Detected unclean shutdown - Lock file is not empty","attr":{"lockFile":"/data/db/mongod.lock"}}
{"t":{"$date":"2021-05-26T04:16:04.226+00:00"},"s":"I",  "c":"STORAGE",  "id":22270,   "ctx":"initandlisten","msg":"Storage engine to use detected by data files","attr":{"dbpath":"/data/db","storageEngine":"wiredTiger"}}
{"t":{"$date":"2021-05-26T04:16:04.226+00:00"},"s":"W",  "c":"STORAGE",  "id":22302,   "ctx":"initandlisten","msg":"Recovering data from the last clean checkpoint."}
{"t":{"$date":"2021-05-26T04:16:04.231+00:00"},"s":"I",  "c":"STORAGE",  "id":22315,   "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=15546M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],"}}
{"t":{"$date":"2021-05-26T04:16:05.436+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":95,"message":"[1622002565:436269][1:0x7f8bec8e1ac0], connection: __posix_std_fallocate, 58: /data/db/journal/WiredTigerTmplog.0000000001: fallocate:: Operation not supported"}}
{"t":{"$date":"2021-05-26T04:16:05.436+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":95,"message":"[1622002565:436722][1:0x7f8bec8e1ac0], connection: __posix_sys_fallocate, 75: /data/db/journal/WiredTigerTmplog.0000000001: fallocate:: Operation not supported"}}
{"t":{"$date":"2021-05-26T04:16:09.091+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1622002569:91573][1:0x7f8bec8e1ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1173 through 1174"}}
{"t":{"$date":"2021-05-26T04:16:09.229+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1622002569:229763][1:0x7f8bec8e1ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1174 through 1174"}}
{"t":{"$date":"2021-05-26T04:16:09.320+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1622002569:320929][1:0x7f8bec8e1ac0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Main recovery loop: starting at 1173/103566208 to 1174/256"}}
{"t":{"$date":"2021-05-26T04:16:09.346+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1622002569:346568][1:0x7f8bec8e1ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1173 through 1174"}}
{"t":{"$date":"2021-05-26T04:16:15.091+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1622002575:91431][1:0x7f8bec8e1ac0], file:collection-14--1765169284971021277.wt, txn-recover: __posix_file_read, 431: /data/db/collection-14--1765169284971021277.wt: handle-read: pread: failed to read 4096 bytes at offset 1034993664: WT_ERROR: non-specific WiredTiger error"}}
{"t":{"$date":"2021-05-26T04:16:15.091+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1622002575:91629][1:0x7f8bec8e1ac0], file:collection-14--1765169284971021277.wt, txn-recover: __txn_op_apply, 288: operation apply failed during recovery: operation type 4 at LSN 1173/103571968: WT_ERROR: non-specific WiredTiger error"}}
{"t":{"$date":"2021-05-26T04:16:15.091+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1622002575:91710][1:0x7f8bec8e1ac0], file:collection-14--1765169284971021277.wt, txn-recover: __wt_txn_recover, 997: Recovery failed: WT_ERROR: non-specific WiredTiger error"}}
{"t":{"$date":"2021-05-26T04:16:15.097+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1622002575:97064][1:0x7f8bec8e1ac0], connection: __wt_cache_destroy, 364: cache server: exiting with 87 pages in memory and 0 pages evicted"}}
{"t":{"$date":"2021-05-26T04:16:15.097+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1622002575:97125][1:0x7f8bec8e1ac0], connection: __wt_cache_destroy, 367: cache server: exiting with 874637 image bytes in memory"}}
{"t":{"$date":"2021-05-26T04:16:15.097+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1622002575:97155][1:0x7f8bec8e1ac0], connection: __wt_cache_destroy, 370: cache server: exiting with 2726005 bytes in memory"}}
{"t":{"$date":"2021-05-26T04:16:15.097+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1622002575:97203][1:0x7f8bec8e1ac0], connection: __wt_cache_destroy, 376: cache server: exiting with 1583673 bytes dirty and 40 pages dirty"}}
{"t":{"$date":"2021-05-26T04:16:15.153+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":95,"message":"[1622002575:153583][1:0x7f8bec8e1ac0], connection: __posix_std_fallocate, 58: /data/db/journal/WiredTigerTmplog.0000000001: fallocate:: Operation not supported"}}
{"t":{"$date":"2021-05-26T04:16:15.153+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":95,"message":"[1622002575:153709][1:0x7f8bec8e1ac0], connection: __posix_sys_fallocate, 75: /data/db/journal/WiredTigerTmplog.0000000001: fallocate:: Operation not supported"}}
{"t":{"$date":"2021-05-26T04:16:18.633+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1622002578:633341][1:0x7f8bec8e1ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1173 through 1175"}}

 

In the journal directory, these files are continuously generated; the count is up to 2036 now:

WiredTigerLog.0000002036

 

Is this database hosed, or is there any way to recover from it?



 Comments   
Comment by Dmitry Agranat [ 05/Jul/21 ]

Glad to hear the same process worked out for you bh3@digitalblur.com. I will go ahead and resolve this case.

Regards,
Dima

Comment by t b [ 04/Jul/21 ]

Hi Dmitry,

I was able to start up the database by removing the journal directory, the same as in the copy I had provided to you.
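
For reference, the rough sequence was along these lines (a sketch; the host-side paths are from my setup and the backup step is just a precaution):

# Stop the container so nothing is touching the data files
docker-compose down

# Keep a full copy of the dbpath before discarding anything
cp -a db db.backup

# Move the journal directory aside rather than deleting it outright
mv db/journal db-journal.aside

# Bring the instance back up; it recovers from the last clean checkpoint,
# so any writes that existed only in the journal are lost
docker-compose up -d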

Thanks,

Tom

Comment by Dmitry Agranat [ 28/Jun/21 ]

Hi bh3@digitalblur.com, it looks like the image you tried to attach in your last comment didn't make it.

I was able to start up a healthy mongod instance with the data you provided, without any issues. Just to make sure, I executed validate on all collections. The only difference from your deployment was that I used plain Ubuntu 18.04 without docker-compose. I recommend trying to start up your mongod instance on plain Ubuntu 18.04 without docker-compose; if everything works fine, try working with the team that maintains these docker-compose images to figure out what the issue might be.
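
A validation pass of that kind can be scripted roughly like this from the mongo shell (a sketch, not the exact commands used):

mongo --quiet <<'EOF'
// Run validate() against every collection in every database
db.adminCommand({ listDatabases: 1 }).databases.forEach(function (d) {
    var sib = db.getSiblingDB(d.name);
    sib.getCollectionNames().forEach(function (c) {
        var res = sib.getCollection(c).validate();
        print(d.name + "." + c + ": valid=" + res.valid);
    });
});
EOF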

Dima

Comment by t b [ 24/Jun/21 ]

Hi Dmitry,

For the file upload, correct, I selected everything in the db path directory except for the journal files.

I am using the docker version from the docker repository here:
image: "mongo:4.4.6"

 

Thanks,

Tom

Comment by Dmitry Agranat [ 24/Jun/21 ]

Thanks bh3@digitalblur.com, I just want to clarify regarding the uploaded data. This is a complete set of all files (97 GB) excluding the journal files (85 GB), a total of ~12 GB. Is this correct?

Also, I see that the "modules" section is empty:

"modules": []

Could you point me to the link where you have downloaded the binaries for this build?

Comment by t b [ 22/Jun/21 ]

Hi Dmitry, I uploaded the zipped db directory without the journal files.
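
For reference, the archive was put together roughly like this (paths are from my setup):

# Zip the dbpath but leave out the 85 GB journal directory
zip -r db-no-journal.zip db -x 'db/journal/*'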

Comment by Dmitry Agranat [ 16/Jun/21 ]

Hi bh3@digitalblur.com, would it be possible for you to upload a copy of the dbpath for us so we could investigate this further?

Comment by t b [ 15/Jun/21 ]

Hi Dmitry,

I am using docker version 4.4.6:

 

root@2d299d79c52c:/# mongod -version
db version v4.4.6
Build Info: {
    "version": "4.4.6",
    "gitVersion": "72e66213c2c3eab37d9358d5e78ad7f5c1d0d0d7",
    "openSSLVersion": "OpenSSL 1.1.1  11 Sep 2018",
    "modules": [],
    "allocator": "tcmalloc",
    "environment": {
        "distmod": "ubuntu1804",
        "distarch": "x86_64",
        "target_arch": "x86_64"
    }
}

 

 

Size of db directory:

du -h db
 85G    db/journal
205M    db/diagnostic.data
 97G    db
 

Comment by Dmitry Agranat [ 14/Jun/21 ]

Hi bh3@digitalblur.com, a couple of clarifying questions:

  • What is the exact binary you used for MongoDB?
  • What is the size of the dbpath?
Comment by t b [ 14/Jun/21 ]

Hi Dmitry,

I added the files requested to the same link above. I don't know where the messages log is, though. This computer is a Mac.

The original server where the database was running uses Docker and is set to always restart, so it is constantly in a repair loop. The instance was stopped by running docker-compose down.

I then connected the server's hard drive to my macOS computer, copied the db files from the external hard drive to my Mac's hard drive, and brought the MongoDB server up using docker-compose up; the repair process started automatically.
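
For reference, the copy itself was done roughly like this (the mount point and target path are approximate):

# Copy the dbpath off the server's disk, preserving permissions and timestamps
rsync -a /Volumes/server-disk/data/db/ ~/mongodb-recovery/db/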

Comment by Dmitry Agranat [ 13/Jun/21 ]

Thanks for the update bh3@digitalblur.com,

Could you upload the same set of logs from the new computer as you did in this comment, as well as an archive of the diagnostic.data?

Also, could you clarify how you copied the data to a different computer? Specifically, what was the state of the mongod process on the source? Was it still in the repair loop?

Comment by t b [ 12/Jun/21 ]

Hi Dmitry,

I've copied the database files to a different computer and a different disk, and I'm seeing the same behavior, where the repair process keeps looping with the same error:

 

{"t":{"$date":"2021-06-12T06:38:57.237+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1623479937:237594][1:0x7f978bea4ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1173 through 2043"}}
{"t":{"$date":"2021-06-12T06:38:57.395+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1623479937:394433][1:0x7f978bea4ac0], file:collection-14--1765169284971021277.wt, txn-recover: __posix_file_read, 431: /data/db/collection-14--1765169284971021277.wt: handle-read: pread: failed to read 4096 bytes at offset 1034993664: WT_ERROR: non-specific WiredTiger error"}}
{"t":{"$date":"2021-06-12T06:38:57.398+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1623479937:398876][1:0x7f978bea4ac0], file:collection-14--1765169284971021277.wt, txn-recover: __txn_op_apply, 288: operation apply failed during recovery: operation type 4 at LSN 1173/103571968: WT_ERROR: non-specific WiredTiger error"}}
{"t":{"$date":"2021-06-12T06:38:57.400+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1623479937:400329][1:0x7f978bea4ac0], file:collection-14--1765169284971021277.wt, txn-recover: __wt_txn_recover, 1045: Recovery failed: WT_ERROR: non-specific WiredTiger error"}}
{"t":{"$date":"2021-06-12T06:38:57.416+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1623479937:416571][1:0x7f978bea4ac0], connection: __wt_cache_destroy, 364: cache server: exiting with 87 pages in memory and 0 pages evicted"}}
{"t":{"$date":"2021-06-12T06:38:57.417+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1623479937:417023][1:0x7f978bea4ac0], connection: __wt_cache_destroy, 367: cache server: exiting with 874637 image bytes in memory"}}
{"t":{"$date":"2021-06-12T06:38:57.417+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1623479937:417331][1:0x7f978bea4ac0], connection: __wt_cache_destroy, 370: cache server: exiting with 2726005 bytes in memory"}}
{"t":{"$date":"2021-06-12T06:38:57.417+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1623479937:417803][1:0x7f978bea4ac0], connection: __wt_cache_destroy, 376: cache server: exiting with 1583673 bytes dirty and 40 pages dirty"}}
{"t":{"$date":"2021-06-12T06:38:57.802+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1623479937:802865][1:0x7f978bea4ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1173 through 2044"}}

Comment by Dmitry Agranat [ 11/Jun/21 ]

Hi bh3@digitalblur.com,

You are correct, there are indeed repeated kernel errors inside kern.log indicating low-level disk corruption on sdb at different sectors:

Jun  8 17:22:12 ubsrv kernel: [132634.182778] blk_update_request: I/O error, dev sdb, sector 7210859016 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 0
Jun  8 17:23:02 ubsrv kernel: [132684.501494] blk_update_request: I/O error, dev sdb, sector 7212912136 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 0
Jun  8 17:23:41 ubsrv kernel: [132723.851729] blk_update_request: I/O error, dev sdb, sector 7215994888 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 0
Jun  8 17:23:48 ubsrv kernel: [132730.639364] blk_update_request: I/O error, dev sdb, sector 7216523784 op 0x0:(READ) flags 0x80700 phys_seg 64 prio class 0
...
...
...

The syslog also prints partial map device corruption:

CDB: Read(16) 88 00 00 00 00 01 ad d3 22 08 00 00 02 00 00 00

Please let us know if after fixing the disk corruption issue you still experience the reported issue.
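
For reference, one way to watch for new errors of this kind after the disk has been addressed is something like the following (the device name is taken from your kern.log):

# Look for fresh block-layer I/O errors in the kernel ring buffer and kern.log
dmesg | grep -i 'blk_update_request'
grep -i 'I/O error' /var/log/kern.log | tail -n 20

# If smartmontools is available, the drive's own error counters are worth checking too
smartctl -a /dev/sdb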

Thanks,
Dima

Comment by t b [ 11/Jun/21 ]

Hi Dmitry,

This is a standalone instance.

I did notice some disk read errors in the kernel log.

I'm not sure where the messages log is; can you point me in the right direction?

I've uploaded these files:

dmesg
kern.log
mongod.log
syslog
WiredTiger.turtle
WiredTiger.wt

Thanks,
Tom

Comment by Dmitry Agranat [ 09/Jun/21 ]

bh3@digitalblur.com, for completeness, please upload the full mongod log covering the last failed startup, the WiredTiger.wt and WiredTiger.turtle files, and the messages and dmesg logs.

I've created a secure portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Dima

Comment by Dmitry Agranat [ 09/Jun/21 ]

Hi bh3@digitalblur.com

Due to a formatting issue in the original text, I assumed the issue in question was these messages:

{"t":{"$date":"2021-05-26T04:16:05.436+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":95,"message":"[1622002565:436269][1:0x7f8bec8e1ac0], connection: __posix_std_fallocate, 58: /data/db/journal/WiredTigerTmplog.0000000001: fallocate:: Operation not supported"}}
{"t":{"$date":"2021-05-26T04:16:05.436+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":95,"message":"[1622002565:436722][1:0x7f8bec8e1ac0], connection: __posix_sys_fallocate, 75: /data/db/journal/WiredTigerTmplog.0000000001: fallocate:: Operation not supported"}}

These were indeed addressed by WT-7223, and the messages are gone as of MongoDB 4.4.5.

I have reformatted your latest log excerpt to show separate lines rather than one long concatenated string, to make the sequence of events easier to follow. The relevant message is:

wt: handle-read: pread: failed to read 4096 bytes at offset 1034993664
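
For reference, whether that block is readable can be checked directly against the data file with something like the following (an illustrative sketch using the path and offset from your log):

# Attempt to read the single 4096-byte block that recovery fails on;
# an I/O error here points at the disk rather than at MongoDB
dd if=/data/db/collection-14--1765169284971021277.wt of=/dev/null bs=4096 skip=$((1034993664 / 4096)) count=1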

While I am looking into this, could you please answer these questions:

  • Is this a Replica Set or a Standalone instance?
  • Do you see any kernel errors inside messages or dmesg logs?

Also, please attach copies of the WiredTiger.wt and WiredTiger.turtle files.

Thanks,
Dima

Comment by t b [ 08/Jun/21 ]

Hi Dmitry, thanks for the info.

 

I am still seeing this issue after upgrading to 4.4.5:

Log after starting up:

{"t":{"$date":"2021-06-08T17:22:45.693+00:00"},"s":"I", "c":"CONTROL", "id":23285, "ctx":"main","msg":"Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'"} 
{"t":{"$date":"2021-06-08T17:22:45.713+00:00"},"s":"W", "c":"ASIO", "id":22601, "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"} 
{"t":{"$date":"2021-06-08T17:22:45.714+00:00"},"s":"I", "c":"NETWORK", "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."} 
{"t":{"$date":"2021-06-08T17:22:45.714+00:00"},"s":"I", "c":"STORAGE", "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":1,"port":27017,"dbPath":"/data/db","architecture":"64-bit","host":"d2fbd16512d7"}} 
{"t":{"$date":"2021-06-08T17:22:45.714+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.5","gitVersion":"ff5cb77101b052fa02da43b8538093486cf9b3f7","openSSLVersion":"OpenSSL 1.1.1 11 Sep 2018","modules":[],"allocator":"tcmalloc","environment":{"distmod":"ubuntu1804","distarch":"x86_64","target_arch":"x86_64"}}}} 
{"t":{"$date":"2021-06-08T17:22:45.714+00:00"},"s":"I", "c":"CONTROL", "id":51765, "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"Ubuntu","version":"18.04"}}} 
{"t":{"$date":"2021-06-08T17:22:45.714+00:00"},"s":"I", "c":"CONTROL", "id":21951, "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/etc/mongod.conf","net":{"bindIp":"*","port":27017},"operationProfiling":{"slowOpThresholdMs":500},"processManagement":{"timeZoneInfo":"/usr/share/zoneinfo"},"storage":{"dbPath":"/data/db","journal":{"enabled":true}},"systemLog":{"destination":"file","logAppend":true,"path":"/var/log/mongod.log"}}}} 
{"t":{"$date":"2021-06-08T17:22:45.716+00:00"},"s":"I", "c":"STORAGE", "id":22270, "ctx":"initandlisten","msg":"Storage engine to use detected by data files","attr":{"dbpath":"/data/db","storageEngine":"wiredTiger"}} {"t":{"$date":"2021-06-08T17:22:45.716+00:00"},"s":"I", "c":"STORAGE", "id":22315, "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=15546M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],"}} 
{"t":{"$date":"2021-06-08T17:22:51.615+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1623172971:615613][1:0x7f544d166ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1173 through 2038"}}

 

And the relevant log after going through the recovery cycle:

{"t":{"$date":"2021-06-08T17:53:40.385+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1623174820:385105][1:0x7f544d166ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 2038 through 2038"}} {"t":{"$date":"2021-06-08T17:53:46.485+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1623174826:478572][1:0x7f544d166ac0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Main recovery loop: starting at 1173/103566208 to 2038/256"}} 
{"t":{"$date":"2021-06-08T17:53:46.599+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1623174826:599005][1:0x7f544d166ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1173 through 2038"}} 
{"t":{"$date":"2021-06-08T17:53:53.263+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1623174833:245845][1:0x7f544d166ac0], file:collection-14--1765169284971021277.wt, txn-recover: __posix_file_read, 431: /data/db/collection-14--1765169284971021277.wt: handle-read: pread: failed to read 4096 bytes at offset 1034993664: WT_ERROR: non-specific WiredTiger error"}} 
{"t":{"$date":"2021-06-08T17:53:53.278+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1623174833:278642][1:0x7f544d166ac0], file:collection-14--1765169284971021277.wt, txn-recover: __txn_op_apply, 288: operation apply failed during recovery: operation type 4 at LSN 1173/103571968: WT_ERROR: non-specific WiredTiger error"}} 
{"t":{"$date":"2021-06-08T17:53:53.278+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":-31802,"message":"[1623174833:278744][1:0x7f544d166ac0], file:collection-14--1765169284971021277.wt, txn-recover: __wt_txn_recover, 1045: Recovery failed: WT_ERROR: non-specific WiredTiger error"}} 
{"t":{"$date":"2021-06-08T17:53:53.382+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1623174833:382918][1:0x7f544d166ac0], connection: __wt_cache_destroy, 364: cache server: exiting with 87 pages in memory and 0 pages evicted"}} 
{"t":{"$date":"2021-06-08T17:53:53.383+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1623174833:383012][1:0x7f544d166ac0], connection: __wt_cache_destroy, 367: cache server: exiting with 874637 image bytes in memory"}} 
{"t":{"$date":"2021-06-08T17:53:53.383+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1623174833:383052][1:0x7f544d166ac0], connection: __wt_cache_destroy, 370: cache server: exiting with 2726005 bytes in memory"}} 
{"t":{"$date":"2021-06-08T17:53:53.383+00:00"},"s":"E", "c":"STORAGE", "id":22435, "ctx":"initandlisten","msg":"WiredTiger error","attr":{"error":0,"message":"[1623174833:383084][1:0x7f544d166ac0], connection: __wt_cache_destroy, 376: cache server: exiting with 1583673 bytes dirty and 40 pages dirty"}} 
{"t":{"$date":"2021-06-08T17:53:58.258+00:00"},"s":"I", "c":"STORAGE", "id":22430, "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1623174838:258542][1:0x7f544d166ac0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1173 through 2039"}}

And another journal file has been created:
WiredTigerLog.0000002039

And the recovery starts another cycle. Are there any other possibilities for recovery?
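
For example, would a standalone repair pass along these lines be worth trying (a sketch; I would only run it against a copy of the data)?

# Run against a copy of the dbpath, with the normal container stopped
mongod --dbpath /data/db --repair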

Thanks

Comment by Dmitry Agranat [ 08/Jun/21 ]

Hi bh3@digitalblur.com, this issue is related to WT-7223 and was fixed in MongoDB 4.4.5. Please let us know if you still see these messages after upgrading to 4.4.5.
