[SERVER-40731] Cannot start MongoDB with existing data Created: 19/Apr/19  Updated: 24/May/19  Resolved: 24/May/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.9
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Abed Halawi Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

Full error from logs:

 

019-04-19T11:33:39.035+0000 E STORAGE [initandlisten] WiredTiger error (-31803) [1555673619:35254][1:0x7f199152ba40], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-180--1731483854887362592" not found: WT_NOTFOUND: item not found Raw: [1555673619:35254][1:0x7f199152ba40], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-180--1731483854887362592" not found: WT_NOTFOUND: item not found

 

Specifically focusing on:

colgroup:collection-180--1731483854887362592" not found: WT_NOTFOUND: item not found Raw: [1555673619:35254][1:0x7f199152ba40]

As far as i got to in my research, this is due to a corruption in files WiredTiger.turtle and WiredTiger.wt. I've seen in previous resolved issues that a repair attempts were handed in the form of files.

{{}}

If it is a corruption issue, what are the steps to repair?

 



 Comments   
Comment by Danny Hatcher (Inactive) [ 02/May/19 ]

When a repair finishes, the node shuts down automatically. This is normal and expected.

Comment by Abed Halawi [ 01/May/19 ]

In order to be able to provide what you asked for, please explain what you mean by "The node shutting down automatically while running repair". I will reproduce and send you the logs a.s.a.p.

Comment by Danny Hatcher (Inactive) [ 01/May/19 ]

As I have mentioned before in this ticket, we do not believe the error message is a cause for concern. It looks like the repair completed cleanly. If you would like us to continue investigating the issue, please provide the logs I asked for in my previous comment.

Comment by Abed Halawi [ 30/Apr/19 ]

As you can see from the logs above, the repair started at  2019-04-29T07:16:24 but after 1h 32min the error occurred at 2019-04-29T08:48:33.

Comment by Danny Hatcher (Inactive) [ 30/Apr/19 ]

To be clear, after that repair at 04-29T08:49 you started the server and it errored again? I'm a little confused about the timeline; could you please upload your mongod log file to this ticket covering the timeframe of:

  • Starting the node with repair
  • The node shutting down automatically while running repair
  • Starting the node without repair
  • It shutting down automatically without you initiating the shutdown
Comment by Abed Halawi [ 29/Apr/19 ]

Apparently the repair hasn't completed properly and now i'm getting this:

 

 

2019-04-29T07:16:24.441+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=mongodb-f67f5c86f-rs8px
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] db version v4.0.9
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] git version: fc525e2d9b0e4bceff5c2201457e564362909765
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] allocator: tcmalloc
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] modules: none
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] build environment:
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] distmod: ubuntu1604
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] distarch: x86_64
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] target_arch: x86_64
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] 3072 MB of memory available to the process out of 7478 MB total system memory
2019-04-29T07:16:24.446+0000 I CONTROL [initandlisten] options: { net: { bindIpAll: true }, storage: { wiredTiger: { engineConfig: { cacheSizeGB: 1.0 } } } }
2019-04-29T07:16:24.446+0000 W STORAGE [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2019-04-29T07:16:24.446+0000 F STORAGE [initandlisten] An incomplete repair has been detected! This is likely because a repair operation unexpectedly failed before completing. MongoDB will not start up again without --repair.
2019-04-29T07:16:24.446+0000 F - [initandlisten] Fatal Assertion 50922 at src/mongo/db/storage/storage_engine_init.cpp 86
2019-04-29T07:16:24.446+0000 F - [initandlisten]
***aborting after fassert() failure

 

Running another repair now and will update this once done.

 

Update

 

Repair broke again, here's a snippet of the logs:

 

2019-04-29T08:46:41.720+0000 I INDEX    [initandlisten]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-04-29T08:46:41.726+0000 I STORAGE  [initandlisten] Repairing collection scores.talents-ff7ba29e-8d5a-44f5-928a-65d775113cad-profile
2019-04-29T08:46:41.780+0000 I STORAGE  [initandlisten] WiredTiger progress WT_SESSION.verify 100
2019-04-29T08:46:41.780+0000 I STORAGE  [initandlisten] Verify succeeded on uri table:collection-1052--6330009858057145811. Not salvaging.
2019-04-29T08:46:41.793+0000 I INDEX    [initandlisten] build index on: scores.talents-ff7ba29e-8d5a-44f5-928a-65d775113cad-profile properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "scores.talents-ff7ba29e-8d5a-44f5-928a-65d775113cad-profile" }
2019-04-29T08:46:41.793+0000 I INDEX    [initandlisten]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-04-29T08:46:41.806+0000 I STORAGE  [initandlisten] Repairing collection scores.talents-ff919896-1d72-4265-a7e0-1d52a5cb9092-profile
2019-04-29T08:46:41.875+0000 I STORAGE  [initandlisten] WiredTiger progress WT_SESSION.verify 100
2019-04-29T08:46:41.878+0000 I STORAGE  [initandlisten] Verify succeeded on uri table:collection-426-7666083095003676132. Not salvaging.
2019-04-29T08:46:41.891+0000 I INDEX    [initandlisten] build index on: scores.talents-ff919896-1d72-4265-a7e0-1d52a5cb9092-profile properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "scores.talents-ff919896-1d72-4265-a7e0-1d52a5cb9092-profile" }
2019-04-29T08:46:41.891+0000 I INDEX    [initandlisten]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-04-29T08:46:41.906+0000 I STORAGE  [initandlisten] Repairing collection scores.talents-ff92f0f6-a736-434b-aa98-a376feeb8572-profile
2019-04-29T08:46:41.991+0000 I STORAGE  [initandlisten] WiredTiger progress WT_SESSION.verify 100
2019-04-29T08:46:42.012+0000 I STORAGE  [initandlisten] Verify succeeded on uri table:collection-1054--6330009858057145811. Not salvaging.
2019-04-29T08:46:42.043+0000 I INDEX    [initandlisten] build index on: scores.talents-ff92f0f6-a736-434b-aa98-a376feeb8572-profile properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "scores.talents-ff92f0f6-a736-434b-aa98-a376feeb8572-profile" }
2019-04-29T08:46:42.043+0000 I INDEX    [initandlisten]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-04-29T08:46:42.061+0000 I STORAGE  [initandlisten] Repairing collection scores.talents-ff94f582-3ffa-43b6-b526-40d5a9d6c00c-profile
2019-04-29T08:46:42.107+0000 I STORAGE  [initandlisten] Verify succeeded on uri table:collection-402--652419432190789118. Not salvaging.
2019-04-29T08:46:42.119+0000 I INDEX    [initandlisten] build index on: scores.talents-ff94f582-3ffa-43b6-b526-40d5a9d6c00c-profile properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "scores.talents-ff94f582-3ffa-43b6-b526-40d5a9d6c00c-profile" }
2019-04-29T08:46:42.119+0000 I INDEX    [initandlisten]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-04-29T08:46:42.132+0000 I STORAGE  [initandlisten] Repairing collection scores.talents-ff9e5033-3925-4dbc-ad3c-c837efafbd0f-profile
2019-04-29T08:46:42.193+0000 I STORAGE  [initandlisten] WiredTiger progress WT_SESSION.verify 100
2019-04-29T08:46:42.193+0000 I STORAGE  [initandlisten] Verify succeeded on uri table:collection-472-7666083095003676132. Not salvaging.
2019-04-29T08:46:42.202+0000 I INDEX    [initandlisten] build index on: scores.talents-ff9e5033-3925-4dbc-ad3c-c837efafbd0f-profile properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "scores.talents-ff9e5033-3925-4dbc-ad3c-c837efafbd0f-profile" }
2019-04-29T08:46:42.202+0000 I INDEX    [initandlisten]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-04-29T08:46:42.215+0000 I STORAGE  [initandlisten] Repairing collection scores.talents-ffb330bb-3e77-49c7-9247-abb41dc8c92a-profile
2019-04-29T08:46:42.300+0000 I STORAGE  [initandlisten] WiredTiger progress WT_SESSION.verify 100
2019-04-29T08:46:42.312+0000 I STORAGE  [initandlisten] Verify succeeded on uri table:collection-1056--6330009858057145811. Not salvaging.
2019-04-29T08:46:42.322+0000 I INDEX    [initandlisten] build index on: scores.talents-ffb330bb-3e77-49c7-9247-abb41dc8c92a-profile properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "scores.talents-ffb330bb-3e77-49c7-9247-abb41dc8c92a-profile" }
2019-04-29T08:46:42.322+0000 I INDEX    [initandlisten]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-04-29T08:46:42.337+0000 I STORAGE  [initandlisten] Repairing collection scores.talents-ffddf023-fe29-4c25-b512-3e59e1d1608e-profile
2019-04-29T08:46:42.344+0000 I STORAGE  [initandlisten] Verify succeeded on uri table:collection-230-8208706207284907894. Not salvaging.
2019-04-29T08:46:42.355+0000 I INDEX    [initandlisten] build index on: scores.talents-ffddf023-fe29-4c25-b512-3e59e1d1608e-profile properties: { v: 2, key: { _id: 1 }, name: "_id_", ns: "scores.talents-ffddf023-fe29-4c25-b512-3e59e1d1608e-profile" }
2019-04-29T08:46:42.355+0000 I INDEX    [initandlisten]      building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-04-29T08:46:48.261+0000 W ASIO     [initandlisten] No TransportLayer configured during NetworkInterface startup
2019-04-29T08:46:50.466+0000 I STORAGE  [initandlisten] finished checking dbs
2019-04-29T08:46:50.556+0000 I STORAGE  [initandlisten] WiredTigerKVEngine shutting down
2019-04-29T08:46:50.566+0000 I STORAGE  [initandlisten] Shutting down session sweeper thread
2019-04-29T08:46:50.566+0000 I STORAGE  [initandlisten] Finished shutting down session sweeper thread
2019-04-29T08:46:53.218+0000 I STORAGE  [initandlisten] Downgrading WiredTiger datafiles.
2019-04-29T08:47:33.304+0000 I STORAGE  [initandlisten] WiredTiger message [1556527653:304562][1:0x7f08f397da80], txn-recover: Main recovery loop: starting at 2089/75174272 to 2090/256
2019-04-29T08:47:33.495+0000 I STORAGE  [initandlisten] WiredTiger message [1556527653:495403][1:0x7f08f397da80], txn-recover: Recovering log 2089 through 2090
2019-04-29T08:47:33.558+0000 I STORAGE  [initandlisten] WiredTiger message [1556527653:558928][1:0x7f08f397da80], txn-recover: Recovering log 2090 through 2090
2019-04-29T08:47:33.797+0000 I STORAGE  [initandlisten] WiredTiger message [1556527653:797668][1:0x7f08f397da80], txn-recover: Set global recovery timestamp: 0
2019-04-29T08:48:33.943+0000 E STORAGE  [initandlisten] WiredTiger error (-31803) [1556527713:943425][1:0x7f08f397da80], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-180--1731483854887362592" not found: WT_NOTFOUND: item not found Raw: [1556527713:943425][1:0x7f08f397da80], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-180--1731483854887362592" not found: WT_NOTFOUND: item not found
2019-04-29T08:49:23.549+0000 I STORAGE  [initandlisten] shutdown: removing fs lock...
2019-04-29T08:49:23.549+0000 I CONTROL  [initandlisten] now exiting
2019-04-29T08:49:23.549+0000 I CONTROL  [initandlisten] shutting down with code:0

Comment by Danny Hatcher (Inactive) [ 25/Apr/19 ]

Hello Abed,

The WiredTiger error message occurred 48 minutes before the shutdown. As mentioned previously, it should be a misidentified internal message and not relevant to any problems that you are having. It is possible that the repair run completed. If you start up the service without the --repair option now, what does it say?

Thanks,

Danny

Comment by Abed Halawi [ 25/Apr/19 ]

Still the same, here:

2019-04-25T09:37:20.875+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=mongodb-6467fdf6d4-f8c6s
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] db version v4.0.9
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] git version: fc525e2d9b0e4bceff5c2201457e564362909765
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] allocator: tcmalloc
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] modules: none
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] build environment:
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] distmod: ubuntu1604
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] distarch: x86_64
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] target_arch: x86_64
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] 3072 MB of memory available to the process out of 13022 MB total system memory
2019-04-25T09:37:20.881+0000 I CONTROL [initandlisten] options: { net: { bindIpAll: true }, repair: true, storage: { wiredTiger: { engineConfig: { cacheSizeGB: 1.0 } } } }
2019-04-25T09:37:20.882+0000 W STORAGE [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2019-04-25T09:37:20.966+0000 I STORAGE [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2019-04-25T09:37:20.966+0000 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2019-04-25T09:37:20.966+0000 I STORAGE [initandlisten]
2019-04-25T09:37:20.966+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2019-04-25T09:37:20.966+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2019-04-25T09:37:20.966+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=1024M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2019-04-25T09:37:29.865+0000 I STORAGE [initandlisten] WiredTiger message [1556185049:865020][1:0x7f6d89fcfa80], txn-recover: Main recovery loop: starting at 1579/10625920 to 1580/256
2019-04-25T09:37:29.874+0000 I STORAGE [initandlisten] WiredTiger message [1556185049:874458][1:0x7f6d89fcfa80], txn-recover: Recovering log 1579 through 1580
2019-04-25T09:37:30.200+0000 I STORAGE [initandlisten] WiredTiger message [1556185050:200613][1:0x7f6d89fcfa80], file:sizeStorer.wt, txn-recover: Recovering log 1580 through 1580
2019-04-25T09:37:30.332+0000 I STORAGE [initandlisten] WiredTiger message [1556185050:332663][1:0x7f6d89fcfa80], file:sizeStorer.wt, txn-recover: Set global recovery timestamp: 0
2019-04-25T09:37:30.421+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(0, 0)
2019-04-25T09:37:30.427+0000 I STORAGE [initandlisten] Repairing size cache
2019-04-25T09:37:30.534+0000 I STORAGE [initandlisten] WiredTiger progress WT_SESSION.verify 100
2019-04-25T09:37:30.575+0000 I STORAGE [initandlisten] Verify succeeded on uri table:sizeStorer. Not salvaging.
2019-04-25T09:37:30.586+0000 I STORAGE [initandlisten] Repairing catalog metadata
2019-04-25T09:37:30.622+0000 I STORAGE [initandlisten] WiredTiger progress WT_SESSION.verify 100
2019-04-25T09:37:30.636+0000 I STORAGE [initandlisten] WiredTiger progress WT_SESSION.verify 200
2019-04-25T09:37:30.656+0000 I STORAGE [initandlisten] WiredTiger progress WT_SESSION.verify 300
2019-04-25T09:37:30.677+0000 I STORAGE [initandlisten] WiredTiger progress WT_SESSION.verify 400
2019-04-25T09:37:30.700+0000 I STORAGE [initandlisten] WiredTiger progress WT_SESSION.verify 500
2019-04-25T09:37:30.722+0000 I STORAGE [initandlisten] WiredTiger progress WT_SESSION.verify 600
2019-04-25T09:37:30.741+0000 I STORAGE [initandlisten] WiredTiger progress WT_SESSION.verify 700
2019-04-25T09:37:30.761+0000 I STORAGE [initandlisten] WiredTiger progress WT_SESSION.verify 800
2019-04-25T09:37:30.780+0000 I STORAGE [initandlisten] Verify succeeded on uri table:_mdb_catalog. Not salvaging.
2019-04-25T09:38:27.219+0000 E STORAGE [initandlisten] WiredTiger error (-31803) [1556185107:219367][1:0x7f6d89fcfa80], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-180--1731483854887362592" not found: WT_NOTFOUND: item not found Raw: [1556185107:219367][1:0x7f6d89fcfa80], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-180--1731483854887362592" not found: WT_NOTFOUND: item not found

 After that the instance hangs and no progress is made.

 

Connecting with "mongo" command returns the following: 

MongoDB shell version v4.0.9
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
2019-04-25T09:40:27.787+0000 E QUERY [js] Error: couldn't connect to server 127.0.0.1:27017, connection attempt failed: SocketException: Error connecting to 127.0.0.1:27017 :: caused by :: Connection refused :
connect@src/mongo/shell/mongo.js:343:13
@(connect):2:6
exception: connect failed
command terminated with exit code 1

 

EDIT: After running for a while and trying to recover several collections (thought it was hanging), this error occurred (in reverse order, due to reading the logs from a different place):

2019-04-25T11:18:15.729+0000 I CONTROL [initandlisten] now exiting
2019-04-25T11:18:15.728+0000 I STORAGE [initandlisten] shutdown: removing fs lock...
2019-04-25T11:17:23.836+0000 E STORAGE [initandlisten] WiredTiger error (-31803) [1556191043:836689][1:0x7f6952952a80], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-180--1731483854887362592" not found: WT_NOTFOUND: item not found Raw: [1556191043:836689][1:0x7f6952952a80], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-180--1731483854887362592" not found: WT_NOTFOUND: item not found
2019-04-25T11:16:18.470+0000 I STORAGE [initandlisten] WiredTiger message [1556190978:470023][1:0x7f6952952a80], txn-recover: Set global recovery timestamp: 0
2019-04-25T11:16:16.323+0000 I STORAGE [initandlisten] WiredTiger message [1556190976:323966][1:0x7f6952952a80], txn-recover: Recovering log 1586 through 1586
2019-04-25T11:16:15.938+0000 I STORAGE [initandlisten] WiredTiger message [1556190975:937670][1:0x7f6952952a80], txn-recover: Recovering log 1585 through 1586
2019-04-25T11:16:10.433+0000 I STORAGE [initandlisten] WiredTiger message [1556190970:433883][1:0x7f6952952a80], txn-recover: Main recovery loop: starting at 1585/76067456 to 1586/256
2019-04-25T11:15:28.486+0000 I STORAGE [initandlisten] Downgrading WiredTiger datafiles.
2019-04-25T11:15:25.381+0000 I STORAGE [initandlisten] Finished shutting down session sweeper thread
2019-04-25T11:15:25.381+0000 I STORAGE [initandlisten] Shutting down session sweeper thread
2019-04-25T11:15:25.376+0000 I STORAGE [initandlisten] WiredTigerKVEngine shutting down
2019-04-25T11:15:25.080+0000 I STORAGE [initandlisten] finished checking dbs
2019-04-25T11:15:22.985+0000 W ASIO [initandlisten] No TransportLayer configured during NetworkInterface startup

Comment by Danny Hatcher (Inactive) [ 24/Apr/19 ]

Hello Abed,

Could you please try starting the node up on 4.0.9 with the --repair option and provide the full logs to this ticket?

Danny

Comment by Abed Halawi [ 23/Apr/19 ]

Yes the instance immediately shutdown after that error is displayed, there are no further logs. Even connecting to the instance returns a generic "Connection refused" error, which doesn't allow us to run any queries.

 

Tried both versions 4.0.5 and 4.0.9 and both behaved the same.

Comment by Danny Hatcher (Inactive) [ 22/Apr/19 ]

Hello Abed,

Thank you for providing that information. When WiredTiger finds a partial metadata set, it prints that informational message, skips that table and keeps going. So we are letting the cursor continue its cursor walk and complete. Consequently, these log lines do not indicate an issue that would prevent startup as MongoDB. Does the instance immediately shutdown after the error you provided above or does it still run? If it still runs, can you connect to it and run queries?

If it works with the 4.0.5 binary but not 4.0.9, could you please try to start the node up again with 4.0.9 and provide the result?

Thanks,

Danny

Comment by Abed Halawi [ 20/Apr/19 ]
  1. Here are the logs when turning on the instance:

2019-04-20T08:47:24.714+0000 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
2019-04-20T08:47:24.785+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=mongodb-d7d8d8879-582xc
2019-04-20T08:47:24.785+0000 I CONTROL [initandlisten] db version v4.0.5
2019-04-20T08:47:24.786+0000 I CONTROL [initandlisten] git version: 3739429dd92b92d1b0ab120911a23d50bf03c412
2019-04-20T08:47:24.786+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
2019-04-20T08:47:24.786+0000 I CONTROL [initandlisten] allocator: tcmalloc
2019-04-20T08:47:24.786+0000 I CONTROL [initandlisten] modules: none
2019-04-20T08:47:24.786+0000 I CONTROL [initandlisten] build environment:
2019-04-20T08:47:24.786+0000 I CONTROL [initandlisten] distmod: ubuntu1604
2019-04-20T08:47:24.786+0000 I CONTROL [initandlisten] distarch: x86_64
2019-04-20T08:47:24.786+0000 I CONTROL [initandlisten] target_arch: x86_64
2019-04-20T08:47:24.786+0000 I CONTROL [initandlisten] options: { net: { bindIpAll: true }, storage: { wiredTiger: { engineConfig: { cacheSizeGB: 1.0 } } } }
2019-04-20T08:47:24.795+0000 W STORAGE [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2019-04-20T08:47:24.796+0000 I STORAGE [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2019-04-20T08:47:24.796+0000 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2019-04-20T08:47:24.796+0000 I STORAGE [initandlisten]
2019-04-20T08:47:24.796+0000 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2019-04-20T08:47:24.796+0000 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2019-04-20T08:47:24.797+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=1024M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2019-04-20T08:47:29.806+0000 I STORAGE [initandlisten] WiredTiger message [1555750049:806487][1:0x7f844f365a40], txn-recover: Main recovery loop: starting at 685/6046976 to 686/256
2019-04-20T08:47:29.808+0000 I STORAGE [initandlisten] WiredTiger message [1555750049:808695][1:0x7f844f365a40], txn-recover: Recovering log 685 through 686
2019-04-20T08:47:29.985+0000 I STORAGE [initandlisten] WiredTiger message [1555750049:985281][1:0x7f844f365a40], file:index-3--7860435134447036396.wt, txn-recover: Recovering log 686 through 686
2019-04-20T08:47:30.090+0000 I STORAGE [initandlisten] WiredTiger message [1555750050:90179][1:0x7f844f365a40], file:index-3--7860435134447036396.wt, txn-recover: Set global recovery timestamp: 0
2019-04-20T08:47:30.117+0000 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(0, 0)
2019-04-20T08:48:12.891+0000 E STORAGE [initandlisten] WiredTiger error (-31803) [1555750092:891741][1:0x7f844f365a40], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-180--1731483854887362592" not found: WT_NOTFOUND: item not found Raw: [1555750092:891741][1:0x7f844f365a40], file:WiredTiger.wt, WT_CURSOR.next: __schema_create_collapse, 111: metadata information for source configuration "colgroup:collection-180--1731483854887362592" not found: WT_NOTFOUND: item not found

  1. The instance is running within a Docker container in a Kubernetes environment running on Google Cloud. The persistent layer is a network-attached SSD, specifically GCP persistent disk attached to the instance where Mongo is running. File system is ext4
  2. None**
  3. Deployment history
    1. Running version 4.0.4 since January
      Updated to 4.0.9 in April
      Downgraded (current) to 4.0.5 in April (after the incident)
      > note: would rather run the latest 4 (4.0.9)
    2. None
    3. Not available
    4. Never
  4. Yes. The underlying files were never [manually] accessed nor manipulated. Disk checks are all green
    > note: I mentioned "manually" because I cannot guarantee what our provider (GCP) might have done with the disks, thought highly doubtful they would move or change in that sense.

 

Please let me know if these details are sufficient and if anything else is required.

 

Thank you Danny.

Comment by Danny Hatcher (Inactive) [ 19/Apr/19 ]

Hello Abed,

We will need further information on the circumstances in order to help repair the issue. Please provide the following:

  1. The complete logs for the affected node, including before, leading up to, and after the first sign of corruption.
  2. A complete description of the underlying storage mechanism in use. Please address questions such as whether storage is locally attached or network-attached, whether disks are SSDs or HDDs, whether disks are SSDs or HDDs, whether RAID is in use and if so how it is configured, and what file system and/or volume management system is in use.
  3. A description of your backup method, if any.
  4. A history of the deployment, including:
    1. a timeline of version changes
    2. a timeline of hardware upgrade/downgrade cycles or configuration changes
    3. a timeline of disaster recovery or backup restoration activities
    4. a timeline of any manipulations of the underlying database files, including copies or moves, and information about whether mongod was running during each manipulation.
  5. Finally, can you provide assurances that you have not manipulated (copied or moved) the underlying database files while mongod was running, and that your disks have been recently checked for integrity?

Thank you,

Danny

Generated at Thu Feb 08 04:55:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.