[SERVER-40904] Mongo won't restart following disk full Created: 30/Apr/19  Updated: 30/Jul/19  Resolved: 30/Jul/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Leigh Jones Assignee: Kelsey Schubert
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

We are running MongoDB 4.0.0 as a single-node replica set.

Our data disk filled up and the mongo process stopped. We are now unable to connect to the DB, and the process continually uses between 500% and 800% CPU (we have 8 CPUs available).

I've increased the verbosity of the system log (to 5) as nothing is emitted otherwise.

Apologies but I can't share the log files so I have tried to provide a summary below:

We see lots of STORAGE messages for a while, with our data records printed out under the text "fetched CCE metadata". This runs for a while.

We then see debug messages from RECOVERY where it runs through "Applying op x of 5000", up to 5000 of 5000.

We then see lots of WT "set timestamp of future write operations" messages, followed by WT "begin transaction" for snapshot id X and then "commit_transaction" for snapshot X.

Eventually these log messages stop and are replaced by:

WTJournalFlusher flushed Journal every 100ms

and:

STORAGE [ftdc] setting timestamp read source: 1, provided timestamp: none

-        [ftdc] User Assertion: NotYetInitialized: no replset config has been received src/mongo/db/repl/repl_set_get_status_cmd.cpp 54

STORAGE  [ftdc] NamespaceUUIDCache: registered namespace local.oplog.rs with UUID uuid

(these repeat every second)
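
When the raw log cannot be shared, a tally of log lines by severity, component, and context may be easier to pass along than a prose summary. The following is a minimal sketch in Python, assuming the plain-text line layout used by MongoDB 4.0 (`timestamp severity component [context] message`). The function name `summarize_log` and the sample lines (including their timestamps) are hypothetical and only illustrate the idea; they are not taken from the actual log.

```python
import re
from collections import Counter

# Assumed 4.0-style layout: "<timestamp> <severity> <component> [<context>] <message>"
LINE_RE = re.compile(r"^(\S+)\s+([FEWID]\d?)\s+(\S+)\s+\[([^\]]+)\]\s?(.*)$")


def summarize_log(lines):
    """Count log lines per (severity, component, context); message bodies are dropped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip continuation lines or anything not in the assumed layout
        _timestamp, severity, component, context, _message = match.groups()
        counts[(severity, component, context)] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "2019-04-30T10:00:00.000+0000 D STORAGE  [ftdc] setting timestamp read source: 1, provided timestamp: none",
    "2019-04-30T10:00:00.001+0000 D -        [ftdc] User Assertion: NotYetInitialized: no replset config has been received",
    "2019-04-30T10:00:01.000+0000 D STORAGE  [ftdc] setting timestamp read source: 1, provided timestamp: none",
]

for key, count in summarize_log(sample).most_common():
    print(key, count)
```

Because only the counts are kept, the output contains no document data, though it is far less useful for diagnosis than the full log.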

 

Please let me know if you need any more information but our database instance is currently unable to start.

 

Thanks in advance.


 Comments   
Comment by Leigh Jones [ 30/Jul/19 ]

Sorry, I didn’t see the previous comment. If it happens again, I’ll try to send the requested info.

Comment by Danny Hatcher (Inactive) [ 30/Jul/19 ]

Closing due to lack of response.

Comment by Kelsey Schubert [ 03/May/19 ]

Hi leigh.jones@ripjar.com,

Sorry for the delay getting back to you. Unfortunately, it's very challenging to diagnose an issue like this without the complete logs and diagnostic data. Would it be feasible to upload the mongod log and an archive of the diagnostic.data directory in your dbpath to this secure upload portal that is only visible to MongoDB employees?

Thanks,
Kelsey
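
One possible way to prepare the requested archive is sketched below in Python. It assumes only that `diagnostic.data` sits directly under the dbpath, as described above. The helper name `archive_diagnostic_data` and all paths are hypothetical, and the demo runs against a throwaway directory with a placeholder file rather than a real dbpath.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_diagnostic_data(dbpath, output):
    """Pack <dbpath>/diagnostic.data into a gzip-compressed tar archive and return its path."""
    source = Path(dbpath) / "diagnostic.data"
    if not source.is_dir():
        raise FileNotFoundError(f"no diagnostic.data directory under {dbpath}")
    output = Path(output)
    with tarfile.open(output, "w:gz") as archive:
        # arcname keeps paths inside the archive relative ("diagnostic.data/...")
        archive.add(source, arcname=source.name)
    return output


# Demo against a temporary directory standing in for a real dbpath.
with tempfile.TemporaryDirectory() as tmp:
    fake_dbpath = Path(tmp) / "db"
    (fake_dbpath / "diagnostic.data").mkdir(parents=True)
    (fake_dbpath / "diagnostic.data" / "metrics.interim").write_bytes(b"placeholder")
    result = archive_diagnostic_data(fake_dbpath, Path(tmp) / "diagnostic-data.tar.gz")
    with tarfile.open(result) as archive:
        print(archive.getnames())
```

On a host whose data disk is full, the output path should point at a different volume that still has free space.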

Comment by Leigh Jones [ 30/Apr/19 ]

I forgot to say we have 32 GB of RAM and are running on CentOS 6.
