Details
Bug
Resolution: Done
Major - P3
ALL
Description
I have a 13-server MongoDB cluster consisting of 1 query router, 3 config servers with replication, and 3 shards, each with replication (primary, secondary and arbiter). It's installed on AWS EC2 R-series instances. Monit is used to restart the MongoDB service in case it exceeds 95% memory usage.
My Shard3Primary failed and the Shard3Secondary became primary (as expected). The problem is that the MongoDB process on Shard3Primary isn't able to restart; it aborts with the following log output:
Initializing full-time diagnostic data capture with directory '/data_storage/data/diagnostic.data'
2018-08-03T03:59:12.144+0000 I REPL [initandlisten] Rollback ID is 210
2018-08-03T03:59:12.145+0000 I REPL [initandlisten] Starting recovery oplog application at the appliedThrough: { ts: Timestamp(1533190391, 15335), t: 454 }
2018-08-03T03:59:12.145+0000 I REPL [initandlisten] Replaying stored operations from { : Timestamp(1533190391, 15335) } (exclusive) to { : Timestamp(1533190418, 1) } (inclusive).
2018-08-03T03:59:12.145+0000 F REPL [initandlisten] Oplog entry at { : Timestamp(1533190391, 15335) } is missing; actual entry found is { : Timestamp(1533190393, 1) }
2018-08-03T03:59:12.145+0000 F - [initandlisten] Fatal Assertion 40292 at src/mongo/db/repl/replication_recovery.cpp 218
2018-08-03T03:59:12.145+0000 F - [initandlisten]***aborting after fassert() failure
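To make the gap reported in the fatal REPL line concrete, here is a minimal Python sketch that pulls the two timestamps out of that line and compares them. It is not part of MongoDB or its tooling; the names `FATAL_LINE`, `TS_PATTERN` and `oplog_gap` are made up for illustration, and it assumes the first number inside `Timestamp(...)` is seconds since the Unix epoch and the second is an increment within that second.

```python
import re

# Fatal line copied verbatim from the log above.
FATAL_LINE = (
    "2018-08-03T03:59:12.145+0000 F REPL [initandlisten] Oplog entry at "
    "{ : Timestamp(1533190391, 15335) } is missing; actual entry found is "
    "{ : Timestamp(1533190393, 1) }"
)

# Matches Timestamp(<seconds>, <increment>) as printed in the log.
TS_PATTERN = re.compile(r"Timestamp\((\d+), (\d+)\)")


def oplog_gap(line):
    """Return (expected, found, gap_seconds) parsed from the fatal line.

    expected: the appliedThrough timestamp recovery wanted to start from.
    found:    the timestamp of the entry the server actually found.
    """
    matches = TS_PATTERN.findall(line)
    if len(matches) != 2:
        raise ValueError("expected exactly two Timestamp(...) values")
    expected, found = [(int(secs), int(inc)) for secs, inc in matches]
    return expected, found, found[0] - expected[0]


expected, found, gap = oplog_gap(FATAL_LINE)
print(f"expected {expected}, found {found}, gap {gap}s")
```

Read this way, the entry recovery wanted to start from is absent and the entry found in its place is 2 seconds later, which is the condition that triggers Fatal Assertion 40292.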
I also tried to run mongodump on the QueryRouter, and it failed too (the same command had succeeded for earlier dumps).
I have attached the screenshots of Shard3Primary for your reference.