- Type: Bug
- Resolution: Duplicate
- Priority: Major - P3
- None
- Affects Version/s: 2.4.8
- Component/s: Replication
- None
- ALL
We run a 5-node MongoDB replica set. After a network outage, all of our apps flushed their data to the master. The master stayed up and replicated to the other nodes, but at a specific point 3 of the replica set members crashed, and we were unable to recover them from that state. Fortunately, we were able to restore the crashed nodes from the remaining ones.
I've attached log files from the master.
Some context on what happened when we started a mongo node back up: the journal recovered, but then I/O went through the roof, and when running with -vvvv we saw that bgsync was not keeping up. The fatal exception always occurs on the same query against the same capped collection, which I assume the node is trying to replay from the master. I confirmed that the other nodes break on the same query. There is nothing special about it: it is a small blob with very few fields, and the capped collection has many just like it.
The OS is Ubuntu 10.04.3 LTS, the filesystem is XFS, and the machine has 22 GB of memory.
- duplicates
- SERVER-6981 maxPasses assertion (on allocation failure) can make capped collection unreadable
- Closed