Type: Bug
Resolution: Done
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.2.1
Component/s: Replication
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Steps To Reproduce:

From what I can see:
- Load up a replica set with a lot of data.
- Take one node offline and wipe its disk.
- This is the part I am not clear on:
  EITHER wait until the beginning of the oplog is after the time that the node went down,
  OR wait until the oplog is certain to wrap while events are being replayed from it.

CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Hi, we had one node of a three-node replica set fail with a bad disk, and we had to stop the node. After about 2 days, we got the node back online. We no longer had the dataDir, so the node had to resync automatically (which we have done before). The resync took about 18 hours. The node then appears to have started reading from the oplog, and at that point the member became too stale to recover. So basically, we resynced for 18 hours (~500 GB) and then the node died while reading the oplog, apparently because the oplog contained less than 18 hours of data.

I have no idea why this would be the case. Maybe someone can shed some light on this. We could never size the oplog to hold everything we would need to resync (all new data that was loaded) since a node went down. What if the node was down for 2 weeks? This wasn't even a lot of data. What if it took a week to resync 5 TB? I am not sure how this works. I am going to try to attach the logs.
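
A rough way to reason about the failure described above is to compare the length of the oplog window with the time the resync takes. The sketch below assumes (this is not confirmed anywhere in the report) that the oplog on the sync source only has to span the duration of the initial copy, not the time the node was down. The function names and the 12-hour and 24-hour windows are hypothetical, chosen for illustration; only the 500 GB and 18 hour figures come from the report.

```python
def hours_to_copy(data_gb: float, copy_rate_gb_per_hour: float) -> float:
    """Estimated duration of the initial data copy, in hours."""
    if copy_rate_gb_per_hour <= 0:
        raise ValueError("copy rate must be positive")
    return data_gb / copy_rate_gb_per_hour


def oplog_covers_resync(oplog_window_hours: float, resync_hours: float) -> bool:
    """True if the oplog spans more time than the resync takes.

    Assumption: the resyncing member must still find, in the sync
    source's oplog, the entry from the moment its copy started.
    """
    return oplog_window_hours > resync_hours


# Figures from the report: ~500 GB copied in ~18 hours.
observed_rate = 500 / 18  # GB per hour
resync_hours = hours_to_copy(500, observed_rate)

# Hypothetical oplog windows, picked only to fall on either side of 18 hours.
print(oplog_covers_resync(12, resync_hours))  # False
print(oplog_covers_resync(24, resync_hours))  # True
```

Under that assumption, the two-week downtime in the question would not matter by itself; what would matter is whether the oplog window is longer than the copy time for the data set being resynced.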


Attachments:
diagnostic.data.tar.gz, 59.50 MB, Mar 10 2016 05:14:58 PM UTC, James Mangold
mongodb.log.2016-03-08T01-34-34.gz, 2 kB, Mar 10 2016 05:14:58 PM UTC, James Mangold
mongodb.log.2016-03-08T01-38-02.gz, 3 kB, Mar 10 2016 05:14:58 PM UTC, James Mangold
mongodb.log.2016-03-09T01-38-04.gz, 323 kB, Mar 10 2016 05:14:58 PM UTC, James Mangold
mongodb.log.2016-03-10T01-38-04.gz, 327 kB, Mar 10 2016 05:14:58 PM UTC, James Mangold
mongodb.log.gz, 216 kB, Mar 10 2016 05:14:58 PM UTC, James Mangold

Assignee: Scott Hernandez (Inactive)
Reporter: James Mangold
Participants: James Mangold, Scott Hernandez
Votes: 0
Watchers: 4

Created: Mar 10 2016 05:03:57 PM UTC
Updated: Mar 11 2016 04:08:58 PM UTC
Resolved: Mar 11 2016 04:08:48 PM UTC
