  1. Core Server
  2. SERVER-23045

Automatic resync of a failed node failed

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.2.1
    • Component/s: Replication
    • Labels: None
    • Fully Compatible
    • ALL

      From what I can see:
      - Load up a replica set with a lot of data.
      - Take one node offline and wipe its disk.
      - This is the part I am not clear on:
        EITHER wait until the beginning of the oplog is later than the time the node went down,
        OR wait until the oplog is certain to wrap while events are being replayed from it.

      Hi. One node of our 3-node replica set failed with a bad disk, and we had to stop it. After 2 days or so, we got the node back online. We no longer had the dataDir, so the node had to resync automatically (which we have done before). The resync took about 18 hours. The node then appears to have started reading from the oplog, at which point it became too stale to recover. So basically, we resynced for 18 hours (~500 GB), and then the node died while reading the oplog because, it seems, the oplog contained less than 18 hours of data.

      I have no idea why this would be the case; maybe someone can shed some light on it. We could never size the oplog to hold everything we would ever need to resync (all the new data that was loaded) since a node went down. What if it was down for 2 weeks? This wasn't even a lot of data. What if it took a week to resync 5 TB? I am not sure how this works. I am going to try to attach the logs.
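
      To make the sizing question concrete, here is a rough sketch of the arithmetic as I understand it: the resync seems to fail when copying the data takes longer than the time span the sync source's oplog covers. The helper name `estimate`, the class `SyncEstimate`, and the oplog figures (10 GB oplog, 1 GB/hour of oplog churn) are made up for illustration; only the 500 GB and 18 hour numbers come from this report.

```python
from dataclasses import dataclass


@dataclass
class SyncEstimate:
    copy_hours: float          # estimated time to copy the full data set
    oplog_window_hours: float  # time span covered by the sync source's oplog

    @property
    def goes_stale(self) -> bool:
        # Assumption: the syncing node must still find its starting point in
        # the source's oplog once the copy finishes; if the oplog has wrapped
        # past that point, the node is too stale to catch up.
        return self.copy_hours >= self.oplog_window_hours


def estimate(data_gb: float, copy_gb_per_hour: float,
             oplog_gb: float, oplog_gb_per_hour: float) -> SyncEstimate:
    """Hypothetical helper; all four inputs are rates you measure yourself."""
    return SyncEstimate(
        copy_hours=data_gb / copy_gb_per_hour,
        oplog_window_hours=oplog_gb / oplog_gb_per_hour,
    )


# 500 GB copied in about 18 hours (from this report); oplog numbers are assumed.
observed = estimate(data_gb=500, copy_gb_per_hour=500 / 18,
                    oplog_gb=10, oplog_gb_per_hour=1)
print(f"copy: {observed.copy_hours:.1f} h, "
      f"oplog window: {observed.oplog_window_hours:.1f} h, "
      f"goes stale: {observed.goes_stale}")
```

      If this model is right, the length of the downtime would not matter after a wipe, only the copy time compared with the oplog window. The actual window on a running member can be read with `rs.printReplicationInfo()` in the mongo shell.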

        1. diagnostic.data.tar.gz
          59.50 MB
        2. mongodb.log.2016-03-08T01-34-34.gz
          2 kB
        3. mongodb.log.2016-03-08T01-38-02.gz
          3 kB
        4. mongodb.log.2016-03-09T01-38-04.gz
          323 kB
        5. mongodb.log.2016-03-10T01-38-04.gz
          327 kB
        6. mongodb.log.gz
          216 kB

            Assignee:
            scotthernandez Scott Hernandez (Inactive)
            Reporter:
            james.mangold@interactivedata.com James Mangold
            Votes:
            0
            Watchers:
            4

              Created:
              Updated:
              Resolved: