SERVER-22389 (Core Server)

Replica set member with data from backup fails to sync

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 3.2.1
    • Component/s: Replication
    • Labels: None
    • ALL

      Our replica set consists of a primary, a secondary, and an arbiter. Data is backed up periodically from the secondary using the recommended method (fsync lock and snapshotting). When we create a new member from the backup and add it to the set, it does not sync. Instead, it logs the following line endlessly without making any progress:

      2016-02-01T05:33:38.739+0000 I REPL     [ReplicationExecutor] syncing from: in.db2m2.mydomain.com:27017
      2016-02-01T05:33:43.812+0000 I REPL     [ReplicationExecutor] syncing from: in.db2m2.mydomain.com:27017
      2016-02-01T05:33:48.885+0000 I REPL     [ReplicationExecutor] syncing from: in.db2m2.mydomain.com:27017
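
The excerpt above shows the same message repeating at a fixed interval. As a minimal sketch (the helper name `retry_intervals` is hypothetical and not part of any MongoDB tooling), the spacing can be computed from the leading timestamps of the three lines quoted in this report:

```python
from datetime import datetime

# The three log lines quoted in the report, verbatim.
LOG_LINES = [
    "2016-02-01T05:33:38.739+0000 I REPL     [ReplicationExecutor] syncing from: in.db2m2.mydomain.com:27017",
    "2016-02-01T05:33:43.812+0000 I REPL     [ReplicationExecutor] syncing from: in.db2m2.mydomain.com:27017",
    "2016-02-01T05:33:48.885+0000 I REPL     [ReplicationExecutor] syncing from: in.db2m2.mydomain.com:27017",
]


def retry_intervals(lines):
    """Return the seconds elapsed between consecutive log lines.

    Assumes each line starts with an ISO-8601 timestamp with milliseconds
    and a numeric UTC offset, as in the excerpt above.
    """
    stamps = [
        datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%f%z")
        for line in lines
    ]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(retry_intervals(LOG_LINES))
```

For these lines the gap is a little over 5 seconds each time, which is consistent with the member retrying the same sync source on a timer rather than advancing.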
      

      This happens with very recent backups (a few minutes old). Note that the data directory is identical to that of our secondary node, which does sync correctly, even after being stopped.
      This worked for us in 3.0.
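
For readers unfamiliar with the backup method mentioned above (fsync lock, then snapshot), the sequence can be sketched roughly as follows. This is an illustration under stated assumptions, not the reporter's actual script: `client` is assumed to be a `pymongo.MongoClient`-like object connected directly to the secondary, `take_snapshot` is a hypothetical callable that triggers the volume snapshot, and the `fsyncUnlock` command is assumed to be available (it is on MongoDB 3.2 and later).

```python
def snapshot_with_fsync_lock(client, take_snapshot):
    """Flush and lock writes, take a snapshot, then always unlock.

    client: pymongo.MongoClient-like object for the secondary (assumption).
    take_snapshot: hypothetical callable that performs the volume snapshot
        and returns some identifier for it.
    """
    # Sends {fsync: 1, lock: true}: flush pending writes and block new ones.
    client.admin.command("fsync", lock=True)
    try:
        return take_snapshot()
    finally:
        # Release the lock even if the snapshot step raised an exception.
        client.admin.command("fsyncUnlock")
```

The `try`/`finally` is the main design point: a failed snapshot should not leave the secondary locked against writes.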

      It happens with both protocol version 0 and protocol version 1.

      A log with verbose level 5 is attached. We can easily reproduce it.

      Please advise. The only way for us to add a member is a full initial sync, which is unrealistic (our DB size is 8 TB).

        1. mongod-bsync-logging.gz (13.22 MB)
        2. mongod.log (326 kB)
        3. metrics.2016-02-02T15-07-47Z-00000 (3.47 MB)
        4. m.zip (32.02 MB)
        5. log.zip (984 kB)
        6. log (28 kB)
        7. conf (1 kB)

            Assignee: Scott Hernandez (scotthernandez, Inactive)
            Reporter: Yoni Douek (yonido)
            Votes: 0
            Watchers: 10

              Created:
              Updated:
              Resolved: