Core Server / SERVER-9752

Resyncing a Stale Member: Stuck in STARTUP2

    • Type: Question
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.4.3
    • Component/s: Replication
    • Labels:
      None
    • Environment:
      Freebsd 9.0 amd64

      We have a large dataset with a stale member, and we want to resync it automatically from the primary (initial sync).

      After we removed its data directory and started it again, the member entered the STARTUP2 state and began cloning data.
      The data cloning (and indexing) stage took 18 hours, but after the initial sync the member's state did not change at all.

      The relevant portion of the log file is:
      Tue May 21 13:39:54.466 [rsSync] oplog sync 1 of 3
      ...
      Wed May 22 02:55:33.679 [rsSync] build index done. scanned 52202932 total records. 5286.34 secs
      Wed May 22 02:55:35.019 [rsSync] oplog sync 3 of 3
      Wed May 22 02:55:35.757 [rsBackgroundSync] repl: old cursor isDead, will initiate a new one
      Wed May 22 02:59:13.691 [rsSync] replSet initialSyncOplogApplication applied 1001 operations, synced to May 21 14:22:18:22
      Wed May 22 03:06:09.440 [rsSync] replSet initialSyncOplogApplication applied 2002 operations, synced to May 21 14:22:37:d
      Wed May 22 03:10:59.526 [rsSync] replSet initialSyncOplogApplication applied 3003 operations, synced to May 21 14:23:25:20
      Wed May 22 03:18:37.975 [rsSync] replSet initialSyncOplogApplication applied 4004 operations, synced to May 21 14:23:49:33
      ...
      Wed May 22 09:56:35.674 [rsSync] replSet initialSyncOplogApplication applied 116116 operations, synced to May 21 15:27:59:10

      I don't know whether the initial sync succeeded, but we were seeing `initialSyncOplogApplication` log lines every ~5 minutes, and the synced-to time was advancing very slowly (it moved forward 1 hour over 5 hours!).
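The rate described above can be checked from the log itself. The sketch below is a minimal Python example, assuming the progress lines follow exactly the 2.4.x format quoted above; the regex, the function name `catch_up_ratio`, and the placeholder year (the log omits the year, and only differences matter) are illustrative assumptions, not part of mongod. It compares how much oplog time was applied with how much wall-clock time passed between two progress lines.

```python
import re
from datetime import datetime

# Matches the initialSyncOplogApplication progress lines quoted in this issue,
# e.g. "Wed May 22 02:59:13.691 [rsSync] replSet ... synced to May 21 14:22:18:22".
# The trailing ":22" after the optime seconds is its hex increment, ignored here.
PROGRESS_RE = re.compile(
    r"^\w{3} (?P<wall>\w{3} +\d+ \d\d:\d\d:\d\d)\.\d+ \[rsSync\] "
    r"replSet initialSyncOplogApplication applied (?P<ops>\d+) operations, "
    r"synced to (?P<optime>\w{3} +\d+ \d\d:\d\d:\d\d):[0-9a-f]+$"
)

# Placeholder: the log has no year, and only time differences are used.
PLACEHOLDER_YEAR = 2000


def _parse(stamp: str) -> datetime:
    """Parse 'May 22 02:59:13' style stamps using the placeholder year."""
    return datetime.strptime(f"{stamp} {PLACEHOLDER_YEAR}", "%b %d %H:%M:%S %Y")


def catch_up_ratio(first: str, last: str) -> float:
    """Seconds of oplog applied per second of wall-clock time between two lines."""
    a = PROGRESS_RE.match(first.strip())
    b = PROGRESS_RE.match(last.strip())
    if a is None or b is None:
        raise ValueError("not an initialSyncOplogApplication progress line")
    wall = (_parse(b["wall"]) - _parse(a["wall"])).total_seconds()
    oplog = (_parse(b["optime"]) - _parse(a["optime"])).total_seconds()
    return oplog / wall
```

For the first and last progress lines quoted above (02:59:13 and 09:56:35), the ratio comes out at about 0.16: roughly 66 minutes of oplog replayed in roughly 7 hours of wall-clock time. Assuming the primary's write load stayed roughly steady, a ratio well below 1 suggests the member was replaying history more slowly than the primary was producing it, so it would keep falling further behind rather than catch up.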

      We restarted the mongodb service, but unfortunately it started the sync from scratch, with log output like this:

      Wed May 22 10:43:28.548 [rsStart] replSet I am 172.20.43.11:27118
      Wed May 22 10:43:28.638 [rsStart] replSet STARTUP2
      Wed May 22 10:43:28.645 [rsSync] replSet initial sync pending
      Wed May 22 10:43:58.704 [rsSync] replSet initial sync drop all databases
      Wed May 22 10:43:58.704 [rsSync] dropAllDatabasesExceptLocal 2
      Wed May 22 10:43:58.708 [rsSync] removeJournalFiles
      ....
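To tell from a log whether a member was still applying the oplog or had started its initial sync over, one could scan for the last initial-sync marker. This is a hypothetical helper (`last_marker` and the `MARKERS` table are illustrative names), assuming the 2.4.x message wording quoted in this issue; other server versions word these messages differently.

```python
import re
from typing import Iterable, Optional

# Hypothetical marker table built from the 2.4.x log lines quoted in this issue.
# Each line is checked against the markers in order; the first match wins.
MARKERS = [
    ("restarted from scratch", re.compile(r"replSet initial sync drop all databases")),
    ("initial sync pending", re.compile(r"replSet initial sync pending")),
    ("applying oplog", re.compile(r"initialSyncOplogApplication applied \d+ operations")),
    ("oplog sync phase", re.compile(r"oplog sync \d of 3")),
]


def last_marker(lines: Iterable[str]) -> Optional[tuple[str, str]]:
    """Return (label, line) for the last log line matching any marker, or None."""
    found = None
    for line in lines:
        for label, pattern in MARKERS:
            if pattern.search(line):
                found = (label, line.strip())
                break
    return found
```

Because it accepts any iterable of lines, an open log file can be passed directly. Run against the second log excerpt above, the last marker is the `initial sync drop all databases` line, which is consistent with the member discarding its cloned data and starting over.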

      I think the server's state should be RECOVERING, not STARTUP2. Is this correct?
      If so, why is the server stuck in STARTUP2, and why did it drop all of the copied data after the restart?
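The member's reported state, and how far its applied optime trails the primary, can be read from the `replSetGetStatus` command (the data behind the shell's `rs.status()` helper). The sketch below assumes the pymongo driver, in a version compatible with the server, and the `name`, `stateStr`, and `optimeDate` member fields; `summarize_members`, `lag_seconds`, and `fetch_status` are hypothetical helper names. The summary function works on a plain dict, so it can be exercised without a live server.

```python
from datetime import datetime
from typing import Any


def lag_seconds(primary_optime: datetime, member_optime: datetime) -> float:
    """Seconds by which a member's last applied op trails the primary's."""
    return (primary_optime - member_optime).total_seconds()


def summarize_members(status: dict[str, Any]) -> list[str]:
    """One line per member: host, state, and lag behind the primary when known."""
    members = status.get("members", [])
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    summary = []
    for member in members:
        line = f"{member['name']}: {member['stateStr']}"
        if primary is not None and "optimeDate" in member and "optimeDate" in primary:
            lag = lag_seconds(primary["optimeDate"], member["optimeDate"])
            line += f", {lag:.0f}s behind primary"
        summary.append(line)
    return summary


def fetch_status(uri: str) -> dict[str, Any]:
    """Run replSetGetStatus against a live deployment (requires pymongo)."""
    # Imported lazily so summarize_members can be used without the driver.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        return client.admin.command("replSetGetStatus")
    finally:
        client.close()
```

For example, `summarize_members(fetch_status("mongodb://172.20.43.11:27118"))` would list each member with the state it reports. The output only shows what the server says; whether STARTUP2 or RECOVERING is the expected state at that point is the question this issue asks.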

            Assignee:
            Thomas Rueckstiess (thomas.rueckstiess@mongodb.com)
            Reporter:
            Taha Jahangir (taha_jahangir)
            Votes:
            0
            Watchers:
            5
