Core Server: SERVER-17139

More retries needed when there are sync errors after initial sync

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.6.4
    • Component/s: None
    • Labels: None
    • ALL

      I just ran a two-hour initial sync, and it then failed in the sync-up step because of a network error. As a result, the node started the whole initial sync again from scratch. It would be better to retry the failed step a few times before giving up and restarting the initial sync.
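      The requested behavior can be sketched as a two-level retry policy: keep the existing restart-from-scratch loop (the log reports "5 attempts remaining" at that level), but first retry only the sync-up step a few times with exponential backoff, so that a single socket timeout does not throw away hours of cloned data. This is a minimal Python sketch under stated assumptions, not mongod's implementation (which is C++): `TransientNetworkError`, `run_with_retries`, `initial_sync`, and every retry count and delay are hypothetical.

```python
import time


class TransientNetworkError(Exception):
    """Hypothetical stand-in for a retryable failure such as a socket timeout."""


def run_with_retries(step, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call step(); on TransientNetworkError retry up to max_retries more times,
    doubling the delay each time. Re-raise the last error if every try fails."""
    for attempt in range(max_retries + 1):
        try:
            return step()
        except TransientNetworkError:
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))


def initial_sync(clone_all, sync_up, max_full_attempts=3,
                 sync_up_retries=3, sleep=time.sleep):
    """Hypothetical policy: clone once per full attempt, retry only the cheap
    sync-up step, and fall back to a full restart (re-clone) as a last resort."""
    for _ in range(max_full_attempts):
        clone_all()  # expensive phase (two hours in this report)
        try:
            return run_with_retries(sync_up, sync_up_retries, sleep=sleep)
        except TransientNetworkError:
            continue  # sync-up retries exhausted: restart, as mongod does today
    raise RuntimeError("initial sync failed after all full attempts")


# Usage: a sync-up step that times out twice and then succeeds.
calls = {"clone": 0, "sync_up": 0}


def clone_all():
    calls["clone"] += 1


def flaky_sync_up():
    calls["sync_up"] += 1
    if calls["sync_up"] <= 2:
        raise TransientNetworkError("send() errno:110 Connection timed out")
    return "synced"


result = initial_sync(clone_all, flaky_sync_up, sleep=lambda s: None)
```

      With these inputs the data is cloned only once and the sync-up step runs three times before succeeding; a full re-clone happens only when a step keeps failing past its retry budget.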

      2015-01-31T01:24:48.185+0000 [conn636] authenticate db: local { authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }
      2015-01-31T01:24:50.066+0000 [rsSync] Index Build: 5600/5898 94%
      2015-01-31T01:24:52.763+0000 [conn634] end connection 23.98.145.8:1176 (1 connection now open)
      2015-01-31T01:24:52.884+0000 [initandlisten] connection accepted from 23.98.145.8:1177 #637 (2 connections now open)
      2015-01-31T01:24:52.994+0000 [conn637] authenticate db: local { authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }
      2015-01-31T01:24:53.105+0000 [rsSync] Index Build: 5800/5898 98%
      2015-01-31T01:24:53.239+0000 [rsSync] build index done. scanned 5898 total records. 162.071 secs
      2015-01-31T01:24:53.256+0000 [rsSync] replSet initial sync cloning db: admin
      2015-01-31T01:24:53.565+0000 [FileAllocator] allocating new datafile /mongodb_data/admin.ns, filling with zeroes...
      2015-01-31T01:24:53.636+0000 [FileAllocator] done allocating datafile /mongodb_data/admin.ns, size: 16MB, took 0.07 secs
      2015-01-31T01:24:53.639+0000 [FileAllocator] allocating new datafile /mongodb_data/admin.0, filling with zeroes...
      2015-01-31T01:24:53.656+0000 [FileAllocator] done allocating datafile /mongodb_data/admin.0, size: 16MB, took 0.016 secs
      2015-01-31T01:24:53.782+0000 [rsSync] build index on: admin.system.version properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "admin.system.version" }
      2015-01-31T01:24:53.782+0000 [rsSync] building index using bulk method
      2015-01-31T01:24:53.783+0000 [rsSync] build index done. scanned 1 total records. 0 secs
      2015-01-31T01:24:53.902+0000 [rsSync] build index on: admin.system.users properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "admin.system.users" }
      2015-01-31T01:24:53.902+0000 [rsSync] building index using bulk method
      2015-01-31T01:24:53.903+0000 [rsSync] build index done. scanned 1 total records. 0 secs
      2015-01-31T01:24:53.903+0000 [rsSync] replSet initial sync data copy, starting syncup
      2015-01-31T01:24:53.903+0000 [rsSync] oplog sync 1 of 3
      2015-01-31T01:24:53.927+0000 [rsSync] Socket say send() errno:110 Connection timed out 23.98.145.8:27017
      2015-01-31T01:24:53.961+0000 [rsSync] connection lost to SG-azrs6-721.devservers.mongodirector.com:27017; is your tcp keepalive interval set appropriately?
      2015-01-31T01:24:53.994+0000 [rsSync] replSet initial sync exception: 9001 socket exception [FAILED_STATE] server [SG-azrs6-721.devservers.mongodirector.com:27017 (23.98.145.8) failed] 5 attempts remaining
      2015-01-31T01:25:19.962+0000 [conn636] end connection 23.98.146.173:1177 (1 connection now open)
      2015-01-31T01:25:20.078+0000 [initandlisten] connection accepted from 23.98.146.173:1176 #638 (2 connections now open)
      2015-01-31T01:25:20.192+0000 [conn638] authenticate db: local { authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }
      2015-01-31T01:25:23.996+0000 [rsSync] replSet initial sync pending
      2015-01-31T01:25:23.996+0000 [rsSync] replSet syncing to: SG-azrs6-721.devservers.mongodirector.com:27017
      2015-01-31T01:25:24.773+0000 [conn637] end connection 23.98.145.8:1177 (1 connection now open)
      2015-01-31T01:25:24.894+0000 [initandlisten] connection accepted from 23.98.145.8:1176 #639 (2 connections now open)
      2015-01-31T01:25:25.003+0000 [conn639] authenticate db: local { authenticate: 1, nonce: "xxx", user: "__system", key: "xxx" }
      2015-01-31T01:25:26.625+0000 [rsSync] replSet initial sync drop all databases
      2015-01-31T01:25:26.632+0000 [rsSync] dropAllDatabasesExceptLocal 3
      2015-01-31T01:25:27.472+0000 [rsSync] removeJournalFiles
      2015-01-31T01:25:30.413+0000 [rsSync] removeJournalFiles
      2015-01-31T01:25:30.635+0000 [rsSync] replSet initial sync clone all databases
      2015-01-31T01:25:30.752+0000 [rsSync] replSet initial sync cloning db: testdblarge3
      2015-01-31T01:25:31.329+0000 [FileAllocator] allocating new datafile /mongodb_data/testdblarge3.ns, filling with zeroes...

    • Assignee: Unassigned
    • Reporter: Dharshan Rangegowda (dharshanr@scalegrid.net)
    • Votes: 0
    • Watchers: 4
