  1. Core Server
  2. SERVER-73577

Instance in Recovering State, Initial Sync Fails

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • ALL

      Hello,

      I have a MongoDB cluster with 3 instances. One of the instances changed its state to RECOVERING. When I checked mongod.log, I saw an error like "non-specific WiredTiger error". Here is an image of it:

       

      I deleted the data directory and attempted to start an initial sync after this issue, but the sync was interrupted by another error: "Restarting oplog query due to error: NetworkInterfaceExceededTimeLimit: error in fetcher batch callback". Image:

      The sync then restarted from the beginning. It has been almost 4 days, but the instance is still in the startup state.

       

      The data size is around 650 GB. The copying and indexing have finished, and it has been applying oplog operations for 2 days now. It is trying to catch up to the cluster; because it had fallen several days behind, the oplog phase is taking too long.

      I am trying to understand why it changed its state to RECOVERING. Is it because the data was corrupted somehow?

      By the way, this member of the cluster has hit this error more than once, while the other members are doing just fine. Even though I eventually resync this member from the others, the error recurs.

       

      Is there a specific reason why this kind of error keeps repeating?
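
      For reference, the catch-up delay described above could be quantified from the output of the replSetGetStatus command. The following is only a minimal sketch, not part of the original report: the helper name member_lag and the sample hosts, states, and timestamps are hypothetical, and it assumes each member document carries the name, stateStr, and optimeDate fields. A member that is still in initial sync may not report a meaningful optimeDate, so its lag should be treated as approximate.

```python
from datetime import datetime, timedelta


def member_lag(status):
    """Map each member name to (stateStr, lag behind the primary or None).

    `status` is a dict shaped like replSetGetStatus output. The lag is None
    when there is no primary or a member has no optimeDate.
    """
    members = status.get("members", [])
    primary = next((m for m in members if m.get("stateStr") == "PRIMARY"), None)
    report = {}
    for m in members:
        lag = None
        if primary is not None and "optimeDate" in m and "optimeDate" in primary:
            lag = primary["optimeDate"] - m["optimeDate"]
        report[m["name"]] = (m.get("stateStr"), lag)
    return report


# Hypothetical sample data: host names and timestamps are invented.
sample = {
    "members": [
        {"name": "node1:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2023, 2, 3, 12, 0, 0)},
        {"name": "node2:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2023, 2, 3, 11, 59, 58)},
        {"name": "node3:27017", "stateStr": "RECOVERING",
         "optimeDate": datetime(2023, 2, 1, 12, 0, 0)},
    ]
}

report = member_lag(sample)
for name, (state, lag) in report.items():
    print(name, state, lag)
```

      With PyMongo, the status dict could be obtained with MongoClient(...).admin.command("replSetGetStatus") and passed to the helper unchanged. Comparing the lag between two runs shows whether the member is actually gaining on the primary.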

       

       

       

        1. error1.jpg (761 kB)
        2. error2.jpg (382 kB)

            Assignee:
            yuan.fang@mongodb.com Yuan Fang
            Reporter:
            ilker.demirci@netmera.com İlker Demirci
            Votes:
            0
            Watchers:
            2

              Created:
              Updated:
              Resolved: