Core Server / SERVER-26780

SyncTail::getMissingDoc() should retry on SocketExceptions

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Gone away
    • Affects Version/s: 3.2.12, 3.4.2, 3.5.2
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
    • Environment:
      Ubuntu, MongoDB 3.2.10

      Description

      A secondary is failing to perform the initial sync with another secondary to join a replica set.

      It fails due to a socket receive timeout when talking to the other secondary during the initial sync.

      I have attached the final lines of the log from the secondary trying to join the replica set.

      NB: we never see any "network problem detected" lines in our logs, so it appears that the retry logic is never reached:
      https://github.com/mongodb/mongo/blob/r3.2.10/src/mongo/db/repl/sync_tail.cpp#L968-L969

      I think the SocketException raised by the timeout is caught earlier, here:
      https://github.com/mongodb/mongo/blob/r3.2.10/src/mongo/util/net/message_port.cpp#L204-L210
      which then triggers an assertion exception here:
      https://github.com/mongodb/mongo/blob/r3.2.10/src/mongo/client/dbclient.cpp#L811-L814

      I do not believe the fix in https://jira.mongodb.org/browse/SERVER-9528 is effective, because the SocketException is swallowed before it can reach the retry logic.
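
To illustrate the suspected failure mode, here is a minimal, self-contained C++ sketch. It is not the MongoDB source: `SocketError`, `AssertionError`, `FlakyTransport`, `clientCallSwallowing`, and `fetchWithRetry` are all hypothetical stand-ins invented for this example. It only shows how a retry loop that catches one exception type can be bypassed when a lower layer catches that exception first and raises a different one.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the two exception kinds discussed in the report.
struct SocketError : std::runtime_error {
    using std::runtime_error::runtime_error;
};
struct AssertionError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Simulated transport: the first `failuresLeft` receives time out.
struct FlakyTransport {
    int failuresLeft;
    std::string recv() {
        if (failuresLeft > 0) {
            --failuresLeft;
            throw SocketError("socket receive timeout");
        }
        return "doc";
    }
};

// Assumed behaviour of the intermediate layer: it catches the socket
// error and surfaces the failure as a different exception type.
std::string clientCallSwallowing(FlakyTransport& t) {
    try {
        return t.recv();
    } catch (const SocketError&) {
        throw AssertionError("transport error");
    }
}

// Retry loop that, like the one described above, only reacts to socket errors.
// `attemptsUsed` reports how many attempts were started.
template <typename Fetch>
std::string fetchWithRetry(Fetch fetch, int maxAttempts, int& attemptsUsed) {
    for (attemptsUsed = 1;; ++attemptsUsed) {
        try {
            return fetch();
        } catch (const SocketError&) {
            if (attemptsUsed >= maxAttempts) throw;  // out of retries
        }
    }
}

// Small self-check that exercises both paths.
inline void demo() {
    int attempts = 0;

    // Direct transport access: the timeout is retried and the fetch succeeds.
    FlakyTransport direct{1};
    assert(fetchWithRetry([&] { return direct.recv(); }, 3, attempts) == "doc");
    assert(attempts == 2);

    // Through the swallowing layer: the retry handler never fires.
    FlakyTransport wrapped{1};
    bool escaped = false;
    try {
        fetchWithRetry([&] { return clientCallSwallowing(wrapped); }, 3, attempts);
    } catch (const AssertionError&) {
        escaped = true;
    }
    assert(escaped && attempts == 1);
}
```

If the real code path behaves like `clientCallSwallowing`, a retry would only take effect if the retry loop also handled the converted exception, or if the lower layer let the socket error propagate unchanged.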

        Attachments

        1. mms-mongo-1-106.log
          5 kB
        2. mms-mongo-1-110.log
          1 kB

              People

              Assignee:
              backlog-server-repl Backlog - Replication Team
              Reporter:
              rob.clancy@intercom.io Rob Clancy
              Votes:
              0
              Watchers:
              8
