  1. Core Server
  2. SERVER-57669

Replication is not catching up on one of the Secondary nodes.

Details

    • Type: Question
    • Resolution: Incomplete
    • Priority: Major - P3

    Description

      ISSUE SUMMARY
      The customer had a high load on the cluster and began experiencing replication lag.
      Node 00-01 has been in a DOWN state for a day.

      The initial issue was that the node was in an infinite restart loop. The snapshot process started on 2021-06-11 at 07:22 and ran until 23:13, and the node encountered errors when starting up. The CoE noticed a "duplicate key" issue prior to node replacement.

      At this point the CoE has restarted Node 01. However, replication lag continues to increase on the secondary member 00-01. The customer has reduced the workload, but replication is still not catching up. Although 15,360 IOPS are available and actual usage does not exceed 1,000 IOPS, the disk shows high latency and queueing.
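
For anyone tracking whether the lag on 00-01 is growing or shrinking, the following is a minimal sketch of how per-secondary lag could be derived from a `replSetGetStatus` result by subtracting each secondary's `optimeDate` from the primary's. The `members`, `name`, `stateStr`, and `optimeDate` fields are part of that command's output; the function name `replication_lag`, the host names, and the timestamps below are hypothetical and are not taken from this ticket.

```python
from datetime import datetime, timedelta


def replication_lag(status: dict) -> dict:
    """Map each SECONDARY member name to its lag behind the PRIMARY.

    `status` is assumed to have the shape of a replSetGetStatus result:
    a "members" list whose entries carry "name", "stateStr", "optimeDate".
    """
    members = status["members"]
    # Raises StopIteration if no primary is currently elected.
    primary = next(m for m in members if m["stateStr"] == "PRIMARY")
    return {
        m["name"]: primary["optimeDate"] - m["optimeDate"]
        for m in members
        if m["stateStr"] == "SECONDARY"
    }


# Hypothetical sample data for illustration only (not from the ticket).
sample_status = {
    "members": [
        {"name": "node-00-00:27017", "stateStr": "PRIMARY",
         "optimeDate": datetime(2021, 6, 12, 12, 0, 0)},
        {"name": "node-00-01:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2021, 6, 12, 10, 0, 0)},
        {"name": "node-00-02:27017", "stateStr": "SECONDARY",
         "optimeDate": datetime(2021, 6, 12, 11, 59, 58)},
    ]
}

lags = replication_lag(sample_status)
for name, lag in lags.items():
    print(f"{name}: {lag.total_seconds():.0f}s behind primary")
```

Sampling this value at intervals and comparing successive readings would show whether the secondary is catching up after the workload reduction. With a live deployment, the same dictionary shape could presumably be obtained from the driver's `replSetGetStatus` admin command instead of the sample data.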

       

      USER IMPACT
      The replication lag on the Secondary is impacting production for the customer.

          People

            Assignee: Unassigned
            Reporter: Subha Arunachalam (subha.arunachalam@mongodb.com)
            Votes: 0
            Watchers: 2