  1. Core Server
  2. SERVER-95916

Investigate if heartbeat handle list can grow unbounded

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • None
    • Replication

      In recent production cases of replication lag, we've seen all secondaries slowed by what looks like replication coordinator mutex contention. One symptom is a "Scheduling heartbeat to fetch newer config" log line repeated every ~500 ms on each secondary. The primary had just been elected, and we suspect the secondaries were failing to retrieve the new config, which had a higher config version and term.

      The root cause was not immediately clear, but we suspect it involved a buildup of replication heartbeats. We saw a linear increase in the rate of replSetHeartbeat commands on the primary, peaking at ~20,000 commands/s. The flame graphs showed 50 threads spending time in heartbeat code on each secondary.

      Whenever a secondary receives a heartbeat from a primary with a new config version and term, it schedules a heartbeat to fetch the new config. The task is scheduled on the replication thread pool for immediate execution. That pool has 50 threads, which matches the 50 threads seen in the flame graphs. Notably, sending a heartbeat takes the replication coordinator mutex.

      Our theory is that a network mishap on the primary left the secondaries unable to complete the heartbeat reconfig. The secondaries were still receiving heartbeats from the primary, however, and each incoming heartbeat scheduled a new heartbeat task on the executor, adding to the _heartbeatHandles vector.
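
      To make the theory concrete, here is a minimal toy model in Python. It is not the server code: Handle, ToySecondary, on_heartbeat_from_primary, and simulate are hypothetical names invented for illustration, and the model assumes that every incoming heartbeat advertising a newer config appends one handle that is never removed.

```python
from dataclasses import dataclass, field


@dataclass
class Handle:
    """Hypothetical stand-in for an executor callback handle."""
    done: bool = False


@dataclass
class ToySecondary:
    """Toy model of a secondary; NOT the real replication coordinator."""
    config_version: int = 1
    heartbeat_handles: list = field(default_factory=list)

    def on_heartbeat_from_primary(self, primary_config_version, fetch_succeeds):
        # A newer config on the primary triggers a "fetch config" heartbeat task.
        if primary_config_version > self.config_version:
            handle = Handle()
            # Assumption of this model: the handle is tracked and never pruned.
            self.heartbeat_handles.append(handle)
            if fetch_succeeds:
                # The fetch completes, so the secondary installs the new config
                # and later heartbeats no longer schedule a fetch.
                handle.done = True
                self.config_version = primary_config_version


def simulate(num_heartbeats, fetch_succeeds):
    """Return how many handles are tracked after num_heartbeats arrive."""
    secondary = ToySecondary()
    for _ in range(num_heartbeats):
        secondary.on_heartbeat_from_primary(2, fetch_succeeds)
    return len(secondary.heartbeat_handles)
```

      In this model, simulate(1000, True) leaves a single handle because the first fetch installs the config, while simulate(1000, False) leaves 1000 handles, one per incoming heartbeat. Whether the real code behaves this way is exactly what this ticket is meant to establish.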

      This all remains a theory so far, so this ticket's scope is to investigate via code inspection and my starter reproducer script in the linked ticket. If unbounded growth turns out to be possible, we should attempt to prune the _heartbeatHandles list, similar to what we did for the replication waiter list.
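
      One possible shape for such pruning, sketched in Python under stated assumptions: Handle and HandleList are hypothetical names, and the sketch assumes a handle can report whether its task has finished. It illustrates the prune-on-insert idea only and says nothing about how the waiter list fix was implemented.

```python
from dataclasses import dataclass


@dataclass
class Handle:
    """Hypothetical stand-in for an executor callback handle."""
    done: bool = False


class HandleList:
    """Hypothetical tracker that drops finished handles before adding a new one."""

    def __init__(self):
        self._handles = []

    def track(self, handle):
        # Prune on insert: keep only handles whose tasks are still in flight,
        # so the list size is bounded by the number of unfinished tasks.
        self._handles = [h for h in self._handles if not h.done]
        self._handles.append(handle)

    def __len__(self):
        return len(self._handles)
```

      Pruning like this only bounds the list when tasks eventually finish or are canceled. If the scheduled heartbeats themselves never complete, the list would still grow, so the investigation should also confirm what state the stuck handles are in.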

            Assignee:
            Unassigned
            Reporter:
            Ali Mir (ali.mir@mongodb.com)
            Votes:
            0
            Watchers:
            5
