Core Server / SERVER-92554

Consider lowering maxIdleThreadAge for oplog applier thread pool

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Replication

      Right now this pool uses the default of 30s, but our thread pool type supports configuring a custom value.

      The change in SERVER-56054 means that if we hit a missed pthread_cond_signal in glibc, the threads can wait up to 30s before waking. That is a long time and can have negative consequences for a cluster, e.g. flow control engaging and write latency increasing if the affected node is holding up the majority commit point. One such case is a 3-node chained replication configuration where the secondary that syncs from the primary hits this bug. Note that sync source selection considers a node within 30s of the primary eligible, so the lag will not prompt a switch to syncing from the primary in that situation.
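
      To illustrate why the idle timeout bounds the stall, here is a minimal, self-contained sketch using std::condition_variable. It is not the server's thread pool code: simulateLostWakeup, StallResult, and the maxIdleAge parameter are hypothetical names, and the lost signal is simulated by simply never calling notify. Under those assumptions, the worker only notices the queued work once its timed wait expires.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

using Millis = std::chrono::milliseconds;

struct StallResult {
    bool sawWork;    // whether the worker eventually noticed the queued work
    Millis stalled;  // how long the worker sat idle before noticing it
};

// Hypothetical stand-in for a single idle pool worker (not MongoDB's ThreadPool).
// maxIdleAge plays the role of the idle timeout discussed in this ticket.
StallResult simulateLostWakeup(Millis maxIdleAge) {
    assert(maxIdleAge.count() > 0);

    std::mutex mu;
    std::condition_variable cv;
    bool workAvailable = false;
    bool workerWaiting = false;
    StallResult result{false, Millis{0}};

    std::thread worker([&] {
        std::unique_lock<std::mutex> lk(mu);
        workerWaiting = true;
        const auto start = std::chrono::steady_clock::now();
        // Timed wait: returns when notified and the predicate holds, or once
        // maxIdleAge has elapsed, whichever comes first.
        result.sawWork = cv.wait_for(lk, maxIdleAge, [&] { return workAvailable; });
        result.stalled = std::chrono::duration_cast<Millis>(
            std::chrono::steady_clock::now() - start);
    });

    // Publish the work only after the worker is parked in wait_for. Holding the
    // mutex while workerWaiting is true implies the worker has released it,
    // which it only does inside the wait.
    for (;;) {
        {
            std::lock_guard<std::mutex> lk(mu);
            if (workerWaiting) {
                workAvailable = true;
                // Simulated lost signal: cv.notify_one() is deliberately not called.
                break;
            }
        }
        std::this_thread::yield();
    }

    worker.join();
    return result;
}
```

      Calling simulateLostWakeup(Millis{50}) should report that the work was seen only after roughly the full 50ms idle timeout (barring a spurious wakeup), so in this model a shorter timeout directly shortens the worst-case stall.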

      We should consider lowering this value (or allowing a user to configure a lower value). Something in the realm of 10s (maybe slightly less) could be a reasonable choice to try to prevent this situation from triggering flow control.

            Assignee: Unassigned
            Reporter: Kaitlin Mahar (kaitlin.mahar@mongodb.com)
            Votes: 0
            Watchers: 11
