DOCS-14391

Investigate changes in SERVER-56054: Change minThreads value for replication writer thread pool to 0

    Details

      Description

      Downstream Change Summary

      Added the server parameter replWriterMinThreadCount: the minimum number of threads in the thread pool used to apply the oplog. The default is 0. Secondary oplog application uses up to replWriterThreadCount threads (https://docs.mongodb.com/manual/reference/parameters/#mongodb-parameter-param.replWriterThreadCount). If replWriterMinThreadCount is less than replWriterThreadCount, the thread pool times out idle threads and keeps only replWriterMinThreadCount idle threads in the pool. replWriterMinThreadCount must be less than or equal to replWriterThreadCount.
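
For documentation purposes, the constraint between the two parameters can be sketched as a small check that also builds the matching startup arguments. This is only an illustration: the helper name repl_writer_args is hypothetical, and it assumes both parameters are supplied at startup through mongod's --setParameter option.

```python
def repl_writer_args(min_threads, max_threads):
    """Build mongod startup arguments for the replication writer pool.

    Hypothetical helper, not part of MongoDB. Assumes both parameters
    are set at startup via --setParameter.
    """
    if min_threads < 0:
        raise ValueError("replWriterMinThreadCount must not be negative")
    if min_threads > max_threads:
        # The ticket requires min <= max.
        raise ValueError(
            "replWriterMinThreadCount must be less than or equal to "
            "replWriterThreadCount"
        )
    return [
        "--setParameter", f"replWriterThreadCount={max_threads}",
        "--setParameter", f"replWriterMinThreadCount={min_threads}",
    ]


if __name__ == "__main__":
    # Prints the argument list for a pool that may shrink to zero idle threads.
    print(" ".join(["mongod"] + repl_writer_args(0, 16)))
```

The value 16 in the usage line is only an example input, not a statement about any default.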

      Description of Linked Ticket

      SERVER-54805 describes a case where the replication machinery on replica set secondaries ceases to make progress, with the symptom being that all threads in the replication writer thread pool are idle and the thread driving secondary replication is simultaneously blocked waiting for those writer threads to finish their work.

      So far, this behavior has only manifested on systems with glibc versions susceptible to this glibc pthread condition variable bug. While I have not been able to build a minimal reproduction using our ThreadPool type, the scenario proven to exist in this blog post about using TLA+ to model glibc condition variables is perfectly analogous to how replication uses thread pools. In this scenario, a signal delivery that is lost due to the glibc bug leaves incomplete work in the thread pool, with no threads waking up to perform it.

      Fortunately, there is a low-risk workaround for this bug as it manifests in the replication system's use of ThreadPool. By setting minThreads to 0 instead of its current value, which is equal to maxThreads, we ensure that all waits performed by worker threads eventually wake up due to expiration of the idle thread timeout.

      The task in this ticket is to change the value of minThreads in the writer thread pool used by replication to 0. This will not eliminate all possible failures due to the glibc bug, but it will eliminate the only one we've seen in practice until such time as the bug in glibc is corrected.
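
The effect of the workaround can be illustrated with a toy model. The sketch below is not MongoDB's ThreadPool and does not reproduce the glibc bug; it simulates a lost signal by queueing work without notifying the condition variable, then compares a worker that waits forever with one whose wait expires after an idle timeout. As a simplification, a worker that wakes on timeout rechecks the queue instead of retiring. All names here (task_completes, idle_timeout, deadline) are hypothetical.

```python
import threading
import time
from collections import deque


def task_completes(idle_timeout, deadline=1.0):
    """Return True if a task queued with a 'lost' signal still runs.

    idle_timeout=None models an untimed wait (minThreads == maxThreads);
    a number models a timed wait (minThreads == 0).
    """
    cond = threading.Condition()
    tasks = deque()
    state = {"waiting": False, "stop": False}
    done = threading.Event()

    def worker():
        with cond:
            while not state["stop"]:
                if tasks:
                    tasks.popleft()()  # run the task (under the lock, for brevity)
                    continue
                state["waiting"] = True
                cond.wait(timeout=idle_timeout)  # a timed wait always returns
                state["waiting"] = False

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()

    # Wait until the worker is parked in cond.wait(), then queue work
    # WITHOUT notifying: this stands in for the lost signal.
    while True:
        with cond:
            if state["waiting"]:
                tasks.append(done.set)
                break
        time.sleep(0.001)

    completed = done.wait(deadline)

    # Clean up: release the worker so the thread exits.
    with cond:
        state["stop"] = True
        cond.notify_all()
    thread.join(timeout=1.0)
    return completed
```

In this model, task_completes(0.05) returns True because the timed wait expires and the worker finds the queued task, while task_completes(None, deadline=0.3) returns False because nothing ever wakes the worker.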

      Scope of changes

      Impact to Other Docs

      MVP (Work and Date)

      Resources (Scope or Design Docs, Invision, etc.)

              People

              Assignee: Andrew Feierabend (Inactive)
              Reporter: Backlog - Core Eng Program Management Team