  Core Server / SERVER-33888

Enabling fsyncLock on the config server primary may cause operations to block behind the Balancer thread

    • Type: Improvement
    • Resolution: Gone away
    • Priority: Major - P3
    • Affects Version/s: 3.7.3
    • Component/s: Sharding
    • Sharding EMEA
    • Sharding EMEA 2023-05-29, Sharding EMEA 2023-06-12

      Per conversation with Kal, I've been running into deadlocks while trying to replace our TLS transport. Specifically, during the ReplSetTest shutdown sequence, the fsync lock is set, but shortly thereafter the Balancer attempts to start a round.

      https://github.com/mongodb/mongo/blob/cdb8f2f7ad472416c579c6c13292d3fb361d94cb/src/mongo/db/s/balancer/balancer.cpp#L347
      _checkOIDs throws an exception when it notices that the shards are offline (as they should be), and the exception handler then tries to log the action, which requires a write lock that is unavailable while the fsync lock is held.
      https://github.com/mongodb/mongo/blob/cdb8f2f7ad472416c579c6c13292d3fb361d94cb/src/mongo/db/s/balancer/balancer.cpp#L410

      Meanwhile, the ReplSetTest shutdown sequence tries to take a read lock to fetch collStats, but gets stuck because the Balancer's write lock request is still pending ahead of it.
      https://github.com/mongodb/mongo/blob/cdb8f2f7ad472416c579c6c13292d3fb361d94cb/src/mongo/shell/replsettest.js#L1633

      See also the following stack: https://gist.github.com/sgolemon/f957e2e2f38e14c0d3a0a661991c7a94
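
The queueing described above can be sketched with a toy model. This is only an illustration under stated assumptions, not the server's lock manager: it assumes a single lock with just two modes (shared "R" and exclusive "W"), strict FIFO granting, and it models the fsync lock as a shared hold that blocks writers. The names `FifoLock` and `simulate` are hypothetical.

```python
from collections import deque


class FifoLock:
    """Toy shared/exclusive lock with strict FIFO granting (illustrative assumption)."""

    def __init__(self):
        self.holders = []       # (owner, mode) pairs currently granted
        self.waiters = deque()  # (owner, mode) pairs in arrival order

    def _compatible(self, mode):
        # "R" can share with other "R" holders; "W" needs the lock to be free.
        if mode == "R":
            return all(m == "R" for _, m in self.holders)
        return not self.holders

    def request(self, owner, mode):
        """Return True if granted immediately, False if the request is queued."""
        # Strict FIFO: a new request never jumps ahead of an existing waiter.
        if not self.waiters and self._compatible(mode):
            self.holders.append((owner, mode))
            return True
        self.waiters.append((owner, mode))
        return False

    def release(self, owner):
        self.holders = [(o, m) for o, m in self.holders if o != owner]
        # Grant from the head of the queue for as long as requests are compatible.
        while self.waiters and self._compatible(self.waiters[0][1]):
            self.holders.append(self.waiters.popleft())


def simulate():
    lock = FifoLock()
    granted = {}
    # 1. fsyncLock is enabled (modeled here as a shared hold).
    granted["fsyncLock"] = lock.request("fsyncLock", "R")
    # 2. The Balancer's exception handler wants to log, which needs a write lock.
    granted["balancer"] = lock.request("balancer", "W")
    # 3. The shutdown sequence asks for a read lock to fetch collStats.
    granted["collStats"] = lock.request("collStats", "R")
    waiting_while_locked = [o for o, _ in lock.waiters]

    # Stand-in for fsyncUnlock: only now can the queue drain.
    lock.release("fsyncLock")
    holders_after_unlock = [o for o, _ in lock.holders]
    waiting_after_unlock = [o for o, _ in lock.waiters]

    lock.release("balancer")
    holders_at_end = [o for o, _ in lock.holders]
    return {
        "granted": granted,
        "waiting_while_locked": waiting_while_locked,
        "holders_after_unlock": holders_after_unlock,
        "waiting_after_unlock": waiting_after_unlock,
        "holders_at_end": holders_at_end,
    }


if __name__ == "__main__":
    for key, value in simulate().items():
        print(key, value)
```

In this model the collStats read would be compatible with the shared hold on its own, yet it queues behind the Balancer's pending write. If the unlock is only issued after the shutdown sequence makes progress (an assumption about the test flow), nothing in the queue can ever advance, which is consistent with the hang reported here.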

            Assignee:
            marcos.grillo@mongodb.com Marcos José Grillo Ramirez
            Reporter:
            sara.golemon@mongodb.com Sara Golemon
            Votes:
            0
            Watchers:
            12
