  1. Core Server
  2. SERVER-79955

Need a more complete mechanism for internal readers to avoid fassert when crossing member state PRIMARY to SECONDARY transition

    • Type: Improvement
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.2.0-rc0
    • Affects Version/s: None
    • Component/s: Internal Code
    • Labels:
      None
    • Storage Execution NAMER
    • Fully Compatible
    • Execution NAMR Team 2023-09-18, Execution NAMR Team 2023-10-02
    • 150

      Out of an abundance of caution, the checkInvariantsForReadOptions() function will fassert() unless a reader is known to be prepared to handle reading from a view that is not monotonic with respect to earlier writes. For external Clients (those associated with a network connection), the consistency model of the server allows non-snapshot readers to see earlier versions of documents, so switching from ReadSource::kNoTimestamp to ReadSource::kLastApplied is acceptable. However, internal Clients (those not associated with a network connection) that aren't running their operation through DBDirectClient return false from canReadAtLastApplied() because they aren't known to be prepared to switch to reading from a staler version of the data. As a result, those internal Client readers trigger the fassert() in db_raii.cpp.
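      The rule described above can be sketched as a small standalone model. This is not the code in db_raii.cpp: ClientModel, CheckResult, and checkReadSourceChange are hypothetical names invented for illustration, and the canReadAtLastApplied() shown here is a simplified stand-in that reflects only the two conditions named in this description.

```cpp
// Hypothetical, simplified model of the check described in this ticket.
// None of these types are the real server classes.
enum class ReadSource { kNoTimestamp, kLastApplied };

struct ClientModel {
    bool hasNetworkConnection;  // true for an external Client
    bool usesDBDirectClient;    // true if the operation runs through DBDirectClient
};

// Simplified stand-in: external Clients and DBDirectClient operations are
// treated as prepared to read a staler version of the data.
constexpr bool canReadAtLastApplied(const ClientModel& c) {
    return c.hasNetworkConnection || c.usesDBDirectClient;
}

enum class CheckResult { kOk, kWouldFassert };

// Models the invariant: a switch from kNoTimestamp to kLastApplied is only
// acceptable for readers known to tolerate it.
constexpr CheckResult checkReadSourceChange(const ClientModel& c,
                                            ReadSource before,
                                            ReadSource after) {
    return (before == ReadSource::kNoTimestamp &&
            after == ReadSource::kLastApplied &&
            !canReadAtLastApplied(c))
        ? CheckResult::kWouldFassert
        : CheckResult::kOk;
}
```

      In this model, an internal Client that has no network connection and does not use DBDirectClient is the only case that reaches kWouldFassert, which corresponds to the failure reported here.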

      With the advent of PrimaryOnlyServices, the codebase contains more internal Clients that don't use DBDirectClient and whose operations are not immediately halted on stepdown. These operations are therefore very likely to cross the member state PRIMARY to SECONDARY transition. Interruption for primary-only service Instances is delivered synchronously but quiescing is intentionally deferred until the node would step back up as primary. So far this has come up in two places in resharding components, yet it seems probable that other PrimaryOnlyServices follow a similar pattern of reading data in order to write other data and would be affected too. Some PrimaryOnlyServices may have unknowingly avoided the problem by using AutoGetCollection directly, which opts out of lock-free reads and therefore continues to synchronize around the RSTL.

      We should improve the behavior of checkInvariantsForReadOptions() so that we can be confident internal Client readers get the consistency level they depend on, without requiring a stepdown to be triggered in testing in order to identify every component that can accept a relaxed consistency level.
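      One way to picture that goal, purely as an illustration and not as the change made for this ticket, is to have each internal reader declare its tolerance up front so that a missing declaration is detectable on every run rather than only when a stepdown happens. StaleReadPolicy, InternalReaderModel, and both functions below are hypothetical names that do not exist in the server codebase.

```cpp
// Hypothetical model only; these names are assumptions for illustration.
enum class StaleReadPolicy {
    kUndeclared,             // the reader has not stated what it can tolerate
    kRequiresMonotonicView,  // the reader depends on seeing its earlier writes
    kToleratesLastApplied    // the reader accepts a staler version of the data
};

struct InternalReaderModel {
    StaleReadPolicy policy;
};

// Could be checked when the read begins, whether or not a stepdown occurs,
// so an undeclared reader is found without triggering a stepdown in testing.
constexpr bool declarationIsComplete(const InternalReaderModel& r) {
    return r.policy != StaleReadPolicy::kUndeclared;
}

// Consulted only when the read source would change to a staler view.
constexpr bool maySwitchToLastApplied(const InternalReaderModel& r) {
    return r.policy == StaleReadPolicy::kToleratesLastApplied;
}
```

      Separating the two questions means the first can fail deterministically in any test, while the second only matters for operations that actually cross the PRIMARY to SECONDARY transition.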

            Assignee:
            gregory.noma@mongodb.com Gregory Noma
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0
            Watchers:
            8

              Created:
              Updated:
              Resolved: