-
Type: Improvement
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Internal Code
-
None
-
Storage Execution NAMER
-
Fully Compatible
-
Execution NAMR Team 2023-09-18, Execution NAMR Team 2023-10-02
-
150
Out of an abundance of caution, the checkInvariantsForReadOptions() function will fassert() unless a reader is known to be prepared to handle not reading from a monotonic view that contains earlier writes. For external Clients (those associated with a network connection), the consistency model of the server allows non-snapshot readers to see earlier versions of documents, so switching from ReadSource::kNoTimestamp to ReadSource::kLastApplied is acceptable. However, internal Clients (those not associated with a network connection) that aren't running their operation using DBDirectClient return false from canReadAtLastApplied() because they aren't known to be prepared to switch to reading from a staler version of the data. As a result, those internal Client readers trigger the fassert() in db_raii.cpp.
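The decision described above can be sketched as a small stand-alone C++ model. This is only an illustration under stated assumptions, not the server's code: ClientModel, MemberState, Outcome, and checkReadOptions() are hypothetical names, the real canReadAtLastApplied() and checkInvariantsForReadOptions() have different signatures, and the assumption that the check only matters while the node is SECONDARY is a simplification of the ticket text.

```cpp
// Hypothetical, simplified model of the behavior described in this ticket.
// None of these types are the actual MongoDB server types.
enum class ReadSource { kNoTimestamp, kLastApplied };
enum class MemberState { kPrimary, kSecondary };
enum class Outcome { kReadAsRequested, kSwitchToLastApplied, kFassert };

struct ClientModel {
    bool hasNetworkConnection;  // true for external Clients
    bool usesDBDirectClient;    // only meaningful for internal Clients
};

// Toy stand-in for canReadAtLastApplied(): external Clients may see earlier
// versions of documents; internal Clients are only assumed to tolerate that
// when they run their operation through DBDirectClient.
bool canReadAtLastApplied(const ClientModel& client) {
    if (client.hasNetworkConnection) {
        return true;
    }
    return client.usesDBDirectClient;
}

// Toy stand-in for the invariant check. Assumption: the switch to
// kLastApplied is only considered when an untimestamped read runs on a
// SECONDARY. kFassert represents the server terminating via fassert().
Outcome checkReadOptions(const ClientModel& client,
                         MemberState state,
                         ReadSource requested) {
    if (state == MemberState::kPrimary || requested != ReadSource::kNoTimestamp) {
        return Outcome::kReadAsRequested;
    }
    return canReadAtLastApplied(client) ? Outcome::kSwitchToLastApplied
                                        : Outcome::kFassert;
}
```

In this model, an internal Client without DBDirectClient that is still reading with kNoTimestamp after the node becomes SECONDARY is the only case that reaches kFassert.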
With the advent of PrimaryOnlyServices, the codebase contains more internal Clients that aren't using DBDirectClient and whose operations aren't immediately halted on stepdown. These operations are therefore very likely to cross the member state PRIMARY to SECONDARY transition. Interruption for primary-only service Instances is delivered synchronously, but quiescing is intentionally deferred until the node would step back up as primary. So far we have seen this come up in two places in resharding components, yet it seems probable that other PrimaryOnlyServices would be affected by a similar pattern of wanting to read data in order to write other data. Some PrimaryOnlyServices may have unknowingly avoided this problem by using AutoGetCollection directly to opt out of lock-free reads and therefore continue to synchronize around the RSTL lock.
We should improve the behavior of checkInvariantsForReadOptions() so that we can be confident internal Client readers get the consistency level they depend on, without requiring a stepdown to be triggered in testing in order to identify every component that can accept a relaxed consistency level.
- is depended on by
-
SERVER-80280 Consider introducing concept of draining internal readers after stepdown and before starting secondary oplog application
- Closed
- is related to
-
SERVER-59775 ReshardingDonorOplogIterator triggers an fassert() when it continues to run in member state SECONDARY following a stepdown
- Closed
-
SERVER-79802 Allow resharding donor oplog iterator to read with no timestamp
- Closed
-
SERVER-48452 Internal readers should default to reading without a timestamp
- Closed