Type: New Feature
Resolution: Won't Fix
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Replication
Labels: None
Assigned Teams: Replication
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

The changes from ~~SERVER-48452~~ enforce that internal readers on mongod must not read at ReadSource::kNoTimestamp while the mongod is in replica set member state SECONDARY. Instead, an internal reader must relax its consistency model and read at the earlier ReadSource::kLastApplied; otherwise an fassert() is triggered. The reason is that a read at ReadSource::kNoTimestamp on a secondary would see partial effects of secondary oplog application as new snapshots are acquired, which could lead to other anomalous behavior.

However, this poses a problem: an internal reader can survive stepdown, and it would have started reading at ReadSource::kNoTimestamp while the mongod was still in replica set member state PRIMARY. Services such as resharding and chunk migration have therefore triggered this fassert() in practice (~~SERVER-59775~~, ~~SERVER-80200~~), even though no meaningful harm would result if they were allowed to read at ReadSource::kNoTimestamp. Resharding and chunk migration would eventually fail with ErrorCodes::NotWritablePrimary in replica set member state SECONDARY, because their anomalous read is always followed later by a write, and that write requires the node to still be the primary.
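The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `Node`, `Outcome`, `internalRead`, `internalWrite`, and `readThenWrite` are hypothetical names invented for illustration and are not part of the server code. It contrasts what happens to a reader that outlives stepdown when the constraint is enforced with what would happen if it were not.

```cpp
// Toy model only: every name here is hypothetical and exists purely to
// illustrate the scenario in this ticket; none of it is the server's API.
enum class ReadSource { kNoTimestamp, kLastApplied };
enum class Outcome { kOk, kFassert, kNotWritablePrimary };

struct Node {
    bool isPrimary;  // false models replica set member state SECONDARY
};

// An untimestamped read on a non-primary trips the invariant when enforced.
Outcome internalRead(const Node& node, ReadSource source, bool enforce) {
    if (enforce && !node.isPrimary && source == ReadSource::kNoTimestamp) {
        return Outcome::kFassert;
    }
    return Outcome::kOk;
}

// Writes are only accepted while the node is still primary.
Outcome internalWrite(const Node& node) {
    return node.isPrimary ? Outcome::kOk : Outcome::kNotWritablePrimary;
}

// Models a service whose read is always followed later by a write.
Outcome readThenWrite(const Node& node, ReadSource source, bool enforce) {
    const Outcome readResult = internalRead(node, source, enforce);
    if (readResult != Outcome::kOk) {
        return readResult;
    }
    return internalWrite(node);
}
```

In this model, a reader that began on a primary and continues after stepdown (`Node{false}`, `ReadSource::kNoTimestamp`) ends in `kFassert` when enforcement is on, and in `kNotWritablePrimary` when it is off, which is the ordinary failure the ticket argues would have occurred anyway.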

The shouldReadAtLastApplied() function consults the replica set member state to decide whether or not to trigger the fassert(). Reads at ReadSource::kNoTimestamp while the mongod is in replica set member state SECONDARY are still valid so long as secondary oplog application has not yet begun. However, the shouldReadAtLastApplied() function cannot express this condition precisely, because the ReplicationCoordinatorImpl doesn't offer a drain mode after stepdown in which services that run only in replica set member state PRIMARY are guaranteed to have quiesced. That may be for good reason: services acknowledge interruption only on a best-effort basis, and delaying the start of secondary oplog application could be worse for the application and for majority-commit latency.

bool shouldReadAtLastApplied(OperationContext* opCtx,
                             boost::optional<const NamespaceString&> nss,
                             std::string* reason) {
    ...

    // If this node can accept writes (i.e. primary), then no conflicting replication batches are
    // being applied and we can read from the default snapshot. If we are in a replication state
    // (like secondary or primary catch-up) where we are not accepting writes, we should read at
    // lastApplied.
    if (repl::ReplicationCoordinator::get(opCtx)->canAcceptWritesForDatabase(
            opCtx, DatabaseName::kAdmin)) {
        if (reason) {
            *reason = "primary";
        }
        return false;
    }

    // If we are not secondary, then we should not attempt to read at lastApplied because it may not
    // be available or valid. Any operations reading outside of the primary or secondary states must
    // be internal. We give these operations the benefit of the doubt rather than attempting to read
    // at a lastApplied timestamp that is not valid.
    if (!repl::ReplicationCoordinator::get(opCtx)->isInPrimaryOrSecondaryState(opCtx)) {
        if (reason) {
            *reason = "not primary or secondary";
        }
        return false;
    }

    ...
}
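
The gap between what the function checks and the condition that actually matters can be sketched with a simplified model. Everything below is an assumption made for illustration: `MemberState`, `NodeModel`, `shouldReadAtLastAppliedModel`, and `untimestampedReadIsUnsafe` are hypothetical names, the `oplogApplicationStarted` flag is not something the real function can observe, and the fall-through `return true` (with the reason string "secondary") stands in for the elided parts of the excerpt.

```cpp
#include <string>

// Hypothetical, simplified stand-ins for the server's types; not the real API.
enum class MemberState { kPrimary, kSecondary, kOther };

struct NodeModel {
    MemberState state;
    bool canAcceptWrites;          // stands in for canAcceptWritesForDatabase()
    bool oplogApplicationStarted;  // assumption: not visible to the real function
};

// Mirrors the two early returns shown in the excerpt. The final return value
// and its reason string are assumptions, since the excerpt elides that code.
bool shouldReadAtLastAppliedModel(const NodeModel& node, std::string* reason) {
    if (node.canAcceptWrites) {
        if (reason) {
            *reason = "primary";
        }
        return false;
    }
    const bool primaryOrSecondary = node.state == MemberState::kPrimary ||
        node.state == MemberState::kSecondary;
    if (!primaryOrSecondary) {
        if (reason) {
            *reason = "not primary or secondary";
        }
        return false;
    }
    if (reason) {
        *reason = "secondary";
    }
    return true;
}

// The more precise condition described in the ticket: an untimestamped read
// is only a problem once secondary oplog application has begun.
bool untimestampedReadIsUnsafe(const NodeModel& node) {
    return node.state == MemberState::kSecondary && node.oplogApplicationStarted;
}
```

For a node that has just stepped down but has not yet started applying oplog entries (`NodeModel{MemberState::kSecondary, false, false}`), the model of the existing check returns true while the more precise condition returns false. That window is where a surviving internal reader would trip the fassert() without having read anything inconsistent.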

depends on

SERVER-79955 Need a more complete mechanism for internal readers to avoid fassert when crossing member state PRIMARY to SECONDARY transition (Closed)

is related to

SERVER-59775 ReshardingDonorOplogIterator triggers an fassert() when it continues to run in member state SECONDARY following a stepdown (Closed)
SERVER-80200 Temporarily do not enforce constraints when fetching active transaction history (Closed)
SERVER-48452 Internal readers should default to reading without a timestamp (Closed)

Assignee: [DO NOT USE] Backlog - Replication Team
Reporter: Max Hirschhorn
Participants: [DO NOT USE] Backlog - Replication Team, Max Hirschhorn, Opal Hoyt, Samyukta Lanka
Votes: 0
Watchers: 10

Created: Aug 21 2023 06:39:23 PM UTC
Updated: Sep 25 2023 06:21:29 PM UTC
Resolved: Sep 25 2023 06:21:29 PM UTC
