Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.4.1, 4.7.0, 4.2.10, 4.0.22
Affects Version/s: 3.6.18, 4.5.1, 4.0.18, 4.2.7, 4.4.0-rc8
Component/s: Sharding
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v4.4, v4.2, v4.0
Sprint:
Sharding 2020-06-15, Sharding 2020-06-29
Linked BF Score:
0
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The donor writes the enterCriticalSectionCounter flag
-> which causes secondaries to clear their filtering metadata
-> which causes the next versioned request on the secondary to throw StaleConfig and trigger the secondary to refresh
-> which causes the secondary to send flushRoutingTableCacheUpdates to the primary
-> which blocks behind the critical section only if reads are being blocked

In 4.4 and earlier versions, if the donor has not yet started blocking reads, the secondary will finish the refresh and keep serving reads for stale mongoses even after the migration commits.

For example:

Donor writes enterCriticalSectionCounter at T90
Secondary sees the flag, invalidates its filtering metadata
Secondary gets a versioned read, sends flushRoutingTableCacheUpdates, gets back success
Donor starts blocking writes
Donor commits the migration, which succeeds at T100
Client does a write from mongos1, which contacts donor and gets back StaleConfig, then retries write on recipient, which succeeds at T101
Client does afterClusterTime: T101 read from mongos2, which is stale and contacts the donor secondary. >>> That secondary will wait until T101, then serve the read <<<
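
The timeline above can be sketched as a toy model. This is only an illustration under simplifying assumptions: shard versions are reduced to a single "who owns the chunk" boolean, and `DonorSecondary`, `run`, and `refresh_waits_for_commit` are hypothetical names, not server code. It shows why a refresh that completes before the commit lets the donor secondary accept the stale read, while a refresh that waits for the commit makes the read fail with StaleConfig and get retried on the recipient.

```python
from dataclasses import dataclass, field


class StaleConfig(Exception):
    """Router and shard disagree about who owns the chunk."""


@dataclass
class DonorSecondary:
    # Hypothetical stand-in for the secondary's filtering metadata.
    believes_it_owns_chunk: bool = True
    data: dict = field(default_factory=lambda: {"x": "old"})

    def refresh(self, migration_committed: bool) -> None:
        # A refresh that finishes before the commit still sees the donor as owner.
        self.believes_it_owns_chunk = not migration_committed

    def versioned_read(self, key: str, router_thinks_donor_owns: bool) -> str:
        # The read is served only if the router's view matches the shard's view.
        if router_thinks_donor_owns != self.believes_it_owns_chunk:
            raise StaleConfig(key)
        return self.data[key]


def run(refresh_waits_for_commit: bool) -> str:
    secondary = DonorSecondary()
    recipient = {"x": "old"}

    # Secondary's refresh completes before the commit (the behavior described above).
    if not refresh_waits_for_commit:
        secondary.refresh(migration_committed=False)

    # Migration commits at T100; a blocked refresh only completes now.
    if refresh_waits_for_commit:
        secondary.refresh(migration_committed=True)

    # Write via mongos1 lands on the recipient at T101.
    recipient["x"] = "new"

    # afterClusterTime: T101 read via stale mongos2 targets the donor secondary.
    try:
        return secondary.versioned_read("x", router_thinks_donor_owns=True)
    except StaleConfig:
        # The stale router refreshes and retries on the recipient.
        return recipient["x"]
```

In this model, `run(refresh_waits_for_commit=False)` returns the pre-migration value even though the read was issued after the write, which is the causal-consistency violation described above.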

In 4.5, this happens not to be an issue, because the refresh is done by calling onShardVersionMismatch, which waits for the critical section as long as writes are already being blocked.

Despite that, we want to change flushRoutingTableCacheUpdates in all versions to block behind the critical section with kWrite, instead of the kRead it uses today.
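
A minimal sketch of the difference between the two wait reasons, assuming the critical section can be modeled as three phases. The `Phase` enum and `flush_must_wait` are hypothetical names for illustration; only kRead and kWrite come from the ticket.

```python
from enum import Enum


class Phase(Enum):
    # Hypothetical phases of the donor's migration critical section.
    NONE = 0
    BLOCKING_WRITES = 1
    BLOCKING_READS_AND_WRITES = 2


class WaitReason(Enum):
    K_READ = "kRead"
    K_WRITE = "kWrite"


def flush_must_wait(phase: Phase, reason: WaitReason) -> bool:
    """Return True if flushRoutingTableCacheUpdates would block in this phase."""
    if phase is Phase.NONE:
        return False
    if reason is WaitReason.K_WRITE:
        # Waiting as a writer blocks as soon as writes are blocked.
        return True
    # Waiting as a reader blocks only once reads are blocked too.
    return phase is Phase.BLOCKING_READS_AND_WRITES
```

The only case where the two reasons differ is the writes-only phase: with kRead the flush passes through and the secondary can finish its refresh before the commit, while with kWrite it waits.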

related to

SERVER-50898 safe_secondary_reads_causal_consistency.js must wait for effects of _configsvrCommitChunkMigration to be majority-committed snapshot on all CSRS members

Closed

Assignee: Luis Osta (Inactive)
Reporter: Esha Maharishi (Inactive)
Participants: Esha Maharishi, Githook User, Luis Osta
Votes: 0
Watchers: 5

Created: Jun 09 2020 09:49:45 PM UTC
Updated: Oct 29 2023 10:07:15 PM UTC
Resolved: Jul 01 2020 03:14:01 PM UTC
