Type: Task
Resolution: Gone away
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Sharding
Labels:
None

Assigned Teams:

Sharding
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

During a migration, we persist a critical section counter, which is replicated to secondaries so that they clear their filtering metadata and their next refresh sees the result of the migration. The idea is that a secondary refreshes from the primary by calling forceRoutingTableRefresh on the primary and waiting for the result to replicate; forceRoutingTableRefresh waits for the critical section before refreshing.

However, when we persist that critical section counter, we don't use majority write concern, and we never wait for majority before we commit the migration on the config server.

This means that the following sequence can lead to stale reads:
1. Start a migration.
2. Write the critical section counter. Suppose it doesn't get replicated at all.
3. Commit the migration on the config server.
4. Fail over.
5. A new primary is elected which does not know that a migration has occurred. It could continue serving requests for a router which is as stale as the secondary, so stale data is read.

We should verify this with a jstest and then fix by persisting the critical section counter with majority write concern.
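
The scenario and the proposed fix can be sketched with a small toy model. This is not MongoDB code and not the jstest mentioned above: `Node`, `ToyReplicaSet`, and `run_migration` are hypothetical names, and the election is simplified to "pick the most up-to-date secondary". It only illustrates why a counter written without majority write concern can be lost across a failover.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One replica set member; `counter` stands in for the persisted critical section counter."""
    name: str
    counter: int = 0


@dataclass
class ToyReplicaSet:
    """Hypothetical stand-in for a shard's replica set (an assumption, not a real API)."""
    primary: Node
    secondaries: list

    def write_counter(self, majority: bool) -> None:
        """Bump the counter on the primary; replicate before returning only if majority is requested."""
        self.primary.counter += 1
        if majority:
            # With the primary included, this many secondaries complete a majority.
            total = 1 + len(self.secondaries)
            needed = total // 2
            for node in self.secondaries[:needed]:
                node.counter = self.primary.counter

    def fail_over(self) -> None:
        """Drop the old primary and promote the most up-to-date secondary."""
        new_primary = max(self.secondaries, key=lambda n: n.counter)
        self.secondaries.remove(new_primary)
        self.primary = new_primary


def run_migration(majority: bool) -> bool:
    """Return True if the new primary knows a migration happened after failover."""
    rs = ToyReplicaSet(Node("n0"), [Node("n1"), Node("n2")])
    rs.write_counter(majority)   # step 2: persist the critical section counter
    # step 3: the migration is committed on the config server (not modeled here)
    rs.fail_over()               # step 4: failover
    return rs.primary.counter > 0  # step 5: does the new primary know?


if __name__ == "__main__":
    print("w:1 write, new primary aware of migration:", run_migration(majority=False))
    print("majority write, new primary aware of migration:", run_migration(majority=True))
```

In this model the unreplicated write disappears with the old primary, while the majority write is guaranteed to be on at least one node that can be elected. A real verification would still need a jstest against an actual sharded cluster.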

Assignee:: [DO NOT USE] Backlog - Sharding Team
Reporter:: Matthew Saltz (Inactive)
Participants:: [DO NOT USE] Backlog - Sharding Team, Esha Maharishi, Kaloian Manassiev, Matthew Saltz, Randolph Tan
Votes:: 0
Watchers:: 4

Created:: Jun 26 2019 04:05:11 PM UTC
Updated:: Oct 27 2023 08:42:49 PM UTC
Resolved:: Aug 17 2020 09:01:20 AM UTC
