Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: None
Cluster Scalability
The idea is to make the donor iterate over the documents instead of using the count command.
The advantages are:
- Ensures that only one scan runs at a time, regardless of how many failovers occur. This prevents the issue where the config server keeps resending the count command when it fails over (currently, the config server sends count to the donors).
- Each donor's progress is independent of the others (currently, the count commands to all donors must succeed at the same time).
- Since we are iterating over the documents, we get regular responses from the query layer, allowing us to store progress incrementally and thus resume after an interruption or failover (the edge case is a huge continuous range of orphans it has to iterate over).
- There is no network traffic, since everything executes locally.
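The resumable-scan idea above can be sketched as follows. This is a hypothetical illustration in Python, not server code; the `checkpointed_count` helper and its checkpoint shape are illustrative assumptions, standing in for the query layer's periodic batch responses.

```python
# Hypothetical sketch (not MongoDB server code): count documents by
# iterating in batches and checkpointing progress, so the scan can be
# resumed after an interruption or failover instead of restarting.

def checkpointed_count(docs, checkpoint=None, batch_size=2):
    """Iterate over `docs`, yielding a checkpoint (running count plus
    resume position) after every `batch_size` documents."""
    count = checkpoint["count"] if checkpoint else 0
    start = checkpoint["next_index"] if checkpoint else 0
    for i in range(start, len(docs), batch_size):
        count += len(docs[i:i + batch_size])
        yield {"count": count, "next_index": i + batch_size}

# Simulate a failover: take one checkpoint, then resume from it.
docs = list(range(7))
scan = checkpointed_count(docs)
first = next(scan)  # progress stored after the first batch
resumed = list(checkpointed_count(docs, checkpoint=first))
final = resumed[-1]
print(final["count"])  # 7 -- the resumed scan finishes the count
```

A plain count command, by contrast, is all-or-nothing: any interruption throws away all progress, which is exactly what the iteration approach avoids.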
Some minor details:
- The query must be properly versioned so that orphans are filtered out.
- We might still want to scan the compatible shard key index so the queries are covered. However, to make them covered, we have to project only the index fields, which is problematic if multiple documents share the same shard key. One solution may be to use the same $natural-order query + resumeToken used by the cloner (this means we will have to pin to a node unless recordIds are replicated).
- Must use a snapshot read at minFetchTimestamp.
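The duplicate-shard-key concern above can be illustrated with a small hypothetical example (plain Python, not server code; the record ids and helper names are invented): if a covered scan returns only shard-key values, a resume point expressed as "last key seen" cannot distinguish duplicates, whereas a position-based token (analogous to the cloner's $natural order + resumeToken) can.

```python
# Hypothetical illustration: resuming a scan by "last shard key seen"
# silently skips duplicate keys, while resuming by record position does not.

# (shard_key, record_id) pairs; two documents share shard key 5.
docs = [(1, "r1"), (5, "r2"), (5, "r3"), (9, "r4")]

def resume_by_key(docs, last_key):
    # A covered scan only yields shard keys, so the safest resume is
    # "strictly after last_key" -- which drops last_key's duplicates.
    return [d for d in docs if d[0] > last_key]

def resume_by_position(docs, last_record_id):
    # A position-based token identifies the exact document to resume after.
    ids = [rid for _, rid in docs]
    return docs[ids.index(last_record_id) + 1:]

# Scan interrupted right after seeing (5, "r2"):
print(len(resume_by_key(docs, 5)))          # 1 -- (5, "r3") was skipped
print(len(resume_by_position(docs, "r2")))  # 2 -- continues at (5, "r3")
```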
We will also need a new command for the coordinator to extract this info from the donor, e.g. extractReshardingDonorInitialDocCount, or perhaps make one of the transition commands from the shard-authoritative track (SPM-3978) return this value.
is related to: SERVER-112444 Resharding validation aggregation shouldn't hint the _id index (Backlog)
related to: SERVER-120482 Consider splitting resharding validator's count call into smaller ranges (Backlog)