[SERVER-80401] Natural Order Resharding Pipeline Can Attempt to Write Empty Batch Created: 24/Aug/23 Updated: 27/Oct/23 Resolved: 29/Aug/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Brett Nawrocki | Assignee: | Backlog - Replication Team |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: | Replication | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v7.1 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 148 | ||||||||
| Description |
|
The logic to write a batch in ReshardingCollectionCloner::doOneBatch was recently split out into a separate writeOneBatch function, which is called from the path that uses the natural order pipeline. However, the natural order pipeline does not guard against attempting to write a batch of size 0, whereas the original code path does. If the batch returned by the cursor is empty, we eventually call getNextOpTimes() with count = 0, which triggers an invariant in _advanceComponentTimeByTicks when we try to advance the vector clock by 0 ticks. This is seen in BF-29741. |
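The failure mode described above can be illustrated with a minimal, self-contained sketch. This is not the server code and not the fix that was applied: `FakeVectorClock`, `Doc`, and `writeOneBatchGuarded` are hypothetical stand-ins, and the exception models the invariant in _advanceComponentTimeByTicks only loosely. It shows an empty-batch guard that returns before any ticks are reserved.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the vector clock. It models only the rule from the
// ticket: advancing by 0 ticks is a programming error. The real server uses an
// invariant here; this sketch throws so the failure can be observed.
class FakeVectorClock {
public:
    // Reserves nTicks consecutive times and returns the first one.
    std::uint64_t advanceByTicks(std::uint64_t nTicks) {
        if (nTicks == 0) {
            throw std::logic_error("cannot advance the clock by 0 ticks");
        }
        const std::uint64_t first = _time + 1;
        _time += nTicks;
        return first;
    }

    std::uint64_t now() const { return _time; }

private:
    std::uint64_t _time = 0;
};

// Hypothetical document type; the contents are irrelevant to the guard.
struct Doc {
    int id;
};

// Hypothetical writer: returns the number of documents "written".
// The early return is the guard the natural order path was missing: an empty
// batch never reaches the code that reserves one tick per document.
std::size_t writeOneBatchGuarded(FakeVectorClock& clock, const std::vector<Doc>& batch) {
    if (batch.empty()) {
        return 0;  // nothing to write, so reserve no ticks
    }
    clock.advanceByTicks(batch.size());  // one tick per document in the batch
    return batch.size();
}
```

With this guard, passing an empty vector leaves the clock untouched and returns 0, while a batch of three documents advances the clock by three ticks. Calling `advanceByTicks(0)` directly still fails, which corresponds to the crash reported in this ticket.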
| Comments |
| Comment by Jiawei Yang [ 29/Aug/23 ] |
|
This case should never happen if resharding doesn't switch cloners partway through, which is fixed in