[SERVER-80401] Natural Order Resharding Pipeline Can Attempt to Write Empty Batch Created: 24/Aug/23  Updated: 27/Oct/23  Resolved: 29/Aug/23

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Brett Nawrocki Assignee: Backlog - Replication Team
Resolution: Gone away Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Assigned Teams:
Replication
Operating System: ALL
Backport Requested:
v7.1
Participants:
Linked BF Score: 148

 Description   

The logic to write a batch in ReshardingCollectionCloner::doOneBatch was recently split out into a separate writeOneBatch function, which is also called from the path that uses the natural order pipeline. However, the natural order pipeline path does not guard against attempting to write a batch of size 0, whereas the original code path does.

If the batch returned by the cursor has size 0, we eventually call getNextOpTimes() with count = 0, which triggers an invariant in _advanceComponentTimeByTicks when we try to increment the vector clock by 0 ticks. This is seen in BF-29741.
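
The failure mode and the missing guard can be illustrated with a small standalone sketch. This is not the server's code: SketchVectorClock, reserveOpTimes, writeOneBatchSketch and writeBatchIfNonEmpty are hypothetical stand-ins for the vector clock, getNextOpTimes(), writeOneBatch and the guarded call site. The sketch also assumes an exception in place of the real invariant so the behavior can be observed without aborting the process.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the vector clock; not MongoDB's actual class.
class SketchVectorClock {
public:
    // Models the invariant described above: advancing by 0 ticks is illegal.
    // The real server trips an invariant; this sketch throws instead.
    std::uint64_t advanceByTicks(std::uint64_t nTicks) {
        if (nTicks == 0) {
            throw std::logic_error("cannot advance vector clock by 0 ticks");
        }
        _time += nTicks;
        return _time;
    }

    std::uint64_t now() const {
        return _time;
    }

private:
    std::uint64_t _time = 0;
};

// Hypothetical analogue of getNextOpTimes(count): reserves `count` ticks.
std::uint64_t reserveOpTimes(SketchVectorClock& clock, std::size_t count) {
    return clock.advanceByTicks(count);
}

// Unguarded write path: reserves one op time per document in the batch.
// With an empty batch this reaches advanceByTicks(0) and fails.
void writeOneBatchSketch(SketchVectorClock& clock,
                         const std::vector<std::string>& batch) {
    reserveOpTimes(clock, batch.size());
    // The actual insert of the documents is elided in this sketch.
}

// Guarded call site: skips the write entirely when the batch is empty.
// Returns true if a write was attempted, false if the batch was skipped.
bool writeBatchIfNonEmpty(SketchVectorClock& clock,
                          const std::vector<std::string>& batch) {
    if (batch.empty()) {
        return false;
    }
    writeOneBatchSketch(clock, batch);
    return true;
}
```

In this sketch, calling writeOneBatchSketch directly with an empty batch throws, while writeBatchIfNonEmpty returns false and leaves the clock untouched, which mirrors the difference between the two code paths described above.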



 Comments   
Comment by Jiawei Yang [ 29/Aug/23 ]

This case should never happen as long as resharding doesn't switch cloners partway through; that switching is fixed in SERVER-80408.

Generated at Thu Feb 08 06:43:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.