Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Cluster Scalability
Operating System:
ALL
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

There is a resharding bug where the critical section can never be engaged because the in-memory oplogEntriesFetched count is larger than the actual number of oplog entries in the resharding oplog buffer (the two are expected to match). This happens through the following mechanism:

1. The fetcher increments the oplogEntriesFetched count before inserting an oplog entry batch into the buffer collection.
2. The insert into the buffer can fail for a variety of reasons, such as a WriteConflict exception.
3. The fetcher retries inserting the oplog entry batch, but does not undo the increment it made for the failed attempt. This causes the oplogEntriesFetched count to be permanently higher than what is actually in the buffer.
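The mechanism above can be illustrated with a small simulation. This is only a sketch, not the server's code: `Buffer`, `WriteConflict`, `fetch_buggy`, and `fetch_fixed` are hypothetical names, and it assumes the retry path passes through the increment again, which is one way the count could end up inflated.

```python
class WriteConflict(Exception):
    """Hypothetical stand-in for a transient insert failure."""


class Buffer:
    """Toy oplog buffer that fails on selected insert attempts."""

    def __init__(self, fail_on_attempts):
        self.entries = []
        self._attempt = 0
        self._fail_on = set(fail_on_attempts)

    def insert(self, batch):
        self._attempt += 1
        if self._attempt in self._fail_on:
            raise WriteConflict()
        self.entries.extend(batch)


def fetch_buggy(buffer, batches):
    """Counts a batch before the insert is known to have succeeded."""
    fetched = 0
    for batch in batches:
        while True:
            fetched += len(batch)  # incremented on every attempt
            try:
                buffer.insert(batch)
                break
            except WriteConflict:
                continue  # retry without undoing the increment
    return fetched


def fetch_fixed(buffer, batches):
    """Counts a batch only after its insert has succeeded."""
    fetched = 0
    for batch in batches:
        while True:
            try:
                buffer.insert(batch)
            except WriteConflict:
                continue  # nothing was counted, so nothing to undo
            fetched += len(batch)
            break
    return fetched
```

With two batches of sizes 3 and 2 and a single failure on the second insert attempt, `fetch_buggy` reports 7 fetched entries while the buffer holds 5; `fetch_fixed` reports 5. Counting after a successful insert is one possible remedy; subtracting the count on failure would be another.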

If oplogEntriesFetched is permanently higher than what exists in the buffer, then oplogEntriesApplied can never catch up (since oplogEntriesApplied reflects how many buffered entries have been applied and is therefore bounded by the number of entries in the oplog buffer).

When the oplog application phase runs long enough, this incorrect "remaining work" (fetched - applied) can push the estimated time to apply the remaining work above the critical section entry threshold (estimatedTime = timeInApplying * (fetched / applied - 1)), so the critical section is never engaged.
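To see why a small over-count is enough, the formula can be evaluated directly. The sketch below uses the formula as written in this ticket; the function names, the 0.5 second threshold, and the "estimate at or below threshold" entry rule are assumptions made for illustration, not values taken from the server.

```python
def estimated_remaining_time(time_in_applying, fetched, applied):
    """estimatedTime = timeInApplying * (fetched / applied - 1)."""
    if applied == 0:
        return None  # no estimate possible before anything is applied
    return time_in_applying * (fetched / applied - 1)


def can_enter_critical_section(estimate, threshold):
    """Assumed rule: enter only when the estimate is at or below the threshold."""
    return estimate is not None and estimate <= threshold


THRESHOLD_SECS = 0.5  # hypothetical threshold, chosen for illustration

# The buffer holds 1000 entries and all of them have been applied,
# but the fetched counter was inflated by one to 1001.
early = estimated_remaining_time(60, fetched=1001, applied=1000)
late = estimated_remaining_time(600, fetched=1001, applied=1000)

# With a correct counter the estimate is zero regardless of elapsed time.
correct = estimated_remaining_time(600, fetched=1000, applied=1000)
```

Because applied is capped at the buffer size, the ratio fetched / applied stays above 1, and the estimate grows linearly with timeInApplying. In this example the estimate is roughly 0.06 seconds after one minute of applying and roughly 0.6 seconds after ten minutes, so under the assumed threshold the critical section could be entered early on but not later.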

Assignee: Unassigned
Reporter: Wenqin Ye
Participants: Wenqin Ye
Votes: 0
Watchers: 2

Created: Feb 02 2026 08:06:30 PM UTC
Updated: Feb 09 2026 03:58:42 PM UTC
Resolved: Feb 09 2026 03:58:42 PM UTC
