Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Storage Engines - Server Integration
Operating System: ALL
Linked BF Score: 200
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Bug

The disagg OplogProvider ships oplog entries to the SLS log server. After a yield or on step-up it rebuilds its cursor and seeks back to its last read position via seekExact(lastRecordIdRead), asserting the record is found (oplog_provider.cpp#L296).

Oplog truncation does not take the provider's read position into account. If truncation removes the entry at lastRecordIdRead, seekExact returns none, the assertion fails, and the node crashes.
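The failure mode can be reproduced in a toy model. The sketch below is not the server code: `ToyOplog`, `truncateThrough`, `canResume`, and `resumeOrAbort` are hypothetical names, and the oplog is modeled as an ordered map keyed by an integer record id. It only illustrates how an exact-match seek to the last read position fails once truncation has passed that position.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Toy stand-in for the oplog: record id -> entry payload.
// Hypothetical model, not the server's record store.
struct ToyOplog {
    std::map<std::int64_t, std::string> records;

    // Exact-match lookup: returns nullopt when the record no longer exists.
    std::optional<std::string> seekExact(std::int64_t id) const {
        auto it = records.find(id);
        if (it == records.end()) return std::nullopt;
        return it->second;
    }

    // Removes every record with id <= upTo, ignoring any reader position.
    void truncateThrough(std::int64_t upTo) {
        records.erase(records.begin(), records.upper_bound(upTo));
    }
};

// True when a reader whose last read record was `lastRecordIdRead` can resume.
bool canResume(const ToyOplog& oplog, std::int64_t lastRecordIdRead) {
    return oplog.seekExact(lastRecordIdRead).has_value();
}

// Mirrors an asserted resume: aborts if the record was truncated away.
std::string resumeOrAbort(const ToyOplog& oplog, std::int64_t lastRecordIdRead) {
    auto rec = oplog.seekExact(lastRecordIdRead);
    assert(rec.has_value() && "resume record was truncated");
    return *rec;
}
```

With records 1 through 4 and a reader positioned at 2, calling `truncateThrough(3)` makes `canResume(oplog, 2)` false, which is the condition under which `resumeOrAbort` would abort.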

Fix

The likely direction is to protect against truncation passing the resume point, similar to SERVER-128312 (which bounded truncation via computeTruncationBound()).
A few things to work out:

What the resume point should anchor on. The provider's local read cursor seems insufficient: on step-up the resume LSN comes from the remote log server (log_server_manager.cpp#L1302-L1309) with no check against the local oplog floor. The bound may therefore need to anchor on the SLS last-written LSN and hold on every electable node.
Whether a pin alone is enough, or whether step-up also needs to handle a missing resume point gracefully (refuse step-up or resync) rather than crash-looping. This probably depends on whether such a bound can be guaranteed on a stepping-up node.
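Both open questions can be illustrated with a minimal sketch. It makes several assumptions: positions are plain integers that increase monotonically, the retained oplog is contiguous, and `clampTruncationBound`, `StepUpDecision`, and `decideStepUp` are hypothetical names. It does not reflect the signature or behavior of computeTruncationBound(), which this ticket does not describe.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical helper: returns the highest position that may be removed
// (inclusive). `requestedBound` is what the truncation policy wants to remove
// through; `resumePoint` is the pinned position, if any, that must stay
// seekable.
std::int64_t clampTruncationBound(std::int64_t requestedBound,
                                  std::optional<std::int64_t> resumePoint) {
    if (!resumePoint) return requestedBound;
    // Keep the resume record itself: truncate strictly below it.
    return std::min(requestedBound, *resumePoint - 1);
}

enum class StepUpDecision { kResume, kRefuse };

// Hypothetical step-up check: refuse rather than assert when the resume
// point lies below the oldest entry still retained locally.
StepUpDecision decideStepUp(std::int64_t resumePoint,
                            std::int64_t oldestRetained) {
    return resumePoint >= oldestRetained ? StepUpDecision::kResume
                                         : StepUpDecision::kRefuse;
}
```

The clamp only helps if the pinned position is known on the node doing the truncation, which is why the anchor question matters. The step-up check is the fallback for nodes where that guarantee cannot be made.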

is related to: SERVER-128312 Ensure oplog truncation does not pass last metadata checkpoint timestamp (Closed)

Assignee: Unassigned
Reporter: Shin Yee Tan
Participants: Shin Yee Tan
Votes: 0
Watchers: 4

Created: Jun 25 2026 06:16:49 PM UTC
Updated: Jul 01 2026 10:47:17 PM UTC
Resolved: Jul 01 2026 10:47:17 PM UTC
