Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Storage
Labels:
- neweng

Assigned Teams:

Storage Execution
Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Linked BF Score:
0
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Capped collections in WiredTiger normally trigger deletes as inserts are performed. For performance reasons, the oplog truncates old documents in batches. This is done via a data structure known as OplogStones.

A background thread periodically checks the oplog size and may then call reclaimOplog.

reclaimOplog calls truncate once for each OplogStone popped from the front of _oplogStones. Each truncate deletes the records ranging from the previous stone's lastRecord (saved in _oplogStones->firstRecord) to the current stone's lastRecord.
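The reclaim loop described here can be sketched roughly as follows. This is a simplified model, not the server's code: OplogStonesSketch, Stone, and reclaim are hypothetical names, RecordId is reduced to a plain integer, and the function only returns the (start, stop) pairs that would be handed to truncate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the real types.
using RecordId = std::int64_t;

struct Stone {
    RecordId lastRecord;  // highest RecordId covered by this stone
};

struct OplogStonesSketch {
    RecordId firstRecord = 0;  // where the next truncate starts
    std::deque<Stone> stones;  // oldest stone at the front

    // Pops up to `count` stones and returns the (start, stop) range that
    // would be passed to truncate for each popped stone.
    std::vector<std::pair<RecordId, RecordId>> reclaim(std::size_t count) {
        std::vector<std::pair<RecordId, RecordId>> ranges;
        for (std::size_t i = 0; i < count && !stones.empty(); ++i) {
            Stone stone = stones.front();
            stones.pop_front();
            // A truncate is only well-formed when start <= stop.
            assert(firstRecord <= stone.lastRecord);
            ranges.emplace_back(firstRecord, stone.lastRecord);
            firstRecord = stone.lastRecord;  // the next range starts here
        }
        return ranges;
    }
};
```

With firstRecord at 5 and stones ending at 10, 20, and 30, reclaiming two stones yields the ranges (5, 10) and (10, 20) and leaves firstRecord at 20.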

The invariant for this to work is that the lastRecord of consecutive stones must be increasing. As inserts to the oplog commit, their RecordId will advance lastRecord "if applicable".

Why "if applicable"? With document-level locking storage engines, transactions can commit out of timestamp order. Since these RecordIds are the timestamp values in disguise, the OplogStones data structure has to deal with RecordIds arriving out of order.
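A minimal sketch of the "if applicable" check, assuming it amounts to comparing the committing RecordId against the newest stone. The function name createNewStoneIfApplicable is hypothetical, and the sketch ignores everything except the ordering check.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

using RecordId = std::int64_t;  // hypothetical stand-in for the real type

struct Stone {
    RecordId lastRecord;
};

// Returns true if a new stone was appended for `lastRecord`.
bool createNewStoneIfApplicable(std::deque<Stone>& stones, RecordId lastRecord) {
    if (!stones.empty() && lastRecord < stones.back().lastRecord) {
        // An older timestamp committed late. The newest stone already
        // covers it, so no new stone is created.
        return false;
    }
    stones.push_back(Stone{lastRecord});
    // Invariant: consecutive stones never decrease.
    assert(stones.size() < 2 ||
           stones[stones.size() - 2].lastRecord <= stones.back().lastRecord);
    return true;
}
```

A commit at RecordId 100 followed by a late commit at 90 leaves a single stone ending at 100. A later commit at 110 appends a second stone.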

The only flaw in that logic is that if !_stones.empty() evaluates to false (i.e., all the existing stones have been purged), a new stone is created unconditionally. This stone will have the recordId of the insert that committed. Because there are no stones to compare against, the code does not validate that the new stone would be consumed in a valid call to truncate.

The corollary logic (for demonstration, not the required solution) would be to also check against firstRecord:

if (_stones.empty() && lastRecord < firstRecord) {
  return;
}

This would protect the code from attempting a truncation where "start > stop".
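To make the failure mode concrete, here is a small model of the scenario, assuming the stone bookkeeping behaves as described in this ticket. StonesSketch, guardEmptyCase, and nextTruncateIsInverted are hypothetical names introduced only for this illustration. The guard mirrors the demonstration check and is not a proposed fix.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

using RecordId = std::int64_t;  // hypothetical stand-in for the real type

struct Stone {
    RecordId lastRecord;
};

struct StonesSketch {
    RecordId firstRecord = 0;     // stop key of the last truncate
    std::deque<Stone> stones;     // oldest stone at the front
    bool guardEmptyCase = false;  // hypothetical switch to compare behaviors

    void onCommit(RecordId lastRecord) {
        if (!stones.empty() && lastRecord < stones.back().lastRecord) {
            return;  // already covered by the newest stone
        }
        if (guardEmptyCase && stones.empty() && lastRecord < firstRecord) {
            return;  // demonstration check: would produce start > stop
        }
        stones.push_back(Stone{lastRecord});
    }

    // True if truncating the oldest stone would use start > stop.
    bool nextTruncateIsInverted() const {
        return !stones.empty() && firstRecord > stones.front().lastRecord;
    }
};

inline void demonstrateInvertedRange() {
    StonesSketch unguarded;
    unguarded.firstRecord = 100;  // every stone up to RecordId 100 was purged
    unguarded.onCommit(95);       // an older insert commits late
    assert(unguarded.nextTruncateIsInverted());

    StonesSketch guarded;
    guarded.guardEmptyCase = true;
    guarded.firstRecord = 100;
    guarded.onCommit(95);
    assert(guarded.stones.empty());  // no stone, so no inverted truncate
}
```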

One idea is to never pass in the start cursor to WiredTiger. Always let WiredTiger handle positioning the cursor to the beginning of the oplog (oldest record) to start truncating from.
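The effect of dropping the start key can be illustrated with an ordered map standing in for the oplog. This is only a model: OplogSketch and truncateThrough are hypothetical names, and treating the stop key as inclusive is an assumption of the sketch. In WiredTiger itself, WT_SESSION::truncate accepts a NULL start cursor, which means the truncate begins at the start of the object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>

using RecordId = std::int64_t;                        // hypothetical stand-in
using OplogSketch = std::map<RecordId, std::string>;  // ordered by RecordId

// Removes every record with id <= stop, always starting from the oldest
// record, and returns how many records were removed.
std::size_t truncateThrough(OplogSketch& oplog, RecordId stop) {
    auto end = oplog.upper_bound(stop);  // first record strictly after stop
    auto removed =
        static_cast<std::size_t>(std::distance(oplog.begin(), end));
    oplog.erase(oplog.begin(), end);
    assert(oplog.empty() || oplog.begin()->first > stop);
    return removed;
}
```

Because no start position is supplied, a stop key that lies before the oldest remaining record simply removes nothing, and a "start > stop" range cannot be expressed.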

duplicates

SERVER-32533 In oplog truncate with WT, don't use a start key

Closed

Assignee: [DO NOT USE] Backlog - Storage Execution Team
Reporter: Daniel Gottlieb (Inactive)
Participants: [DO NOT USE] Backlog - Storage Execution Team, Daniel Gottlieb, Michael Cahill
Votes: 0
Watchers: 13

Created: Nov 10 2017 10:17:07 PM UTC
Updated: Dec 06 2022 03:47:00 AM UTC
Resolved: Jan 23 2018 06:08:47 AM UTC