[SERVER-32988] Oplog application, foreground index builds pin an unbounded amount of data in WiredTiger
Created: 30/Jan/18  Updated: 27/Oct/23  Resolved: 22/Mar/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.7.1
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive)
Assignee: Vesselina Ratcheva (Inactive)
Resolution: Works as Designed
Votes: 0
Labels: rollback-non-functional
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Sprint: Repl 2018-02-26, Repl 2018-03-12, Repl 2018-03-26
Participants:

 Description   

Note: this applies only to the 3.7 development branch.

SERVER-32188 started to timestamp writes on secondaries from command oplog entries. It uses a `TimestampBlock` to pass context on the OperationContext's RecoveryUnit which is applied when transactions commit.

The index build uses a WriteUnitOfWork for each document resulting from the collection scan. Each of these writes gets the "commit timestamp" that is set on the recovery unit. Foreground index builds use the recovery unit that is in context from the TimestampBlock, so their writes are timestamped. Background index builds use their own OperationContext, and as a result they do not timestamp their data writes.
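The difference between the two build types can be illustrated with a toy model. This is only a sketch: `ToyRecoveryUnit`, `toy_timestamp_block`, and `build_index` are hypothetical names invented for illustration, not MongoDB's classes, and the model assumes nothing beyond what the description states (a scoped block sets a commit timestamp on one recovery unit, and every committed write picks up whatever timestamp that unit currently holds).

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class ToyRecoveryUnit:
    """Hypothetical stand-in for a recovery unit; not MongoDB's class."""
    commit_timestamp: Optional[int] = None
    committed: List[Tuple[str, Optional[int]]] = field(default_factory=list)

    def commit_write(self, doc: str) -> None:
        # A committed write records whatever commit timestamp is set right now.
        self.committed.append((doc, self.commit_timestamp))


@contextmanager
def toy_timestamp_block(ru: ToyRecoveryUnit, ts: int) -> Iterator[ToyRecoveryUnit]:
    """Set a commit timestamp on `ru` for the scope of the block, then restore it."""
    previous = ru.commit_timestamp
    ru.commit_timestamp = ts
    try:
        yield ru
    finally:
        ru.commit_timestamp = previous


def build_index(ru: ToyRecoveryUnit, docs: List[str]) -> None:
    # One unit of work per scanned document, as the description states.
    for doc in docs:
        ru.commit_write(doc)


applier_ru = ToyRecoveryUnit()     # recovery unit of the oplog applier's context
background_ru = ToyRecoveryUnit()  # background build has its own context

with toy_timestamp_block(applier_ru, ts=100):
    build_index(applier_ru, ["a", "b"])     # foreground: reuses the applier's unit
    build_index(background_ru, ["a", "b"])  # background: never sees the timestamp

foreground_writes = applier_ru.committed
background_writes = background_ru.committed
```

In this model every foreground write carries timestamp 100, while the background writes carry none, because the scoped block only touched the recovery unit it was given.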

Moreover, foreground index builds block replication. When replication is not progressing, the oldest_timestamp does not advance. If the oldest_timestamp is not advancing, all of the data writes that are part of the index build stay pinned. This can unnecessarily activate lookaside.
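A rough way to see why the pinned data is unbounded is the toy model below. It is a deliberate simplification and not WiredTiger's actual bookkeeping: `ToyTimestampedCache` and its methods are hypothetical, and the model assumes that a write stays pinned for as long as its timestamp is newer than the oldest_timestamp.

```python
class ToyTimestampedCache:
    """Simplified model: a write is pinned while its timestamp is newer
    than oldest_timestamp. Hypothetical; not WiredTiger's implementation."""

    def __init__(self, oldest_timestamp: int = 0) -> None:
        self.oldest_timestamp = oldest_timestamp
        self.write_timestamps = []

    def write(self, ts: int) -> None:
        self.write_timestamps.append(ts)

    def advance_oldest(self, ts: int) -> None:
        # The oldest timestamp only ever moves forward.
        self.oldest_timestamp = max(self.oldest_timestamp, ts)

    def pinned_count(self) -> int:
        return sum(1 for ts in self.write_timestamps if ts > self.oldest_timestamp)


cache = ToyTimestampedCache(oldest_timestamp=99)

# Foreground build: one timestamped write per scanned document, all at the
# timestamp of the command oplog entry, while replication is blocked.
for _ in range(1000):
    cache.write(100)
pinned_while_blocked = cache.pinned_count()

# Once replication progresses again, the oldest timestamp can advance.
cache.advance_oldest(100)
pinned_after_resume = cache.pinned_count()
```

The pinned count grows with the number of documents scanned and only drops once the oldest timestamp moves, which in this scenario cannot happen until the build finishes.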



 Comments   
Comment by Daniel Gottlieb (Inactive) [ 22/Mar/18 ]

It turns out that bulk index builds on WiredTiger (whether the "bulk" option succeeds or not) are, subtly, done outside a begin/commit transaction. The constructor opens a cursor that is used to perform all inserts into the index. These inserts are self-contained "autocommit" transactions that never have a timestamp applied.

Even though the call is made inside a WUOW that is later committed, the session never became "active" (no transaction was started on it), so commit is never called on it either.
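The behaviour described in this comment can be sketched with a toy model. All names here (`ToySession`, `LazyRecoveryUnit`) are hypothetical, and the lazy "begin a transaction on first use" step is an assumption made for illustration; only the outcome, untimestamped autocommit inserts for the bulk path, comes from the comment itself.

```python
class ToySession:
    """Hypothetical storage session: inserts outside a transaction autocommit."""

    def __init__(self) -> None:
        self.in_txn = False
        self.pending = []
        self.store = []  # (key, timestamp) pairs that have been committed

    def begin(self) -> None:
        self.in_txn = True

    def insert(self, key: str) -> None:
        if self.in_txn:
            self.pending.append(key)
        else:
            # Autocommit: committed immediately, with no timestamp applied.
            self.store.append((key, None))

    def commit(self, ts: int) -> None:
        self.store.extend((key, ts) for key in self.pending)
        self.pending = []
        self.in_txn = False


class LazyRecoveryUnit:
    """Hypothetical recovery unit that begins a transaction on first use."""

    def __init__(self, session: ToySession, commit_timestamp: int) -> None:
        self.session = session
        self.commit_timestamp = commit_timestamp
        self.active = False

    def get_session(self) -> ToySession:
        if not self.active:
            self.session.begin()
            self.active = True
        return self.session

    def commit_unit_of_work(self) -> None:
        # Nothing to commit if no transaction was ever started.
        if self.active:
            self.session.commit(self.commit_timestamp)
            self.active = False


# Normal write path: goes through the recovery unit, so it is timestamped.
normal = LazyRecoveryUnit(ToySession(), commit_timestamp=100)
normal.get_session().insert("k1")
normal.commit_unit_of_work()
normal_store = normal.session.store

# Bulk builder path: inserts through a cursor opened directly on the session.
bulk = LazyRecoveryUnit(ToySession(), commit_timestamp=100)
bulk.session.insert("k1")   # the recovery unit never becomes active
bulk.commit_unit_of_work()  # no-op: there is no transaction to commit
bulk_store = bulk.session.store
```

In the bulk path the write is durable but carries no timestamp, which matches the observation that the index inserts are never pinned by the commit timestamp.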

The side effect of not doing this ticket is that other storage engines that obey the timestamping contract (of which there are none, other than those that extend from WT itself) may pin index builds in memory. However, completing this ticket in a way that includes proof that it was done correctly would require changes to WTRecoveryUnits and/or WT index builds.

Comment by Ian Whalen (Inactive) [ 02/Feb/18 ]

just bumping this to repl team to make sure repl team sees it. will also move kyle's work over to repl team asap.
