Core Server / SERVER-54350

Investigate potential race conditions in SBE oplog plans

    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: Querying
    • Sprint: Query Execution 2021-03-22, Query Execution 2021-05-03, Query Execution 2021-05-17, Query Execution 2021-05-31, Query Execution 2021-06-14

      In SBE, we generate special optimized plans in the case where we are running a collection scan on the oplog. A number of these plans involve either resolving a recordId in advance, or performing two separate scans within the same execution plan: one scan which checks some condition or produces some output, and a second "real" scan which uses that output as a parameter of its own execution. During development of SERVER-50580, we realised that these optimized oplog plans may behave incorrectly if entries fall off the oplog during the latency window between the time we run the first part of the plan and the time we begin executing the "real" scan.

      Here, for instance, we resolve the recordId of the entry to which we want to skip before constructing the SBE plan, and then inject it into the plan as a constant value. But in the time between the point at which we resolve the seekRecordId and the point at which we actually begin executing the scan, that record may have fallen off the oplog. If this happens, we will incorrectly EOF the scan immediately.
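A minimal sketch of that race, using a toy capped log in place of the real oplog (all names here — `ToyOplog`, `seekTo`, `seekRecordIdRace` — are illustrative, not the SBE ScanStage API):

```cpp
#include <deque>
#include <optional>

// Toy stand-in for the oplog: a capped deque of recordIds.
struct ToyOplog {
    std::deque<long> records;

    // Capped-collection rollover: the oldest entry falls off.
    void rollover() {
        if (!records.empty()) records.pop_front();
    }

    // A scan that seeks to a pre-resolved recordId: if that record has
    // already fallen off, the scan reports EOF (nullopt) immediately.
    std::optional<long> seekTo(long seekRecordId) const {
        for (long r : records)
            if (r == seekRecordId) return r;
        return std::nullopt;  // spurious EOF
    }
};

// Reproduces the race: resolve the seekRecordId, then let entries fall off
// before the scan runs. Returns true iff the scan hits the spurious EOF.
inline bool seekRecordIdRace() {
    ToyOplog oplog{{10, 11, 12, 13}};
    long seekRecordId = 11;  // resolved before the plan is built
    oplog.rollover();        // 10 falls off during the latency window
    oplog.rollover();        // 11 falls off -- the seek target is gone
    return !oplog.seekTo(seekRecordId).has_value();
}
```

The scan still holds records 12 and 13, but because the constant seek target is gone it reports EOF rather than resuming at the next surviving entry.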

      Similarly, here we create an NLJ (nested loop join) whose outer branch scans the oplog until it reaches the first entry that matches our filter, and then passes that recordId to the inner branch, which continues the scan from that point without applying the filter to any subsequent entries. But if that recordId falls off the oplog between the time the first scan completes and the time the second scan begins (including, but potentially not limited to, the case where we yield at the wrong moment), we will again hit a spurious EOF. The same is true of the ASSERT_MIN_TS UnionStage plan proposed in SERVER-50580.
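The double-scan handoff can be sketched the same way. This is a toy model, not the SBE NLJ/ScanStage API: the "outer branch" finds the first matching recordId, the "inner branch" resumes from it with no filter, and a rollover in between loses the handoff point:

```cpp
#include <deque>
#include <functional>
#include <optional>

// Toy stand-in for the oplog (illustrative names only).
struct ToyOplog {
    std::deque<long> records;

    void rollover() {  // oldest entry falls off the capped collection
        if (!records.empty()) records.pop_front();
    }

    // Outer branch of the NLJ: scan until the filter matches, hand off that recordId.
    std::optional<long> firstMatching(const std::function<bool(long)>& filter) const {
        for (long r : records)
            if (filter(r)) return r;
        return std::nullopt;
    }

    // Inner branch: resume the scan from the handed-off recordId, with no
    // filter. If the resume point has fallen off, the scan EOFs immediately.
    std::deque<long> resumeFrom(long recordId) const {
        std::deque<long> out;
        bool found = false;
        for (long r : records) {
            if (r == recordId) found = true;
            if (found) out.push_back(r);
        }
        return out;  // empty => spurious EOF
    }
};

// Returns true iff the inner branch EOFs even though later entries still match.
inline bool nljHandoffRace() {
    ToyOplog oplog{{10, 11, 12, 13}};
    auto filter = [](long r) { return r >= 11; };
    long handoff = *oplog.firstMatching(filter);  // outer branch hands off 11
    oplog.rollover();  // 10 falls off during the inter-scan latency window
    oplog.rollover();  // 11 falls off -- the handoff recordId is gone
    return oplog.resumeFrom(handoff).empty();  // EOF despite 12, 13 matching
}
```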

      We do not believe that tailable cursors in general are susceptible to this problem, despite using a two-scan plan, because the first scan always scans to EOF before passing the last observed recordId to the second scan. The user will have to issue a getMore before the second branch is executed; if the recordId has fallen off the capped collection by then, the plan will throw CappedPositionLost. Tailable awaitData cursors may be more susceptible, since they will continue to attempt to pull from the second branch after the first branch EOFs.

      The way to resolve the "double scan" scenario would be to incorporate the filtering performed by the first scan directly into the second, so that it is executed inline with the "real" scan; this means that there would be no inter-scan latency window during which entries could unexpectedly fall off the oplog. This solution would require a way to execute the filter only once, which could be implemented either by introducing a SegmentStage to generate a sequence of incrementing integer values, or by adding an "executeOnce" mode to the existing FilterStage. The issue caused by resolving the seekRecordId before building the plan would require some further thought, possibly pushing down the logic to obtain the recordId into the ScanStage as is done in the classic engine.
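The inline-filter idea can be sketched as a single pass in the same toy model. The "executeOnce" behaviour here is a hypothetical illustration of the proposal, not the existing FilterStage API: the filter is evaluated only until it first passes, then switched off for the rest of the scan:

```cpp
#include <deque>
#include <functional>

// Toy stand-in for the oplog (illustrative names only).
struct ToyOplog {
    std::deque<long> records;
    void rollover() { if (!records.empty()) records.pop_front(); }
};

// Sketch of the proposed fix: fold the outer branch's filter into the single
// "real" scan. Once the filter passes, it is never evaluated again (the
// hypothetical "executeOnce" mode), matching the NLJ plan's semantics.
inline std::deque<long> scanWithExecuteOnceFilter(
        const ToyOplog& oplog, const std::function<bool(long)>& filter) {
    std::deque<long> out;
    bool filterSatisfied = false;
    for (long r : oplog.records) {
        if (!filterSatisfied && !filter(r))
            continue;          // still skipping the unmatched prefix
        filterSatisfied = true;
        out.push_back(r);      // filter disabled from here on
    }
    return out;
}

// Rollovers before the scan no longer cause a spurious EOF: there is no
// handed-off recordId to lose, so the result degrades gracefully.
inline std::deque<long> singleScanSurvivesRollover() {
    ToyOplog oplog{{10, 11, 12, 13}};
    oplog.rollover();  // 10 falls off
    oplog.rollover();  // 11 falls off
    return scanWithExecuteOnceFilter(oplog, [](long r) { return r >= 11; });
}
```

With the same rollovers that broke the two-scan plan, the single scan still returns the surviving matches 12 and 13.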

      Before committing to this work, however, we should confirm that the scenarios above can actually arise. This may involve adding failpoints into SBE plans to cause them to freeze or yield at the appropriate moment, forcing the oplog to roll over, then allowing the plan to continue.
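The failpoint idea can be modelled as a hook fired between the two phases of the plan; in the toy model below (illustrative names, not the server's failpoint framework), an inert hook leaves the plan correct, while a hook that forces a rollover reproduces the race deterministically:

```cpp
#include <deque>
#include <functional>
#include <optional>

// Toy stand-in for the oplog (illustrative names only).
struct ToyOplog {
    std::deque<long> records;
    void rollover() { if (!records.empty()) records.pop_front(); }
    std::optional<long> seekTo(long recordId) const {
        for (long r : records)
            if (r == recordId) return r;
        return std::nullopt;
    }
};

// Runs phase 1 (resolve the resume point), fires the failpoint-style hook,
// then runs phase 2 (the real scan). Returns nullopt on the spurious EOF.
inline std::optional<long> runTwoPhasePlan(
        ToyOplog& oplog, const std::function<void()>& failpoint) {
    long seekRecordId = oplog.records.front();  // phase 1
    failpoint();                                // freeze/yield at the worst moment
    return oplog.seekTo(seekRecordId);          // phase 2
}

// With a hook that rolls the oplog over mid-plan, the race reproduces reliably.
inline bool raceReproducedViaFailpoint() {
    ToyOplog oplog{{10, 11, 12}};
    return !runTwoPhasePlan(oplog, [&oplog] { oplog.rollover(); }).has_value();
}
```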

            Drew Paroski (andrew.paroski@mongodb.com)
            Bernard Gorman (bernard.gorman@mongodb.com)
            Votes: 0
            Watchers: 8