[SERVER-44529] Re-acquiring locks after a yield and step down results in wrong parameters on recoveryUnit Created: 08/Nov/19 Updated: 29/Oct/23 Resolved: 15/May/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying, Replication, Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0-rc7, 4.2.9, 4.7.0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Matthew Russotto | Assignee: | Louis Williams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | bkp | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||
| Sprint: | Execution Team 2019-12-02, Execution Team 2019-12-16, Execution Team 2020-01-13, Execution Team 2020-01-27, Execution Team 2020-02-10, Execution Team 2019-12-30, Execution Team 2020-02-24, Execution Team 2020-03-09, Execution Team 2020-03-23, Execution Team 2020-04-06, Execution Team 2020-05-04, Execution Team 2020-05-18 | ||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||
| Linked BF Score: | 13 | ||||||||||||||||||||||||||||
| Description |
|
When we acquire collection locks with AutoGetCollectionForRead on a primary, we by default set the TimestampReadSource on the recovery unit to kUnset. However, when we re-acquire locks after a yield in PlanExecutor, we still use that same TimestampReadSource even if we have stepped down. Since reads survive replication state changes, that can result in us reading from within the middle of a replication batch, which means reading possibly causally-related data out of order. Note: This ticket initially described both step up and step down, but step up is now separated out into Potential fix: After recovery from yield, change the read source to kNoOverlap if it was kUnset or kNoTimestamp. |
| Comments |
| Comment by Githook User [ 06/Aug/20 ] |
|
Author: {'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}Message: The no-overlap time, ReadSource::kNoOverlap, is the minimum of replication's lastApplied timestamp (cherry picked from commit 25c694f365db0f07a445bd17b6cd5cbf32f5f2f9)
After yielding and reacquiring locks in a query, the preconditions that were used to select our (cherry picked from commit b3a5b5258a487006f0487b9e7e0a1d0d4f1119ff)
This partially reverts work to use the kNoOverlap ReadSource on secondaries since the all_durable (cherry picked from commit ff92d4435fa75c2b947c49cffaf48805b320a5ae) |
| Comment by Louis Williams [ 09/Jun/20 ] |
|
Due to a small performance regression, I have partially reverted this work in |
| Comment by Githook User [ 19/May/20 ] |
|
Author: {'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}Message: After yielding and reacquiring locks in a query, the preconditions that were used to select our (cherry picked from commit b3a5b5258a487006f0487b9e7e0a1d0d4f1119ff) |
| Comment by Githook User [ 15/May/20 ] |
|
Author: {'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}Message: After yielding and reacquiring locks in a query, the preconditions that were used to select our |
| Comment by Louis Williams [ 28/Apr/20 ] |
|
Before |
| Comment by Louis Williams [ 27/Apr/20 ] |
|
I've spent some time rationalizing why the proposed solution is correct. The main problem that we want to avoid is allowing an operation to read without a timestamp and then switch to reading with a timestamp. Unaddressed, this visibility problem will cause will once-visible writes to vanish (see The proposed solution in the event of a stepdown after recovering from a yield is the following: if we were reading without a timestamp before stepdown, read at kNoOverlap, which is the minimum of lastApplied and all_durable, falling-back to the solution in This is a scenario I believe demonstrates that it is not possible for a snapshot to "backtrack". For example, say the all_durable is T1 because of a hole at T2. A client reads without a timestamp, seeing writes at T3, which is equal to lastApplied. If a stepdown occurs, all holes are closed because active transactions will either commit just-in-time or be aborted by stepdown. This means that immediately after stepdown, all_durable becomes equal to lastApplied, or T3 (assuming no other writes). More importantly, if we started a read without a timestamp just before stepdown, after stepdown it would be impossible for kNoOverlap (the minimum of all_durable and lastApplied) to move backward and read at a timestamp before T3. This should provide an example of why this solution is safe for replicated collections.
Update: my last statement about reading from the oplog on secondaries being safe is not confirmed Update: we need a different solution on 4.0 where the kNoOverlap ReadSource doesn't exist. We will probably need to partially backport |
| Comment by Matthew Russotto [ 26/Feb/20 ] |
|
It looks like this bug exists on 4.0, but the 3.6 AutoGetCollectionForRead is completely different so it shouldn't be affected. |
| Comment by Tess Avitabile (Inactive) [ 26/Feb/20 ] |
|
Does this need to be backported further back than 4.2? I think it's the case on earlier versions that reads survive the transition from secondary to primary. |
| Comment by Steven Vannelli [ 11/Nov/19 ] |
|
Assigning this to the Execution. Please reach out to Replication team if support is needed. |