[SERVER-82305] Have dbCheck ignore prepare conflicts on secondaries Created: 18/Oct/23  Updated: 18/Nov/23  Resolved: 15/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.3.0-rc0, 7.2.0-rc2

Type: Bug Priority: Major - P3
Reporter: Sean Zimmerman Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: bkp
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-74793 dbCheck behaves differently on primar... Closed
Assigned Teams:
Storage Execution
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.2, v7.0
Sprint: Execution Team 2023-11-13, Execution Team 2023-11-27
Participants:
Linked BF Score: 8

 Description   

In BF-30418 we discovered that dbCheck can hit a prepare conflict on secondaries and fail a WiredTiger invariant that a thread which encounters a prepare conflict must be killable.

We feel that dbCheck needs to enforce prepare conflicts for correctness, and the oplog applier thread should remain unkillable.

To fix this we should extend PrepareConflictBehavior so that it can propagate an error instead of retrying the conflict (and running into the invariant mentioned above). This will allow dbCheck to finish with a warning that a certain key range could not be validated.
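
The proposal can be illustrated with a small Python sketch. Everything in it is hypothetical: `ConflictMode`, `PrepareConflictError`, `read` and `check_range` are illustrative names and do not correspond to the server's actual PrepareConflictBehavior values or dbCheck code. It only shows the control flow of surfacing a conflict to the caller and turning it into a warning rather than retrying.

```python
from enum import Enum, auto


class ConflictMode(Enum):
    """Hypothetical stand-in for PrepareConflictBehavior; names are illustrative."""
    ENFORCE_AND_RETRY = auto()  # wait for the prepared transaction, then retry
    IGNORE = auto()             # skip prepared updates
    ENFORCE_AND_THROW = auto()  # proposed: report the conflict to the caller


class PrepareConflictError(Exception):
    def __init__(self, key):
        super().__init__(f"prepare conflict on key {key!r}")
        self.key = key


def read(store, key, mode):
    """Read one key; store maps key -> (value, is_prepared)."""
    value, is_prepared = store[key]
    if not is_prepared:
        return value
    if mode is ConflictMode.IGNORE:
        return None  # treat the prepared update as invisible
    if mode is ConflictMode.ENFORCE_AND_THROW:
        raise PrepareConflictError(key)
    raise NotImplementedError("retrying is not modelled in this sketch")


def check_range(store, keys, mode):
    """Validate a key range; on a propagated conflict, finish with a warning."""
    seen = []
    for key in keys:
        try:
            value = read(store, key, mode)
        except PrepareConflictError as exc:
            warning = f"range {keys[0]!r}..{keys[-1]!r} not validated: {exc}"
            return {"ok": False, "warning": warning}
        if value is not None:
            seen.append((key, value))
    return {"ok": True, "docs": seen}
```

With a store in which key B is prepared, `check_range(store, ["A", "B", "C"], ConflictMode.ENFORCE_AND_THROW)` returns a result with `ok` set to `False` and a warning naming the range, instead of blocking the calling thread.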



 Comments   
Comment by Githook User [ 16/Nov/23 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-82305 dbCheck should ignore prepare conflicts on secondaries

(cherry picked from commit 6007ed70a67624d75509eca3d5adf7aee3d03cc9)
Branch: v7.2
https://github.com/mongodb/mongo/commit/2b2d11bae32470979f761c5ccc1558e93dcc9881

Comment by Githook User [ 15/Nov/23 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-82305 dbCheck should ignore prepare conflicts on secondaries
Branch: master
https://github.com/mongodb/mongo/commit/6007ed70a67624d75509eca3d5adf7aee3d03cc9

Comment by Louis Williams [ 06/Nov/23 ]

I actually think it is correct to ignore prepare conflicts on the secondary. Consider the following interleaving of events that is currently leading to a problem:

1. dbCheck: scans _id from A to Z without hitting any prepare conflicts.
2. Transaction: inserts doc with _id B and prepares the transaction.
3. dbCheck: completes and replicates.
4. dbCheck: secondary tries to scan, but B is in a prepared state, and hits a prepare conflict.

In this case, we must ignore prepare conflicts on the secondary. Successfully replicating a dbCheck oplog entry guarantees that the range scanned on the primary at a specific point in time does not represent any documents in a prepared state. Therefore, for correctness, the secondary must ignore prepared updates when reading at the same point in time.
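
The interleaving above can be replayed in a deliberately simplified Python model. This is a sketch under stated assumptions: `scan_and_hash` and `PrepareConflict` are hypothetical names, documents are a plain dict, and the model does not reproduce WiredTiger's timestamp or visibility rules. It only shows that skipping the prepared document on the secondary reproduces the hash the primary computed before the transaction prepared.

```python
import hashlib


class PrepareConflict(Exception):
    """Hypothetical error raised when a scan enforces prepare conflicts."""


def scan_and_hash(docs, prepared_ids, ignore_prepared):
    """Hash every visible document in _id order.

    docs maps _id -> value for committed documents; prepared_ids is the set
    of _ids that currently have a prepared, unresolved update.
    """
    digest = hashlib.md5()
    for _id in sorted(set(docs) | prepared_ids):
        if _id in prepared_ids:
            if ignore_prepared:
                continue  # behave as if the prepared update did not exist
            raise PrepareConflict(_id)
        digest.update(f"{_id}:{docs[_id]}".encode())
    return digest.hexdigest()


# Primary: dbCheck scans A..Z before the transaction prepares anything.
primary_docs = {"A": 1, "Z": 26}
primary_hash = scan_and_hash(primary_docs, set(), ignore_prepared=False)

# Secondary: same committed data, but B is now in a prepared state.
secondary_docs = dict(primary_docs)
secondary_prepared = {"B"}
secondary_hash = scan_and_hash(secondary_docs, secondary_prepared, ignore_prepared=True)
```

In this model the two hashes match when the secondary ignores the prepared document, while calling `scan_and_hash` with `ignore_prepared=False` on the secondary raises `PrepareConflict`.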

Comment by Louis Williams [ 01/Nov/23 ]

I think there are two ways to solve this problem:
1. Throw an exception when dbCheck hits a prepare conflict on secondaries.
2. Use WT bounded cursors so that we don't ever hit prepare conflicts on keys outside the range being hashed.

I actually think solution 2 fits in better with how we want the system to behave in the future, as this would allow us to stop using ignore_prepare=force everywhere else. Solution 2, however, cannot be backported to 6.0. Considering that 6.0 is broken right now, I'm reverting SERVER-74793 on 6.0 to stabilize the release until we work around the problem on master and 7.0.
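
The idea behind solution 2 can be sketched with a toy Python model. This is an assumption-laden illustration, not WiredTiger's cursor API: `scan_unbounded`, `scan_bounded`, `read` and `PrepareConflict` are hypothetical names, and the unbounded scan models just one plausible way a range scan could touch a key past the end of the range (reading the record before comparing its key with the upper bound).

```python
import bisect


class PrepareConflict(Exception):
    """Hypothetical error for reading a key with a prepared update."""


def read(table, key):
    """table maps key -> (value, is_prepared)."""
    value, is_prepared = table[key]
    if is_prepared:
        raise PrepareConflict(key)
    return value


def scan_unbounded(table, lo, hi):
    """Scan [lo, hi) by reading each record, then checking the end bound."""
    keys = sorted(table)
    out = []
    for key in keys[bisect.bisect_left(keys, lo):]:
        value = read(table, key)  # may touch a key outside the range
        if key >= hi:
            break
        out.append((key, value))
    return out


def scan_bounded(table, lo, hi):
    """Scan [lo, hi) with bounds applied before any record is read."""
    keys = sorted(table)
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_left(keys, hi)
    return [(key, read(table, key)) for key in keys[start:end]]
```

For a table where only key N is prepared, hashing the range A to M with `scan_bounded` never reads N, whereas `scan_unbounded` raises `PrepareConflict` when it steps onto N to discover that the range has ended.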

Comment by Sean Zimmerman [ 30/Oct/23 ]

The backport of SERVER-74793 to 6.0 caused BF-30418. We decided to assign this to Storage Execution because the BF was caused by Storage Execution work and the proposed solution involves adjusting PrepareConflictBehavior, which is owned by Storage Execution, even though this issue applies to dbCheck.

Comment by Sean Zimmerman [ 18/Oct/23 ]

cc: xuerui.fa@mongodb.com louis.williams@mongodb.com 

Generated at Thu Feb 08 06:48:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.