[SERVER-47022] Oplog application mode must be set to kRecovering when applying oplog entries after rollbackViaRefetch Created: 20/Mar/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication
Affects Version/s: 4.5.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
is caused by SERVER-21700 Do not relax constraints during stead... Closed
Related
is related to SERVER-47053 Relax oplog application constraints i... Closed
Assigned Teams:
Replication
Operating System: ALL
Steps To Reproduce:

// Run with resmoke parameters: --majorityReadConcern=off --suites=replica_sets
(function() {
'use strict';
 
load("jstests/replsets/libs/rollback_test.js");
 
let dbName = "rollback_constraint";
let collName = "sourceColl";
 
let doc1 = {_id: 0};
 
let CommonOps = (node) => {
    // Insert an initial dummy document.
    assert.commandWorked(node.getDB(dbName)[collName].insert({}));
};
 
let RollbackOps = (node) => {
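    // Insert a doc that exists only on the rollback node. The insert is rolled back,
    // so the doc is refetched from the sync source during rollback.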
    assert.commandWorked(node.getDB(dbName)[collName].insert(doc1));
};
 
let SyncSourceOps = (node) => {
    // Insert same doc on different branch so it will be refetched and applied on rollback node.
    assert.commandWorked(node.getDB(dbName)[collName].insert(doc1));
};
 
// Set up Rollback Test.
let rollbackTest = new RollbackTest();
CommonOps(rollbackTest.getPrimary());
 
let rollbackNode = rollbackTest.transitionToRollbackOperations();
RollbackOps(rollbackNode);
 
let syncSourceNode = rollbackTest.transitionToSyncSourceOperationsBeforeRollback();
SyncSourceOps(syncSourceNode);
 
// Wait for rollback to finish.
rollbackTest.transitionToSyncSourceOperationsDuringRollback();
rollbackTest.transitionToSteadyStateOperations();
 
// Check the replica set.
rollbackTest.stop();
}());

Participants:
Linked BF Score: 50

 Description   

When we come out of rollbackViaRefetch, we may remain in RECOVERING state until we have applied oplog entries up to our minValid optime. During that window our data may not be consistent with the oplog, so we must relax constraints during oplog application. Currently, the OplogApplier that BackgroundSync uses for oplog application always runs in application mode kSecondary. It needs to use mode kRecovering when applying ops in RECOVERING state after rollback, because we now enforce constraints whenever we apply in mode kSecondary (SERVER-21700).
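
As an illustration of why the mode matters for the repro above, here is a minimal, self-contained JavaScript model. It is only a sketch and not server code: `Mode`, `applyInsert`, and the `Map`-backed collection are hypothetical names, and it assumes that strict application fails on a duplicate `_id` while relaxed application treats the insert as an upsert.

```javascript
'use strict';

// Hypothetical stand-ins for the oplog application modes named in this ticket.
const Mode = Object.freeze({kSecondary: "kSecondary", kRecovering: "kRecovering"});

// Hypothetical model of applying an insert oplog entry to a collection keyed by _id.
// Assumption: kSecondary enforces constraints; kRecovering relaxes them.
function applyInsert(coll, doc, mode) {
    if (coll.has(doc._id) && mode === Mode.kSecondary) {
        throw new Error("DuplicateKey: _id " + doc._id);
    }
    // Relaxed (or non-conflicting) case: overwrite, i.e. behave like an upsert.
    coll.set(doc._id, doc);
}

// State of the rollback node after refetch: {_id: 0} is already present.
const coll = new Map([[0, {_id: 0}]]);

// Applying the sync source's insert of {_id: 0} in strict mode fails.
let strictError = null;
try {
    applyInsert(coll, {_id: 0}, Mode.kSecondary);
} catch (e) {
    strictError = e;
}

// Applying the same entry in relaxed mode succeeds and leaves a single document.
applyInsert(coll, {_id: 0}, Mode.kRecovering);
```

In this model the strict application throws on the refetched document, while the relaxed application leaves exactly one copy of `{_id: 0}` in the collection.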



 Comments   
Comment by Matthew Russotto [ 02/Oct/20 ]

BACKPORT-6289 is cancelled so this only affects 4.7+

Comment by Ali Mir [ 08/Jun/20 ]

We've added another TODO related to this ticket here.

Comment by William Schultz (Inactive) [ 24/Mar/20 ]

If we do this ticket, we should complete this TODO.

Comment by William Schultz (Inactive) [ 20/Mar/20 ]

Since the oplog applier is owned by ReplCoordExternalState, it might be hard to construct a new OplogApplier with different options specifically for applying oplog entries in RECOVERING state. One way to fix this bug might be to change the options on the existing applier to use the correct application mode, and then restore them after we exit RECOVERING. We might be able to set the applier's _options.mode to kRecovering after exiting rollbackViaRefetch here, and reset the application mode to kSecondary when we know we have reached minValid at this point in the oplog applier. Not sure if this is the best solution, but these are some initial thoughts.
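
The idea of toggling the mode on the existing applier can be sketched as a small state model in JavaScript. This is purely illustrative and not the server's C++ implementation: `ApplicationMode`, `SketchApplier`, `onRollbackFinished`, and `applyBatch` are hypothetical names, and optimes are simplified to plain numbers.

```javascript
'use strict';

// Hypothetical stand-ins for the application modes discussed in this ticket.
const ApplicationMode = Object.freeze({kSecondary: "kSecondary", kRecovering: "kRecovering"});

// Hypothetical model of a single long-lived applier whose mode is switched in place.
class SketchApplier {
    constructor() {
        this.options = {mode: ApplicationMode.kSecondary};
        this.minValid = null;
    }

    // Called once rollbackViaRefetch has finished: relax constraints until minValid.
    onRollbackFinished(minValid) {
        this.minValid = minValid;
        this.options.mode = ApplicationMode.kRecovering;
    }

    // Applies one batch and returns the mode that batch was applied in.
    // Once the batch reaches minValid, later batches go back to kSecondary.
    applyBatch(lastOpTimeInBatch) {
        const modeUsed = this.options.mode;
        if (modeUsed === ApplicationMode.kRecovering && lastOpTimeInBatch >= this.minValid) {
            this.options.mode = ApplicationMode.kSecondary;
        }
        return modeUsed;
    }
}

// Usage: minValid is optime 5; batches end at optimes 3, 5, and 6.
const applier = new SketchApplier();
applier.onRollbackFinished(5);
const modesUsed = [3, 5, 6].map((opTime) => applier.applyBatch(opTime));
```

In this sketch the batches ending at or before minValid are applied with relaxed constraints, and the first batch after minValid is applied in kSecondary again. A real change would also need to consider concurrent access to the shared options, which this model ignores.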

Comment by William Schultz (Inactive) [ 20/Mar/20 ]

This will also affect v4.4 pending completion of BACKPORT-6289.

Generated at Thu Feb 08 05:13:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.