[SERVER-60160] Initial syncing node can crash due to BSONObjectTooLarge exception thrown while replaying the oplog entries. Created: 22/Sep/21  Updated: 06/Dec/22  Resolved: 27/Sep/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Backlog - Replication Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-7515 idempotence violation when intermedia... Open
Related
related to SERVER-53777 Write idempotency targeted tests for ... Closed
Assigned Teams:
Replication
Operating System: ALL
Steps To Reproduce:

 
(function() {
"use strict";
 
load("jstests/libs/curop_helpers.js");  // for waitForCurOpByFailPoint().
load("jstests/libs/fail_point_util.js");
 
const testName = "testDB";
const dbName = testName;
const collName = "testcoll";
 
// Start a 3 node replica set to avoid primary step down after secondary restart.
const rst = new ReplSetTest(
    {nodes: [{}, {rsConfig: {priority: 0}}, {arbiter: true}], settings: {chainingAllowed: false}});
rst.startSet();
rst.initiate();
 
var primary = rst.getPrimary();
var primaryDB = primary.getDB(dbName);
var primaryAdmin = primary.getDB("admin");
var primaryColl = primaryDB[collName];
var secondary = rst.getSecondary();
 
// Write a document of 6 MB.
const largeArray = new Array(6 * 1024 * 1024).join('x');
assert.commandWorked(primaryColl.insert({_id: 1, a: largeArray}));
 
jsTestLog("Stopping secondary.");
rst.stop(secondary);
 
jsTestLog("Enabling failpoint 'hangBeforeListDatabases' on primary (sync source).");
assert.commandWorked(
    primary.adminCommand({configureFailPoint: "hangBeforeListDatabases", mode: "alwaysOn"}));
 
jsTestLog("Starting secondary.");
let secondaryStartupParams = {};
secondaryStartupParams['numInitialSyncAttempts'] = 1;
secondary = rst.start(secondary, {startClean: true, setParameter: secondaryStartupParams});
rst.waitForState(secondary, ReplSetTest.State.STARTUP_2);
 
jsTestLog("Waiting for primary to reach failPoint 'hangBeforeListDatabases'");
waitForCurOpByFailPoint(primaryAdmin,
                        new RegExp('^' +
                                   ""),
                        "hangBeforeListDatabases");
 
// Now perform the following updates on the document so that both the collection cloner and the
// initial sync oplog applier try to apply them, leading to a BSONObjectTooLarge exception.
assert.commandWorked(primaryColl.update({_id: 1}, {$set: {b: largeArray}}));
assert.commandWorked(primaryColl.update({_id: 1}, {$unset: {b: 1}}));
assert.commandWorked(primaryColl.update({_id: 1}, {$set: {c: largeArray}}));
 
jsTestLog("Allowing initial sync to continue.");
assert.commandWorked(
    primaryAdmin.adminCommand({configureFailPoint: 'hangBeforeListDatabases', mode: 'off'}));
 
jsTestLog("Waiting for initial sync to complete.");
rst.waitForState(secondary, ReplSetTest.State.SECONDARY);
 
rst.stopSet();
})();


 Description   

This is a bug in logical initial sync. Because the cloner does not do a snapshot read on the sync source when cloning data, the initial syncing node replays oplog entries on inconsistent data. This can lead to an idempotency issue of the kind that exists when applying the operations from a transaction after the data already reflects the transaction. We have OplogApplication::Mode::kInitialSync to absorb such errors and silently ignore them. I think we missed handling the scenario described in the "Steps To Reproduce" section.
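
The size arithmetic behind the reproduction can be sketched as follows. This is a simplified model, not server code: it tracks only approximate field sizes, ignores BSON overhead, and assumes the 16 MB maximum BSON document size and that the cloner copies the document after all three updates have been applied on the primary. `applyUpdate` and `replay` are hypothetical helpers introduced only for illustration.

```javascript
// Simplified model: a document is a map of {fieldName: sizeInBytes}.
const MB = 1024 * 1024;
const MAX_BSON_SIZE = 16 * MB;  // Assumed maximum BSON document size.
const LARGE = 6 * MB;           // Approximate size of largeArray in the repro.

// Hypothetical helper: apply one $set/$unset-style op and enforce the size limit.
function applyUpdate(doc, op) {
    const next = Object.assign({}, doc);
    if (op.set) next[op.set] = LARGE;
    if (op.unset) delete next[op.unset];
    const total = Object.values(next).reduce((sum, n) => sum + n, 0);
    if (total > MAX_BSON_SIZE) {
        throw new Error("BSONObjectTooLarge: " + total + " bytes");
    }
    return next;
}

// The three updates from the repro, in oplog order.
const ops = [{set: "b"}, {unset: "b"}, {set: "c"}];

// Hypothetical helper: replay all ops in order on a starting document.
function replay(startDoc) {
    return ops.reduce((doc, op) => applyUpdate(doc, op), startDoc);
}

// On the primary the ops start from {a}: the document never exceeds 12 MB.
const onPrimary = replay({a: LARGE});

// The initial syncing node clones the final state {a, c} (12 MB) and then
// replays the same ops. The first op, $set b, would grow the document to 18 MB.
let syncError = null;
try {
    replay({a: LARGE, c: LARGE});
} catch (e) {
    syncError = e;
}
```

In this model the replay on the primary succeeds, while the replay on top of the already-cloned final state fails on the first update, which matches the failure described in this ticket.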



 Comments   
Comment by Suganthi Mani [ 29/Sep/21 ]

alan.zheng Sounds good to me! Just to note, though: the same initial sync logic is also used by the Serverless (phase 0) tenant migration code, and we are planning to replace it with Serverless phase 2, whose target date is 08/01/2023.
