[SERVER-72422] FCBIS may never truncate the oplog
Created: 29/Dec/22  Updated: 29/Oct/23  Resolved: 06/Jan/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.2.0, 6.0.0, 6.1.0, 6.2.0-rc4
Fix Version/s: 6.0.4, 6.2.0-rc5, 6.3.0-rc0

Type: Bug
Priority: Critical - P2
Reporter: Louis Williams
Assignee: Matthew Russotto
Resolution: Fixed
Votes: 1
Labels: repl-shortlist
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-72525 Randomly choose between FCBIS and log... Open
is related to SERVER-72423 FCBIS will never delete drop-pending ... Closed
Assigned Teams: Replication
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v6.2, v6.0
Sprint: Repl 2023-01-09

 Description   

We install a callback inside the storage engine to determine how much oplog to keep for uncommitted transactions.

Upon completion, file copy based initial sync (FCBIS) constructs a new storage engine but does not reinstall this callback. When the callback is not present, we default to not truncating the oplog.

As a result, a node never truncates its oplog after completing an FCBIS. Restarting the node fixes the problem permanently.
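
The failure mode can be illustrated with a toy model in plain JavaScript. This is a sketch, not server code: the class `ToyStorageEngine`, the method `setOplogPinCallback`, and the integer timestamps are hypothetical names chosen for illustration. It assumes only what the description states: truncation depends on a callback, and a missing callback means nothing is truncated.

```javascript
// Toy model of the bug. Every name here is hypothetical; this is not MongoDB server code.
class ToyStorageEngine {
    constructor() {
        this.oplog = [];               // Timestamps of oplog entries, oldest first.
        this.oplogPinCallback = null;  // Returns the oldest timestamp that must be kept.
    }

    setOplogPinCallback(callback) {
        this.oplogPinCallback = callback;
    }

    // Drops entries older than the pinned timestamp and returns how many were removed.
    truncateOplog() {
        if (this.oplogPinCallback === null) {
            // Conservative default from the description: without the callback, keep everything.
            return 0;
        }
        const pinned = this.oplogPinCallback();
        const before = this.oplog.length;
        this.oplog = this.oplog.filter((ts) => ts >= pinned);
        return before - this.oplog.length;
    }
}

// Normal startup: the callback is installed, so truncation works.
const startupEngine = new ToyStorageEngine();
startupEngine.setOplogPinCallback(() => 3);
startupEngine.oplog.push(1, 2, 3, 4);
const removedAfterStartup = startupEngine.truncateOplog();

// After the sync, a fresh engine replaces the old one and nothing reinstalls the callback.
const rebuiltEngine = new ToyStorageEngine();
rebuiltEngine.oplog.push(1, 2, 3, 4);
const removedAfterRebuild = rebuiltEngine.truncateOplog();

// Installing the callback again, as the normal startup path does, restores truncation.
rebuiltEngine.setOplogPinCallback(() => 3);
const removedAfterReinstall = rebuiltEngine.truncateOplog();
```

In this model the rebuilt engine removes nothing until the callback is installed again, which matches the observation that a restart clears the problem. The ticket does not describe the actual patch, so the last step should be read as an illustration of the restart behavior rather than of the fix itself.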



 Comments   
Comment by Githook User [ 06/Jan/23 ]

Author: Matthew Russotto <matthew.russotto@mongodb.com> (mtrussotto)

Message: SERVER-72422 FCBIS may never truncate the oplog (tests)
Branch: v6.2
https://github.com/mongodb/mongo/commit/718bf390c8830b2d7a8d7e306617b525f7d140f7

Comment by Githook User [ 06/Jan/23 ]

Author: Matthew Russotto <matthew.russotto@mongodb.com> (mtrussotto)

Message: SERVER-72422 FCBIS may never truncate the oplog
Branch: v6.2
https://github.com/10gen/mongo-enterprise-modules/commit/be23987c7a365f6edbdba449b38551727d105868

Comment by Githook User [ 06/Jan/23 ]

Author: Matthew Russotto <matthew.russotto@mongodb.com> (mtrussotto)

Message: SERVER-72422 FCBIS may never truncate the oplog (tests)
Branch: v6.0
https://github.com/mongodb/mongo/commit/e8d1267f2c840b322207b12230fa62493f0a6801

Comment by Githook User [ 06/Jan/23 ]

Author: Matthew Russotto <matthew.russotto@mongodb.com> (mtrussotto)

Message: SERVER-72422 FCBIS may never truncate the oplog
Branch: v6.0
https://github.com/10gen/mongo-enterprise-modules/commit/93e1913dd8223e4a5a3b15e7682ef9ff1cb5f3ea

Comment by Githook User [ 06/Jan/23 ]

Author: Matthew Russotto <matthew.russotto@mongodb.com> (mtrussotto)

Message: SERVER-72422 FCBIS may never truncate the oplog (tests)
Branch: master
https://github.com/mongodb/mongo/commit/c8ed1cb8119aedaf03fbd79f2282af975ee2426c

Comment by Githook User [ 06/Jan/23 ]

Author: Matthew Russotto <matthew.russotto@mongodb.com> (mtrussotto)

Message: SERVER-72422 FCBIS may never truncate the oplog
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/3596b94a1ee1ffaea051f20c56332e3706291262

Comment by Kelsey Schubert [ 04/Jan/23 ]

Yes, I think this warrants a backport to 6.2.

Comment by Andy Schwerin [ 04/Jan/23 ]

Does this need a 6.2 backport if we do a 6.0 backport? I can't remember the rules, or if they matter for this bug.

Comment by Louis Williams [ 29/Dec/22 ]

Reproducer (enterprise-only):

(function() {
"use strict";
 
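// Use a 3-second checkpoint interval, a small (10 MB) oplog, and a short snapshot
// history window so that oplog truncation becomes observable quickly.
const rst = new ReplSetTest({
    nodeOptions:
        {syncdelay: 3, oplogSize: 10, setParameter: {minSnapshotHistoryWindowInSeconds: 5}},
    nodes: [{}, {rsConfig: {priority: 0}}],
});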
rst.startSet();
rst.initiate();
 
const primary = rst.getPrimary();
const primaryDB = primary.getDB("testDB");
const secondary = rst.getSecondary();
const collName = "testColl";
const primaryColl = primaryDB[collName];
 
assert.commandWorked(primaryColl.insert({_id: "a"}, {writeConcern: {w: 2}}));
jsTestLog("Creating initial sync node.");
 
// Add a third node that performs file copy based initial sync (FCBIS), with verbose
// recovery and initial sync logging.
const initialSyncNode = rst.add({
    rsConfig: {priority: 0},
    syncdelay: 3,
    oplogSize: 10,
    setParameter: {
        minSnapshotHistoryWindowInSeconds: 5,
        'initialSyncMethod': 'fileCopyBased',
        'initialSyncSourceReadPreference': 'secondaryPreferred',
        'logComponentVerbosity':
            tojson({storage: {recovery: 2}, replication: {verbosity: 1, initialSync: 2}}),
    }
});
rst.reInitiate();
 
rst.awaitSecondaryNodes();
jsTestLog("inserting data to increase size of oplog");
for (let i = 0; i < 1000; i++) {
    // Each batch writes about as much data as the configured oplog size (10 MB), so a
    // healthy node should soon have oplog to truncate.
    let bulk = primaryDB.coll.initializeUnorderedBulkOp();
    // 100 * 100KB = 10MB.
    for (let j = 0; j < 100; j++) {
        bulk.insert({a: 'x'.repeat(100 * 1024)});
    }
    assert.commandWorked(bulk.execute());
    rst.awaitSecondaryNodes();
 
    // Print the oplog size on every node, then wait one checkpoint interval (syncdelay: 3).
    print("-- oplog size on nodes", "--", new Date());
    print("prim", primary.getDB('local').oplog.rs.stats().size);
    print("sec ", secondary.getDB('local').oplog.rs.stats().size);
    print("new ", initialSyncNode.getDB('local').oplog.rs.stats().size);
    sleep(3 * 1000);
}
 
rst.stopSet();
})();

Observe that the "new" node's oplog never shrinks.
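
The manual check on the printed sizes could be automated with a small helper. This is a sketch and not part of the original reproducer: `oplogEverShrank` is a hypothetical function that takes the sizes sampled from one node, in order, and reports whether any sample is smaller than the one before it. The sample numbers below are made up.

```javascript
// Hypothetical helper: returns true if any sample is smaller than its predecessor.
function oplogEverShrank(sizes) {
    for (let i = 1; i < sizes.length; i++) {
        if (sizes[i] < sizes[i - 1]) {
            return true;
        }
    }
    return false;
}

// Made-up byte counts for illustration only.
const healthySamples = [4000000, 9500000, 10200000, 9800000, 10100000];
const affectedSamples = [4000000, 14000000, 24000000, 34000000];

const healthyShrank = oplogEverShrank(healthySamples);
const affectedShrank = oplogEverShrank(affectedSamples);
```

In the reproducer, the values of `initialSyncNode.getDB('local').oplog.rs.stats().size` could be pushed into an array on each iteration and passed to the helper once the loop ends. A result of `false` is only meaningful after enough data has been written to exceed the configured oplog size several times.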
