[SERVER-40517] Fastcount isn't adjusted correctly when transaction started and prepared is entirely rolled back Created: 06/Apr/19  Updated: 29/Oct/23  Resolved: 12/Apr/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.11

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: prepare_durability, rbfz, txn_storage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-40482 incorrect fastcount for a majority co... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

python buildscripts/resmoke.py --suites=no_server repro_server40517.js

repro_server40517.js

(function() {
    "use strict";
 
    load("jstests/core/txns/libs/prepare_helpers.js");
    load("jstests/replsets/libs/rollback_test.js");
 
    const rollbackTest = new RollbackTest();
    const rollbackNode = rollbackTest.getPrimary();
 
    // The makeDocs() function makes it easier to have transactions insert varying numbers of
    // documents in order to keep all possible sums of count adjustments as distinct values.
    let nextId = 0;
    const makeDocs = (numDocs) => {
        return Array.from({length: numDocs}, () => ({_id: ++nextId}));
    };
 
    const testDB = rollbackNode.getDB("test");
    assert.commandWorked(testDB.mycoll.insert(makeDocs(2)));
 
    rollbackTest.transitionToRollbackOperations();
 
    // We perform some operations on the "test.mycoll" collection aside from starting and preparing
    // a transaction in order to cause the count diff computed by replication to be non-zero.
    assert.commandWorked(testDB.mycoll.insert(makeDocs(5)));
 
    const session = rollbackNode.startSession({causalConsistency: false});
    const sessionDB = session.getDatabase(testDB.getName());
 
    session.startTransaction();
    assert.commandWorked(sessionDB.mycoll.insert(makeDocs(10)));
    PrepareHelpers.prepareTransaction(session);
 
    rollbackTest.transitionToSyncSourceOperationsBeforeRollback();
 
    // XXX: Workaround for SERVER-40322.
    assert.commandWorked(rollbackNode.adminCommand(
        {setParameter: 1, createRollbackDataFiles: false}));
 
    rollbackTest.transitionToSyncSourceOperationsDuringRollback();
    rollbackTest.transitionToSteadyStateOperations();
 
    rollbackTest.stop();
})();

Sprint: Storage NYC 2019-04-22
Participants:

 Description   

The record store counts are stashed in memory by the call to _findRecordStoreCounts() before the storage transaction underlying a prepared transaction is aborted. As a result, the call to _correctRecordStoreCounts() restores the counts to a version that still includes the effects of prepared transactions that weren't majority-committed.

[js_test:repro_server40517] 2019-04-06T10:12:58.355-0400 d20030| 2019-04-06T10:12:58.354-0400 D2 ROLLBACK [rsBackgroundSync] Record count of test.mycoll (2549dc22-8ef8-419a-b800-d565111894d7) before rollback is 17. Setting it to 12, due to change of -5
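
The arithmetic behind the log line above can be sketched as a small standalone model. This is an illustration only, not the server's code: the function name and the stashBeforeAbort flag are hypothetical, the document counts come from the repro script (2 initial inserts, 5 rolled-back inserts, 10 inserts in the prepared transaction), and the expected post-rollback count of 2 is an inference from the fact that every operation after the initial insert is rolled back.

```javascript
// Hypothetical model of the fastcount bookkeeping during rollback.
// It only mirrors the ordering problem described in this ticket.
function simulateRollbackCount({stashBeforeAbort}) {
    const initialDocs = 2;        // inserted before the rollback operations
    const rolledBackInserts = 5;  // non-transactional inserts that are rolled back
    const preparedInserts = 10;   // inserts made by the prepared transaction

    // In-memory count just before rollback; it includes the prepared inserts.
    const countBeforeRollback = initialDocs + rolledBackInserts + preparedInserts;

    // Aborting the prepared transaction removes its inserts from the count.
    const countAfterAbort = countBeforeRollback - preparedInserts;

    // Count diff computed by replication, as reported in the log line.
    const countDiff = -rolledBackInserts;

    // The stash is taken either before the abort (the reported behavior)
    // or after it (the ordering that yields the expected count).
    const stashed = stashBeforeAbort ? countBeforeRollback : countAfterAbort;
    return stashed + countDiff;
}

const buggyCount = simulateRollbackCount({stashBeforeAbort: true});
const fixedCount = simulateRollbackCount({stashBeforeAbort: false});
console.log(buggyCount, fixedCount);  // 12 2
```

With the stash taken before the abort, the model reproduces the logged transition from 17 to 12. Taking the stash after the abort gives 2, which matches the documents that should remain in test.mycoll.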



 Comments   
Comment by Githook User [ 12/Apr/19 ]

Author: Louis Williams (louiswilliams) <louis.williams@mongodb.com>

Message: SERVER-40482 SERVER-40517 Fix fastcount algorithm for rollback of prepared transactions

This fixes two bugs, both related to the correctness of the algorithm for adjusting collection counts
during rollback. The first bug is that non-majority-confirmed "prepare" oplog entries, when rolled
back, may incorrectly adjust collection fastcounts. The second bug is that a prepared and
committed transaction will have incorrect collection counts after rollback.

The new high-level order of operations during replication rollback is as follows:
1. Abort all active prepared transactions, rolling back any in-memory counts
2. Calculate collection count adjustments by scanning rolled-back oplog entries
3. If a 'commitTransaction' oplog entry is rolled back, find the associated 'prepare' to calculate
size adjustments
4. Roll back to the stable timestamp. Replay oplog to common point. This makes no collection count
adjustments.
5. Set collection counts to previously calculated values
6. Reconstruct prepared transactions, which updates in-memory fastcounts
Branch: master
https://github.com/mongodb/mongo/commit/8c2ef8757dfe625e7fc06c3cdef6b7692764d00c
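
Steps 2 and 3 of the order of operations above can be sketched as a scan over the rolled-back oplog entries. This is a minimal model under stated assumptions, not the code from the commit: the entry shapes (kind, ns, txnId, ops), the computeCountDiffs function, and the findPrepare callback are all hypothetical simplifications. It also assumes, based on the commit message, that a rolled-back 'prepare' entry contributes no adjustment on its own because step 1 has already removed its in-memory counts.

```javascript
// Simplified, hypothetical oplog entry shapes (real entries carry more fields):
//   {kind: "insert", ns}                   one inserted document
//   {kind: "delete", ns}                   one deleted document
//   {kind: "prepare", txnId, ops: [...]}   prepared transaction and its operations
//   {kind: "commitTransaction", txnId}     commit of a prepared transaction
function computeCountDiffs(rolledBackEntries, findPrepare) {
    const diffs = new Map();
    const adjust = (ns, delta) => diffs.set(ns, (diffs.get(ns) || 0) + delta);

    // Undoing an insert lowers the count; undoing a delete raises it.
    const undo = (op) => {
        if (op.kind === "insert") adjust(op.ns, -1);
        else if (op.kind === "delete") adjust(op.ns, 1);
    };

    for (const entry of rolledBackEntries) {
        if (entry.kind === "commitTransaction") {
            // Step 3: the commit carries no operations, so look up its prepare.
            findPrepare(entry.txnId).ops.forEach(undo);
        } else if (entry.kind === "prepare") {
            // Assumed: an uncommitted prepare was aborted in step 1, so skip it.
            continue;
        } else {
            undo(entry);
        }
    }
    return diffs;
}

const ns = "test.mycoll";
const inserts = (n) => Array.from({length: n}, () => ({kind: "insert", ns}));
const prepare = {kind: "prepare", txnId: 1, ops: inserts(10)};
const findPrepare = (txnId) => (txnId === prepare.txnId ? prepare : undefined);

// Scenario from this ticket: 5 inserts and a prepare are rolled back.
const diffsPrepareOnly = computeCountDiffs([...inserts(5), prepare], findPrepare);

// Variant: the transaction was also committed, and the commit is rolled back too.
const diffsWithCommit = computeCountDiffs(
    [...inserts(5), prepare, {kind: "commitTransaction", txnId: 1}], findPrepare);

console.log(diffsPrepareOnly.get(ns), diffsWithCommit.get(ns));  // -5 -15
```

In the first scenario the diff of -5 is applied to the count left after the abort (17 - 10 = 7), giving 2. In the second there is no active prepared transaction to abort, so the diff of -15 is applied to 17, again giving 2.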

Comment by Judah Schvimer [ 08/Apr/19 ]

I think we need to holistically fix fast-count such that the rollback fuzzer is able to unblacklist its support of fast-count as part of or in concert with the fix. I think it will be difficult to fix all fast-count problems by addressing them individually.

Note that any fix for this that moves around the ordering in rollback_impl.cpp will need to coordinate with SERVER-40322, which will also likely be moving around the ordering there.

Generated at Thu Feb 08 04:55:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.