[SERVER-35435] Renaming during replication recovery incorrectly allows size adjustments Created: 06/Jun/18  Updated: 29/Oct/23  Resolved: 12/Jun/18

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: 4.0.0-rc6, 4.1.1

Type: Bug Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File renameMiscount.tgz    
Issue Links:
Backports
Related
related to SERVER-35483 rollback makes config.transactions fa... Closed
is related to SERVER-34976 clear the "needing size adjustment" s... Closed
is related to SERVER-34977 subtract capped deletes from fastcoun... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Repl 2018-06-18
Participants:

 Description   

Keeping correct counts with the WiredTigerSizeStorer is complex, error-prone, and seemingly impossible, particularly now that checkpointing only stable data collides with crashes, rollback, and deletions from capped collections. Thanks to geert.bosch's recent dive into other size storer related problems, I am proud to announce that collection renames (within the same database) have been added to the list of operations that require careful handling of minutiae to maintain correct-er counts.

One common scenario the WiredTiger integration layer attempts to keep correct is coming back online after a clean shutdown at an arbitrary stable timestamp. For (non-empty) collections, the expected end state is that the size storer table contains the correct size after replication recovery replays from the stable timestamp to the top of the oplog (where the node left off when shutting down).

To do this, the code refrains from updating counts while in replication recovery (among some other conditions).

One exception to this rule is when a collection is created during replication recovery. The exception is unfortunately necessary because the WTSizeStorer maps "idents" to counts. When a collection is recreated during replication recovery, a new ident is chosen (the previous one is lost to the void). Because the previous mapping, albeit correct, is keyed to the old ident and therefore lost, the code must count incoming inserts for the new ident to end up correct.
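
For illustration only, the ident backing a collection can be seen from the shell via collStats; the wiredTiger.uri field path below is quoted from memory and may differ by version. Dropping and recreating a collection shows it picking up a new ident:

// Illustrative only: drop and recreate a collection and compare the WiredTiger table
// URI reported by collStats. The recreated collection gets a brand new ident, so any
// count the size storer held under the old ident no longer applies to it.
let a = db.getSiblingDB("test").getCollection("A");
assert.commandWorked(a.insert({x: 1}));
print(a.stats().wiredTiger.uri);    // e.g. "statistics:table:collection-<old ident>"
a.drop();
assert.commandWorked(a.insert({x: 1}));
print(a.stats().wiredTiger.uri);    // a different ident backs the recreated collection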

The intersection of these behaviors with renameCollection's creation of a new record store object (referencing the same underlying table) jukes the WTRecordStore constructor into allowing size adjustments during replication recovery on the same underlying ident.
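
By contrast with a recreate, a same-database rename keeps the old table. The following is an illustrative sketch only (again assuming the wiredTiger.uri field from collStats):

// Illustrative only: a rename within the same database keeps the same underlying
// WiredTiger table, so the freshly constructed record store is backed by the old ident.
let src = db.getSiblingDB("test").getCollection("A");
assert.commandWorked(src.insert({x: 1}));
const before = src.stats().wiredTiger.uri;
assert.commandWorked(src.renameCollection("B"));
const after = db.getSiblingDB("test").getCollection("B").stats().wiredTiger.uri;
assert.eq(before, after);    // same table ident before and after the rename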

Thus, a sequence involving a rename from A -> B manifests as an incorrect count:

  1. At shutdown, collection B has 2 documents and a correct size storer count of 2.
  2. At the stable timestamp, collection A exists with 1 document, while the size storer still holds the count of 2 that was persisted at shutdown.
  3. Replication recovery replays the rename from A -> B. Because this constructs a new record store over the same ident, it marks the collection for size adjustment.
  4. Replication recovery replays the insert of the second document into B. This bumps the stored count from 2 -> 3, even though B only ever ends up with 2 documents.

The attached data files, when brought up as a replica set (on localhost:27017), demonstrate that count() != itcount().
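
A check along these lines should show the discrepancy ("test" and "B" below are placeholders for whichever namespace the rename targeted in the attached files):

// Hypothetical check against the attached data files; adjust the names as needed.
const coll = db.getSiblingDB("test").getCollection("B");
const fastCount = coll.count();           // served from the size storer metadata
const scanCount = coll.find().itcount();  // actually iterates the documents
print("count() = " + fastCount + ", itcount() = " + scanCount);
assert.neq(fastCount, scanCount);         // the bug: the fast count has drifted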

Note that replication recovery replaying a sequence of:

  1. create collection A
  2. insert
  3. rename A -> B
  4. insert

must allow size adjustments on B, as if the size-adjustment state is "inherited" from A.
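
In client terms that sequence looks roughly like the following (names illustrative); for B's count to land on 2 after recovery, the size-adjustment state has to carry over through the rename:

// Illustrative sequence as a client would issue it; replication recovery replays the
// equivalent oplog entries. B takes over A's record store, so it must also take over
// the "size adjustments allowed" state for its count to end up at 2.
const testDB = db.getSiblingDB("test");
assert.commandWorked(testDB.createCollection("A"));    // 1. create collection A
assert.commandWorked(testDB.A.insert({x: 1}));         // 2. insert
assert.commandWorked(testDB.A.renameCollection("B"));  // 3. rename A -> B
assert.commandWorked(testDB.B.insert({y: 1}));         // 4. insert
assert.eq(2, testDB.B.find().itcount());               // B ends up with 2 documents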



 Comments   
Comment by Githook User [ 12/Jun/18 ]

Author:

{'username': 'judahschvimer', 'name': 'Judah Schvimer', 'email': 'judah@mongodb.com'}

Message: SERVER-34977 SERVER-35435 SERVER-34976 Fix fastcounts for RTT, including capped collections.

(cherry picked from commit 8b698cac2d19f0fec502db10501e7059a10d2897)
Branch: v4.0
https://github.com/mongodb/mongo/commit/0fd5d4eb2e61bbef14f6e55c8e5f9619e807260b

Comment by Githook User [ 12/Jun/18 ]

Author:

{'username': 'judahschvimer', 'name': 'Judah Schvimer', 'email': 'judah@mongodb.com'}

Message: SERVER-34977 SERVER-35435 SERVER-34976 Fix fastcounts for RTT, including capped collections.
Branch: master
https://github.com/mongodb/mongo/commit/8b698cac2d19f0fec502db10501e7059a10d2897

Comment by Judah Schvimer [ 06/Jun/18 ]

/**
 * Tests fast counts when collection renames are replayed by replication recovery after a
 * rollback (SERVER-35435): coll1 exists before the stable timestamp is held back via the
 * 'disableSnapshotting' failpoint, while coll2 is created, renamed, and written after it.
 */
(function() {
    "use strict";
 
    load("jstests/replsets/libs/rollback_test.js");
 
    const testName = "rollback_rename_count";
    const dbName = testName;
 
    let replSet = new ReplSetTest({name: testName, nodes: 3, useBridge: true});
    replSet.startSet();
 
    const nodes = replSet.nodeList();
    replSet.initiate({
        _id: testName,
        members: [
            {_id: 0, host: nodes[0]},
            {_id: 1, host: nodes[1]},
            {_id: 2, host: nodes[2], arbiterOnly: true}
        ]
    });
 
    // Set up Rollback Test.
    let rollbackTest = new RollbackTest(testName, replSet);
    let primary = rollbackTest.getPrimary();
    assert.commandWorked(primary.setLogLevel(3, 'storage.recovery'));
    let testDb = primary.getDB(dbName);
 
    const collName1 = "fromCollName1";
    const otherName1 = "toCollName1";
    let commonColl1 = testDb.getCollection(collName1);
    assert.commandWorked(commonColl1.insert({a: 1}));
 
    replSet.awaitLastOpCommitted();
    assert.commandWorked(
        primary.adminCommand({configureFailPoint: 'disableSnapshotting', mode: 'alwaysOn'}));
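    // With 'disableSnapshotting' enabled, the stable timestamp should stay pinned here, so
    // the rename and insert below are replayed by replication recovery after the rollback
    // node recovers to the stable timestamp (the scenario from the description above).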
 
    assert.commandWorked(commonColl1.renameCollection(otherName1));
    commonColl1 = testDb.getCollection(otherName1);
    assert.commandWorked(commonColl1.insert({b: 1}));
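
    // The second collection's create, rename, and inserts all happen after snapshotting was
    // disabled, so replication recovery replays its create as well and must keep allowing
    // size adjustments across the rename (the "inherited" case from the description).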
 
    const collName2 = "fromCollName2";
    const otherName2 = "toCollName2";
    let commonColl2 = testDb.getCollection(collName2);
    assert.commandWorked(commonColl2.insert({c: 1}));
    assert.commandWorked(commonColl2.renameCollection(otherName2));
    commonColl2 = testDb.getCollection(otherName2);
    assert.commandWorked(commonColl2.insert({d: 1}));
 
    rollbackTest.transitionToRollbackOperations();
    rollbackTest.transitionToSyncSourceOperationsBeforeRollback();
    rollbackTest.transitionToSyncSourceOperationsDuringRollback();
    try {
        rollbackTest.transitionToSteadyStateOperations();
    } finally {
        assert.commandWorked(
            primary.adminCommand({configureFailPoint: 'disableSnapshotting', mode: 'off'}));
    }
 
    rollbackTest.stop();
})();

Comment by Daniel Gottlieb (Inactive) [ 06/Jun/18 ]

Assigning to judah.schvimer (and I put it in the current sprint, apologies if that's inappropriate) as he's helping manage some other tickets describing different ways counts can go wrong (mostly with capped collections). The solutions are not obviously independent, so it makes sense to avoid concurrent development in this area.
