[SERVER-42484] May not be inside required WriteUnitOfWork when writing multikey index keys during initial sync data cloning Created: 29/Jul/19  Updated: 29/Oct/23  Resolved: 25/Sep/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.2.0-rc4
Fix Version/s: 4.2.1, 4.3.1, 4.0.15

Type: Bug Priority: Major - P3
Reporter: William Schultz (Inactive) Assignee: William Schultz (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File initial_sync_multikey.js    
Issue Links:
Backports
Depends
Duplicate
is duplicated by SERVER-43954 Replica set sync failure with Invaria... Closed
Related
related to SERVER-41529 To prevent dangling index records, Co... Closed
is related to SERVER-44575 mongod crashes during initial replica... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2, v4.0
Steps To Reproduce:

(function() {
    "use strict";
 
    // Set up replica set.
    var testName = "repro";
    var dbName = testName;
    var replTest = new ReplSetTest({name: testName, nodes: 1});
    replTest.startSet();
    replTest.initiate();
 
    var primary = replTest.getPrimary();
    var primaryDB = primary.getDB(dbName);
    var collName = "multikey_coll";
 
    jsTestLog("Creating collection and index.");
    assert.commandWorked(primaryDB.createCollection(collName));
    assert.commandWorked(primaryDB[collName].createIndex({"x": 1}, {background: true}));
 
    // Make multikey with some docs.
    primaryDB[collName].insert({x: [1, 2]});
    primaryDB[collName].insert({x: [1, 2]});
    primaryDB[collName].insert({x: [1, 2]});
    assert.commandWorked(primaryDB.adminCommand({setFeatureCompatibilityVersion: "4.0"}));
 
    // Add a SECONDARY node. It should use batchSize=2 for its initial sync queries.
    jsTestLog("Adding secondary node.");
    replTest.add();
 
    // Let the SECONDARY begin initial sync.
    jsTestLog("Re-initiating replica set with new secondary.");
    replTest.reInitiate();
 
    // Wait until initial sync completes.
    replTest.awaitSecondaryNodes();
    replTest.awaitReplication();
    replTest.stopSet();
})();

Sprint: Repl 2019-09-09, Repl 2019-09-23, Repl 2019-10-07
Participants:
Case:
Linked BF Score: 0

 Description   

During initial sync collection cloning, we insert documents via the CollectionBulkLoader, which also inserts index keys. When we insert keys into an external sorter here, we are not inside a WriteUnitOfWork. Normally this is safe, since we will only be doing in-memory writes through the BulkBuilder, but if we are in FCV=4.0 and the index build is done using the Background method, then we will not use the bulk builder, and instead go through the IndexAccessMethod, which may in fact write to the storage layer if we are updating a multikey index. We need to make sure that we are enclosing these writes inside a WUOW in case they end up writing to storage.



 Comments   
Comment by Githook User [ 13/Jan/20 ]

Author:

{'name': 'William Schultz', 'email': 'william.schultz@mongodb.com', 'username': 'will62794'}

Message: SERVER-42484 Add test to ensure we are inside a WriteUnitOfWork when writing index keys during initial sync collection cloning

(cherry picked from commit c41af3e315a115b04a53706837d36a729d174fc3)
Branch: v4.0
https://github.com/mongodb/mongo/commit/ffe43eabfb262487b1ccc0403138744e361d8c93

Comment by Githook User [ 01/Oct/19 ]

Author:

{'name': 'William Schultz', 'username': 'will62794', 'email': 'william.schultz@mongodb.com'}

Message: SERVER-42484 Ensure we are inside a WriteUnitOfWork when writing index keys during initial sync collection cloning

(cherry picked from commit c41af3e315a115b04a53706837d36a729d174fc3)
Branch: v4.2
https://github.com/mongodb/mongo/commit/ffc20cf1b310ebc22bab64f9627f558b78d58d57

Comment by Githook User [ 25/Sep/19 ]

Author:

{'username': 'will62794', 'email': 'william.schultz@mongodb.com', 'name': 'William Schultz'}

Message: SERVER-42484 Ensure we are inside a WriteUnitOfWork when writing index keys during initial sync collection cloning
Branch: master
https://github.com/mongodb/mongo/commit/c41af3e315a115b04a53706837d36a729d174fc3

Comment by William Schultz (Inactive) [ 24/Sep/19 ]

For reference, this bug does also appear on master if you explicitly set enableHybridIndexBuilds=false. This is the repro: initial_sync_multikey.js.

Comment by Judah Schvimer [ 24/Sep/19 ]

In 4.2+ case 1 is all that applies. enableHybridIndexBuilds is true by default in 4.2+, so by default, in 4.2+ we always use WT bulk cursor. I don't think this is something we would want to backport though.

Comment by Suganthi Mani [ 23/Sep/19 ]

Few thoughts from the observation for index building (irrespective of multi-index keys) during initial sync.

case 1 - If it is FCV 4.2+ & mongod binary 4.2+ , then it first checks the enableHybridIndexBuilds (start-up/runtime server parameter). If enableHybridIndexBuilds is false, then it uses the value set in "background" param in the index spec received from the sync source.
case 2 - If it is FCV 4.0 & mongod binary 4.2, then it uses the value set in "background" param in the index spec received from the sync source since MultiIndexBlock::areHybridIndexBuildsEnabled always return false.
case 3 - If it is mongod binary 4.0, then it ignores "background" param in the index spec received from the sync source and always uses WiredTiger bulk cursor due to foreground building (see this comment).

So, for case 1 & case 2, we have a possibility of running background index build. This results in not using the WT bulk cursor. As, we currently use WT bulk cursor only for foreground and hybrid index building. But, I feel during index building during initial sync should always use WT bulk cursor (i.e foreground or hybrid) for better performance. milkie any thoughts.

Comment by William Schultz (Inactive) [ 10/Sep/19 ]

The contract here seems a bit unclear, since the MultiIndexBlock::insert method claims that it must be called inside a WriteUnitOfWork, but the changes from SERVER-41529 made it so that we explicitly don't call it inside a WUOW from the CollectionBulkLoaderImpl::_addDocumentToIndexBlocks method. We can presumably wrap the call to _addDocumentToIndexBlocks in a WriteUnitOfWork, but this may run into similar atomicity issues to those raised in SERVER-41529 i.e. if a WriteConflict is thrown inside the transaction, some of the in-memory index key updates may have already been completed, but the storage write to update the multikey flag would be rolled back.

Comment by Suganthi Mani [ 01/Aug/19 ]

william.schultz It seems like its not problem on 4.0 because MultiIndexBlockImpl constructor sets _buildInBackground to false. As a result,  MultiIndexBlockImpl::init will disable background index building even if index spec contains the "background" field as true. So, it seems we always do bulk index building for database cloning phase of initial sync.
CC milkie louis.williams allison.easton

Comment by Kelsey Schubert [ 30/Jul/19 ]

My inclination is towards stability of the GA release and to let this wait until 4.2.1.

Comment by Judah Schvimer [ 30/Jul/19 ]

We may only need to do this on 4.2 and earlier. Doing it on master is an open question.

kelsey.schubert, in FCV 4.0 doing a multikey index build during initial sync data cloning will invariant. Should we do this before 4.2.0, or wait for 4.2.1?

Comment by William Schultz (Inactive) [ 29/Jul/19 ]

This may also be an issue on 4.0 after BACKPORT-4730 is completed.

Generated at Thu Feb 08 05:00:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.