[SERVER-30927] Use readConcern afterClusterTime for initsync oplog queries Created: 01/Sep/17  Updated: 30/Oct/23  Resolved: 07/Sep/17

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: 3.4.10, 3.5.13

Type: Bug Priority: Major - P3
Reporter: Geert Bosch Assignee: Geert Bosch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
related to SERVER-33812 First initial sync oplog read batch f... Closed
related to SERVER-30724 Initial sync might miss ops that were... Closed
related to SERVER-37408 Add afterClusterTime to initial sync ... Closed
related to SERVER-39607 MongoDB 3.4 should unconditionally us... Closed
is related to SERVER-30977 Need to sign cluster times for unshar... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v3.4
Steps To Reproduce:

(function() {
    'use strict';
 
    load('jstests/replsets/rslib.js');
    const basename = 'initial_sync_visibility';
 
    jsTestLog('Bring up set');
    const rst = new ReplSetTest({name: basename, nodes: 1});
    rst.startSet();
    rst.initiate();
 
    const primary = rst.getPrimary();
    const primaryDB = primary.getDB(basename);
 
    jsTestLog('Create a collection');
    assert.writeOK(primaryDB['coll'].save({_id: "visible"}));
    jsTestLog('Make sure synced');
    rst.awaitReplication();
 
    jsTestLog('Activate WT visibility failpoint and write an invisible document');
    assert.commandWorked(primaryDB.adminCommand(
        {configureFailPoint: 'WTPausePrimaryOplogDurabilityLoop', mode: 'alwaysOn'}));
    assert.writeOK(primaryDB['coll'].save({_id: "invisible"}));
 
    jsTestLog('Bring up a new node');
    const secondary = rst.add({setParameter: 'numInitialSyncAttempts=3'});
    rst.reInitiate();
    assert.eq(primary, rst.getPrimary(), 'Primary changed after reconfig');
 
    jsTestLog('Wait for new node to start cloning');
    secondary.setSlaveOk();
    const secondaryDB = secondary.getDB(basename);
    wait(function() {
        return secondaryDB.stats().collections >= 1;
    }, 'never saw new node starting to clone, was waiting for collections in: ' + basename);
 
    jsTestLog('Disable WT visibility failpoint on primary making all visible.');
    assert.commandWorked(primaryDB.adminCommand(
        {configureFailPoint: 'WTPausePrimaryOplogDurabilityLoop', mode: 'off'}));
 
    jsTestLog('Wait for both nodes to be up-to-date');
    rst.awaitSecondaryNodes();
    rst.awaitReplication();
 
    jsTestLog('Check all OK');
    rst.checkReplicatedDataHashes();
    rst.stopSet(15);
})();

Sprint: Storage 2017-09-11
Participants:
Linked BF Score: 0

 Description   

Initial sync oplog queries need to use readConcern afterClusterTime to ensure visibility of the oplog entries that are queried.
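As a rough illustration of the idea in the title, an oplog query carrying a readConcern with afterClusterTime might be built as below. This is a minimal sketch, not the server's actual oplog fetcher code: the helper name buildInitialSyncOplogQuery, the filter shape, and the plain {t, i} objects standing in for Timestamp values are all assumptions made for the example.

```javascript
// Hypothetical helper (not part of the MongoDB code base): builds a find
// command document for the oplog that carries readConcern afterClusterTime.
function buildInitialSyncOplogQuery(lastFetchedTs, clusterTime) {
    return {
        find: 'oplog.rs',
        // Assumed filter: resume from the last oplog entry already fetched.
        filter: {ts: {$gte: lastFetchedTs}},
        // The read waits until the sync source has reached this cluster time.
        readConcern: {afterClusterTime: clusterTime},
    };
}

// Stand-in values; in the mongo shell these would be Timestamp objects.
const lastFetched = {t: 1504224000, i: 1};
const cmd = buildInitialSyncOplogQuery(lastFetched, lastFetched);
console.log(JSON.stringify(cmd));
```

In the mongo shell a document like this could be passed to runCommand against the local database of the sync source; which cluster time the real patch supplies is not stated in this ticket.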



 Comments   
Comment by Githook User [ 10/Oct/17 ]

Author:

{'email': 'geert@mongodb.com', 'name': 'Geert Bosch', 'username': 'GeertBosch'}

Message: SERVER-30927 Use readConcern afterOpTime for initsync oplog queries

(cherry picked from commit f9438895085bf788a7d7276221b5efd865391df7)

Conflicts:
src/mongo/db/repl/oplog_fetcher.cpp
Branch: v3.4
https://github.com/mongodb/mongo/commit/ec5486e08cef499d894babaa2be7e4260bb601bb

Comment by Ramon Fernandez Marina [ 07/Sep/17 ]

Author:

{'username': 'GeertBosch', 'name': 'Geert Bosch', 'email': 'geert@mongodb.com'}

Message: SERVER-30927 Add comment about removing workaround
Branch: master
https://github.com/mongodb/mongo/commit/f9438895085bf788a7d7276221b5efd865391df7

Comment by Githook User [ 07/Sep/17 ]

Author:

{'username': 'GeertBosch', 'name': 'Geert Bosch', 'email': 'geert@mongodb.com'}

Message: SERVER-30927 Use readConcern afterClusterTime for initsync oplog queries
Branch: master
https://github.com/mongodb/mongo/commit/3033595cab33add1a9f29896d2717860fc9876b6

Comment by Benety Goh [ 07/Sep/17 ]

The term comparison in the proposed patch may be removed after SERVER-30977 is fixed.

Comment by Andy Schwerin [ 02/Sep/17 ]

I see that the patch proposed covers the case I mentioned earlier. Thx.

Comment by Andy Schwerin [ 02/Sep/17 ]

Any change here will need to work when initial syncing a 3.6 node from a 3.4 node. I'm not sure how good the mixed version coverage is, so double check that.

Comment by Geert Bosch [ 01/Sep/17 ]

This fixes a specific issue that resulted from the changes in the oplog visibility code. SERVER-30724 seems broader than that; however, as I have neither a reproducer nor certainty that the problem doesn't exist, I'd like to leave that open.

Comment by Spencer Brody (Inactive) [ 01/Sep/17 ]

Dupe of SERVER-30724?

Generated at Thu Feb 08 04:25:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.