[SERVER-4781] replica set initial sync failure when update cannot be applied to a future version of an object received via clone Created: 26/Jan/12  Updated: 04/Jun/21  Resolved: 23/Oct/12

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 2.2.1, 2.3.0

Type: Bug Priority: Major - P3
Reporter: Aaron Staple Assignee: Alberto Lerner
Resolution: Done Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File SERVER-4781-Windows-repl13-log.txt    
Issue Links:
Depends
Related
related to SERVER-7515 idempotence violation when intermedia... Open
related to SERVER-4907 sparse indexes break replication of a... Closed
related to SERVER-7371 Fix corner case where an update mod i... Closed
related to SERVER-9419 initial_sync4.js failing on Windows D... Closed
related to SERVER-57444 specify write concerns for updates an... Closed
is related to SERVER-7186 Batched oplog application increases f... Closed
is related to SERVER-7407 Adjust the behavior of $unset for nes... Closed
Operating System: ALL
Participants:

 Description   

Some modifier operations may uassert when they cannot be applied to a selected document. It's possible for an update to uassert when applied to a future version of a document during the initial sync phase, and this aborts the initial sync. It appears that the initial sync is then retried but if new, bad updates arrive I imagine the initial sync failure could keep repeating

Here is a test I wrote for this. The first attempt at initial sync fails, but the test stops sending updates so when the initial sync is retried. As a result, the current test passes. So it's just good as a demonstration right now.

// Test update modifier uassert during initial sync. SERVER-4781
 
load("jstests/replsets/rslib.js");
basename = "jstests_initsync4";
 
print("1. Bring up set");
replTest = new ReplSetTest( {name: basename, nodes: 1} );
replTest.startSet();
replTest.initiate();
 
m = replTest.getMaster();
md = m.getDB("d");
mc = m.getDB("d")["c"];
 
print("2. Insert some data");
N = 500000;
for( i = 0; i < N; ++i ) {
    mc.save( {_id:i,a:{}} );
}
md.getLastError();
 
print("3. Make sure synced");
replTest.awaitReplication();
 
print("4. Bring up a new node");
ports = allocatePorts( 3 );
basePath = "/data/db/" + basename;
hostname = getHostName();
 
s = startMongodTest (ports[2], basename, false, {replSet : basename, oplogSize : 2} );
 
var config = replTest.getReplSetConfig();
config.version = 2;
config.members.push({_id:2, host:hostname+":"+ports[2]});
try {
    m.getDB("admin").runCommand({replSetReconfig:config});
}
catch(e) {
    print(e);
}
reconnect(s);
 
print("5. Wait for new node to start cloning");
 
s.setSlaveOk();
sc = s.getDB("d")["c"];
 
wait( function() { printjson( sc.stats() ); return sc.stats().count > 0; } );
 
print("6. Start updating documents on primary");
for( i = N-1; i >= N-10000; --i ) {
    // If the document is cloned as {a:1}, the {$set:{'a.b':1}} modifier will uassert.
    mc.update( {_id:i}, {$set:{'a.b':1}} );
    mc.update( {_id:i}, {$set:{a:1}} );    
}
 
print("7. Wait for new node to become SECONDARY");
wait(function() {
     var status = s.getDB("admin").runCommand({replSetGetStatus:1});
     printjson(status);
     return status.members &&
     (status.members[1].state == 2);
     });

Here is some relevant logging from the test

 m31002| Wed Jan 25 20:38:29 [rsSync] replSet initial oplog application from Aaron-Staples-MacBook-Pro.local:31000 starting at Jan 25 20:37:31:9f8 to Jan 25 20:38:07:214
 m31002| Wed Jan 25 20:38:29 [rsSync] replSet initial sync failing, error applying oplog : 10145 LEFT_SUBFIELD only supports Object: a not: 1
 m31002| Wed Jan 25 20:38:29 [rsSync] replSet applying oplog from Aaron-Staples-MacBook-Pro.local:31000 failed, trying again
 m31002| Wed Jan 25 20:38:29 [rsSync] replSet not trying to sync from Aaron-Staples-MacBook-Pro.local:31000, it is vetoed for 60 more seconds

Also I wrote a test for master slave mode and that seems to work ok

// Test update modifier uassert during initial sync. SERVER-4781
 
rt = new ReplTest( "repl13tests" );
 
m = rt.start( true );
mc = m.getDB( 'd' )[ 'c' ];
 
for( i = 0; i < 100000; ++i ) {
    mc.save( {_id:i,a:{}} );
}
m.getDB( 'd' ).getLastError();
 
s = rt.start( false );
sc = s.getDB( 'd' )[ 'c' ];
 
assert.soon( function() { printjson( sc.count() ); return sc.count() > 0; } );
 
for( i = 99999; i >= 90000; --i ) {
    mc.update( {_id:i}, {$set:{'a.b':1}} );
    mc.update( {_id:i}, {$set:{a:1}} );    
}
 
mc.save( {} )
assert.soon( function() { return sc.count() == 100001; } );
 
printjson( sc.findOne( {_id:99999} ) );
printjson( sc.findOne( {_id:90000} ) );

I can push the tests when bb code freeze is over.



 Comments   
Comment by auto [ 17/Oct/12 ]

Author:

{u'date': u'2012-10-16T19:13:54-07:00', u'email': u'alerner@10gen.com', u'name': u'Alberto Lerner'}

Message: SERVER-4781 Fix style issues.
Branch: v2.2
https://github.com/mongodb/mongo/commit/fa0c2577a3f2103910cd76d26ce325b655ac6bbc

Comment by auto [ 17/Oct/12 ]

Author:

{u'date': u'2012-10-16T19:13:54-07:00', u'email': u'alerner@10gen.com', u'name': u'Alberto Lerner'}

Message: SERVER-4781 Fix style issues.
Branch: master
https://github.com/mongodb/mongo/commit/823da97015a5bc950708ff5c565583cc0c6ea25b

Comment by auto [ 15/Oct/12 ]

Author:

{u'date': u'2012-10-15T14:17:29-07:00', u'name': u'Alberto Lerner', u'email': u'alerner@10gen.com'}

Message: SERVER-4781 If in initial replication, allow a non-strict mod applying policy.
Branch: v2.2
https://github.com/mongodb/mongo/commit/b19f46ea4b744f9281fb33e7f1491b8080aa879d

Comment by auto [ 15/Oct/12 ]

Author:

{u'date': u'2012-10-15T14:17:29-07:00', u'name': u'Alberto Lerner', u'email': u'alerner@10gen.com'}

Message: SERVER-4781 If in initial replication, allow a non-strict mod applying policy.
Branch: master
https://github.com/mongodb/mongo/commit/03eb192471e22597936599a01187c2119a9eefbe

Comment by auto [ 15/Oct/12 ]

Author:

{u'date': u'2012-10-15T07:19:02-07:00', u'name': u'Alberto Lerner', u'email': u'alerner@10gen.com'}

Message: Revert "SERVER-4781 Allowed setting over an object whose path is only partially constructed."

This reverts commit 77bf5afd2483098c11b0286d7882d15b9351fbee.
Branch: master
https://github.com/mongodb/mongo/commit/d9791cce55a3e011c39ff384ed678c097b831f16

Comment by auto [ 14/Oct/12 ]

Author:

{u'date': u'2012-10-14T12:29:57-07:00', u'name': u'Alberto Lerner', u'email': u'alerner@10gen.com'}

Message: SERVER-4781 Allowed setting over an object whose path is only partially constructed.
Branch: master
https://github.com/mongodb/mongo/commit/77bf5afd2483098c11b0286d7882d15b9351fbee

Comment by Tad Marshall [ 24/Apr/12 ]

Log of master/slave on Windows showing assertion failures running repl13.js

Comment by Aaron Staple [ 10/Feb/12 ]

Another of the many update validation assertions where this can occur is if we have updates like:

c.save( {} )
c.update( {}, {$push:{a:1}} )
c.update( {}, {$set:{a:null}} )

Since the push will be replicated as

update( {_id:x,a:null}, {$push:{a:1}} )
and this will trigger an update validation assertion if a is actually null.

Comment by Aaron Staple [ 07/Feb/12 ]

Hi dustin - just to confirm is the assertion happening during initial sync of a new slave?

Comment by dustin norlander [ 07/Feb/12 ]

GAH! We just got this in production. Slave cannot get past the assertion and it getting staler by the minute. What can we do?

Comment by auto [ 07/Feb/12 ]

Author:

{u'login': u'astaple', u'name': u'Aaron', u'email': u'aaron@10gen.com'}

Message: SERVER-4781 disable test
Branch: master
https://github.com/mongodb/mongo/commit/65e2ffc8d28e46148593512fbb976e970fbf2a52

Comment by auto [ 07/Feb/12 ]

Author:

{u'login': u'astaple', u'name': u'Aaron', u'email': u'aaron@10gen.com'}

Message: SERVER-4781 tests
Branch: master
https://github.com/mongodb/mongo/commit/5750a5b7a1b7bb5ae495534f9228d5773c28db4b

Generated at Thu Feb 08 03:06:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.