[SERVER-23425] Inserts and updates during chunk migration get deleted in 3.0.9, 3.0.10

Created: 30/Mar/16
Updated: 04/Apr/16
Resolved: 31/Mar/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.9, 3.0.10
Fix Version/s: 3.0.11

Type: Bug
Priority: Critical - P2
Reporter: David Andrade
Assignee: Andy Schwerin
Resolution: Fixed
Votes: 0
Labels: RF

Issue Links:
Depends
Related
related to SERVER-22535 Some index operations (drop index, ab... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   
Issue Status as of Mar 31, 2016

ISSUE SUMMARY
During chunk migrations, insert and update operations affecting data within a migrating chunk are not propagated to the recipient shard, resulting in data loss.

USER IMPACT
Only the following deployments are affected by this issue:

  • Sharded clusters where shards run MongoDB versions 3.0.9 or 3.0.10, and
  • The balancer is enabled or manual chunk migrations are performed

Standalone nodes, replica set deployments, and sharded clusters with no chunk migrations are not impacted by this issue. No other version of MongoDB is affected.
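The version criterion above can be checked mechanically. The sketch below compares a version string against the two affected releases; in the mongo shell the string could come from `db.version()` run against each shard. The helper name `isAffectedVersion` is hypothetical and not part of MongoDB, and the check covers only the version condition, not whether chunk migrations actually occur.

```javascript
// Hypothetical helper (not a MongoDB API): returns true only for the
// releases this ticket lists as affected, 3.0.9 and 3.0.10.
function isAffectedVersion(version) {
    var v = String(version).trim();
    return v === "3.0.9" || v === "3.0.10";
}

// In the mongo shell, connected directly to a shard member:
//   isAffectedVersion(db.version());
```

An exact string comparison is used on purpose, because the ticket names two specific releases rather than a range.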

During a chunk migration, insert and update operations affecting documents in the migrating chunk are not reflected in the recipient shard, leading to data loss.

Users who haven’t disabled the moveParanoia option should be able to recover this data manually.

WORKAROUNDS
Neither MongoDB 3.2 nor MongoDB 3.0.8 and earlier are affected by this issue. Users on affected versions should upgrade to 3.0.11 or newer, or to 3.2.4 or newer, as soon as possible.

Alternatively, users should disable the balancer and ensure no manual chunk migrations occur in order to avoid this issue. The balancer can be disabled cluster-wide or on a per-collection basis. See the Documentation section below for more information.
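As a rough illustration of the balancer workaround, the sketch below wraps the mongo shell helpers `sh.stopBalancer()` and `sh.disableBalancing()`. The wrapper name `disableMigrations` is hypothetical, and the `sh` object is passed in explicitly as an assumption to keep the function self-contained. It assumes a shell connected to a mongos.

```javascript
// Hypothetical wrapper (not a MongoDB API) around the mongo shell's
// `sh` sharding helper, which is passed in as the first argument.
function disableMigrations(sh, namespaces) {
    if (!namespaces || namespaces.length === 0) {
        // Cluster-wide: stop the balancer for every sharded collection.
        sh.stopBalancer();
        return "cluster";
    }
    // Per-collection: disable balancing only for the listed namespaces.
    namespaces.forEach(function (ns) {
        sh.disableBalancing(ns);
    });
    return "collections";
}

// In a mongo shell connected to a mongos:
//   disableMigrations(sh, []);            // whole cluster
//   disableMigrations(sh, ["test.foo"]);  // one collection
```

Disabling the balancer does not block manual `moveChunk` commands, so those still need to be avoided separately.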

AFFECTED VERSIONS
MongoDB versions 3.0.9 and 3.0.10, only.

FIX VERSION
The fix is included in the 3.0.11 production release.

DOCUMENTATION

Original description

Similar to SERVER-22535, if I insert documents while a migration is happening, those documents seem to get lost.

The script below inserts 20000 documents into a collection, then manually moves a chunk while inserting another 20000 documents. The final assertion expects 40000 documents in the collection, but in my testing 20-30 documents are missing.

var LOG_FUNCTION = "function log(msg) {var date = new Date(); jsTest.log('MONGOISSUE - ' + date.getHours() + ':' + date.getMinutes() + ':' + date.getSeconds() + ':' + date.getMilliseconds() + ' - ' + msg);}";
eval(LOG_FUNCTION);
 
var numDocs = 20000;
 
// Set up cluster.
log('SETTING UP CLUSTER...');
var st = new ShardingTest({shards: 2, other: {shardOptions: {storageEngine: 'mmapv1', verbose: 0}}});
var s = st.s0;
var d1 = st.shard1;
var coll = s.getDB("test").foo;
assert.commandWorked(s.adminCommand({enableSharding: coll.getDB().getName()}));
assert.commandWorked(s.adminCommand({shardCollection: coll.getFullName(), key: {_id: "hashed"}}));
log('INSERT START');
for (var i = 0; i < numDocs; i++) {
    coll.insert({_id: i});
}
log('INSERT END');
assert.commandWorked(coll.ensureIndex({a: 1}));
 
// Check document count.
var count = coll.find().itcount();
log("DOC COUNT: " + count);
assert.eq(numDocs, count);
 
// Configure server to increase reproducibility.
assert.commandWorked(d1.adminCommand({setParameter: 1, internalQueryExecYieldIterations: 2}));
 
function logChunk(chunk) { log('chunk ' + chunk['_id'] + ' shard ' + chunk['shard']); }
st.config.chunks.find().forEach(logChunk);
 
// Initiate migration and add data in parallel.
shell = startParallelShell(LOG_FUNCTION + " log('INSERT START'); var coll = db.getSiblingDB('test').foo; for (i=" + numDocs + "; i<" + (numDocs * 2) + "; i++) { coll.insert({_id: i}); }; log('INSERT END');", s.port);
sleep(500);
log('MOVECHUNK START');
var res = s.adminCommand({moveChunk: coll.getFullName(), find: {_id: 0}, to: "shard0000", _waitForDelete: true});
log('MOVECHUNK END');
assert.commandWorked(res);
st.config.chunks.find().forEach(logChunk);
shell();
 
// Re-check document count.
var count = coll.find().itcount();
log("DOC COUNT: " + count);
assert.eq(numDocs * 2, count);

I've reproduced this in both MongoDB 3.0.9 and 3.0.10.
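The reproducer only asserts on the total count. To see which documents were lost, a small helper along these lines could be run afterwards. It is a sketch that relies on the script inserting consecutive integer `_id` values starting at 0; the name `findMissingIds` is hypothetical.

```javascript
// Hypothetical helper: given the _id values actually present, list the
// integers in [0, expectedCount) that are absent.
function findMissingIds(presentIds, expectedCount) {
    var seen = {};
    presentIds.forEach(function (id) {
        seen[id] = true;
    });
    var missing = [];
    for (var i = 0; i < expectedCount; i++) {
        if (!seen[i]) {
            missing.push(i);
        }
    }
    return missing;
}

// In the mongo shell, after the reproducer has run:
//   var present = coll.find({}, {_id: 1}).toArray().map(function (d) { return d._id; });
//   findMissingIds(present, numDocs * 2);
```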

Update

MongoDB 3.2 is not affected by this bug.



 Comments   
Comment by Ramon Fernandez [ 30/Mar/16 ]

Thanks for the detailed reproducer, David Andrade. We're investigating.

Comment by Andy Schwerin [ 30/Mar/16 ]

How important is it to adjust the yield iterations server parameter in order to make this reproduce?

Comment by David Andrade [ 30/Mar/16 ]

I was able to reproduce it even when I commented out that line.

Comment by Ramon Fernandez [ 30/Mar/16 ]

David Andrade, this is to let you know we've identified the source of the issue and are working on a fix. Please note that MongoDB 3.2 is not affected by this bug, so if this issue is critical for you, you may want to consider upgrading to 3.2 (3.2.4 is the latest stable release at the time of this writing).

Thanks,
Ramón.

Comment by David Andrade [ 31/Mar/16 ]

Can you confirm what version of 3.0 this bug was introduced in?

Comment by Andy Schwerin [ 31/Mar/16 ]

This bug affects 3.0.9 and 3.0.10 only.

Comment by Githook User [ 31/Mar/16 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-23425 Correctly track inserts and deletes to migrating chunks.
Branch: v3.0.11
https://github.com/mongodb/mongo/commit/48f8b49dc30cc2485c6c1f3db31b723258fcbf39

Comment by Githook User [ 31/Mar/16 ]

Author: Andy Schwerin (andy10gen) <schwerin@mongodb.com>

Message: SERVER-23425 Correctly track inserts and deletes to migrating chunks.
Branch: v3.0
https://github.com/mongodb/mongo/commit/3ce338f6fc95322141bbf35f982513a831bb74ca

Comment by Githook User [ 01/Apr/16 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-23425 Port 3.2 sharding move chunk unit tests
Branch: v3.0
https://github.com/mongodb/mongo/commit/3edc84475b10154a76f268edb5e80ac6ca609411

Generated at Mon Dec 11 04:15:12 UTC 2017 using JIRA 7.2.10#72012-sha1:2651463a07e52d81c0fcf01da710ca333fcb42bc.