ISSUE SUMMARY
During chunk migrations, insert and update operations affecting data within a migrating chunk are not reflected on the recipient shard, resulting in data loss.
USER IMPACT
Only the following deployments are affected by this issue:
- Sharded clusters whose shards run MongoDB version 3.0.9 or 3.0.10, and
- where the balancer is enabled or manual chunk migrations are performed
Standalone nodes, replica set deployments, and sharded clusters with no chunk migrations are not impacted by this issue. No other version of MongoDB is affected.
During a chunk migration, insert and update operations affecting documents in the migrating chunk are not reflected in the recipient shard, leading to data loss.
Users who have not disabled the moveParanoia option, which makes the donor shard archive copies of migrated documents under the moveChunk directory in its data path, should be able to recover this data manually.
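To gauge exposure, the shard binary versions and the balancer state can be checked from a mongos connection. The following mongo shell snippet is a minimal sketch, not part of the original advisory, and assumes the shell can reach each shard host directly:

// Minimal sketch (not from the advisory): list shard versions and balancer state via a mongos.
var shards = db.adminCommand({listShards: 1}).shards;
shards.forEach(function(s) {
    // Replica set shards report "rsName/host1,host2,..."; take the first host.
    var host = s.host.split("/").pop().split(",")[0];
    var conn = new Mongo(host);
    print(s._id + " runs MongoDB " + conn.getDB("admin").serverBuildInfo().version);
});
print("balancer enabled: " + sh.getBalancerState());

Any shard reporting version 3.0.9 or 3.0.10 in a cluster whose balancer is enabled, or that performs manual moveChunk operations, is exposed.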
WORKAROUNDS
MongoDB 3.2 and MongoDB 3.0.8 and earlier are not affected by this issue. Users running an affected version should upgrade to 3.0.11 or newer, or to 3.2.4 or newer, as soon as possible.
Alternatively, users can avoid this issue by disabling the balancer and ensuring that no manual chunk migrations occur. The balancer can be disabled cluster-wide or on a per-collection basis, as sketched below; see the Documentation section below for more information.
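As an illustration, the balancer can be disabled from the mongo shell either for the whole cluster or for a single collection; the namespace below is only an example:

// Sketch: disable balancing cluster-wide (waits for an in-progress balancing round to finish).
sh.stopBalancer();
// ...or disable balancing for one collection only (namespace is illustrative).
sh.disableBalancing("test.foo");
// Confirm the cluster-wide balancer state.
print("balancer enabled: " + sh.getBalancerState());

Note that disabling the balancer does not prevent manual moveChunk commands, which must also be avoided.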
AFFECTED VERSIONS
MongoDB versions 3.0.9 and 3.0.10, only.
FIX VERSION
The fix is included in the 3.0.11 production release.
DOCUMENTATION
Original description
Similar to SERVER-22535, if I insert documents while a migration is happening, those documents seem to get lost.
The script below inserts 20000 documents into a collection, then manually moves a chunk while inserting another 20000 documents in parallel. At the end it asserts that the collection should contain 40000 documents, but in my testing 20-30 documents are missing.
var LOG_FUNCTION = "function log(msg) {var date = new Date(); jsTest.log('MONGOISSUE - ' + date.getHours() + ':' + date.getMinutes() + ':' + date.getSeconds() + ':' + date.getMilliseconds() + ' - ' + msg);}";
eval(LOG_FUNCTION);

var numDocs = 20000;

// Set up cluster.
log('SETTING UP CLUSTER...');
var st = new ShardingTest({shards: 2, other: {shardOptions: {storageEngine: 'mmapv1', verbose: 0}}});
var s = st.s0;
var d1 = st.shard1;
var coll = s.getDB("test").foo;
assert.commandWorked(s.adminCommand({enableSharding: coll.getDB().getName()}));
assert.commandWorked(s.adminCommand({shardCollection: coll.getFullName(), key: {_id: "hashed"}}));

log('INSERT START');
for (i=0; i<numDocs; i++) {
    coll.insert({_id: i});
}
log('INSERT END');
assert.commandWorked(coll.ensureIndex({a: 1}));

// Check document count.
var count = coll.find().itcount();
log("DOC COUNT: " + count);
assert.eq(numDocs, count);

// Configure server to increase reproducibility.
assert.commandWorked(d1.adminCommand({setParameter: 1, internalQueryExecYieldIterations: 2}));

function logChunk(chunk) {
    log('chunk ' + chunk['_id'] + ' shard ' + chunk['shard']);
}
st.config.chunks.find().forEach(logChunk);

// Initiate migration and add data in parallel.
shell = startParallelShell(LOG_FUNCTION + " log('INSERT START'); var coll = db.getSiblingDB('test').foo; for (i=" + numDocs + "; i<" + (numDocs * 2) + "; i++) { coll.insert({_id: i}); }; log('INSERT END');", s.port);
sleep(500);

log('MOVECHUNK START');
var res = s.adminCommand({moveChunk: coll.getFullName(), find: {_id: 0}, to: "shard0000", _waitForDelete: true});
log('MOVECHUNK END');
assert.commandWorked(res);
st.config.chunks.find().forEach(logChunk);
shell();

// Re-check document count.
var count = coll.find().itcount();
log("DOC COUNT: " + count);
assert.eq(numDocs * 2, count);
I've reproduced this in both mongo 3.0.9 and 3.0.10.
Update
MongoDB 3.2 is not affected by this bug.
Related to:
- SERVER-22535: Some index operations (drop index, abort index build, update TTL config) on collection during active migration can cause migration to skip documents (Closed)