Core Server / SERVER-23425

Inserts and updates during chunk migration get deleted in 3.0.9, 3.0.10

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 3.0.11
    • Affects Version/s: 3.0.9, 3.0.10
    • Component/s: Sharding
    • Labels:
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      Issue Status as of Mar 31, 2016

      ISSUE SUMMARY
      During chunk migrations, insert and update operations affecting data within a migrating chunk are not reflected in the recipient shard, resulting in data loss.

      USER IMPACT
      Only the following deployments are affected by this issue:

      • Sharded clusters where shards run MongoDB versions 3.0.9 or 3.0.10, and
      • The balancer is enabled or manual chunk migrations are performed

      Standalone nodes, replica set deployments, and sharded clusters with no chunk migrations are not impacted by this issue. No other version of MongoDB is affected.
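      The version criterion above can be sketched as a small check. This is only an illustration: `isAffectedVersion` is a hypothetical helper name, not part of MongoDB, and it assumes a plain "major.minor.patch" version string. Pre-release or suffixed strings are deliberately not matched, since this ticket does not say whether they are affected.

```javascript
// Releases this ticket lists as affected.
var AFFECTED_VERSIONS = ["3.0.9", "3.0.10"];

// Hypothetical helper: returns true only for an exact match against the
// affected releases. Any other string (including suffixed builds) is false.
function isAffectedVersion(version) {
    var v = String(version).trim();
    return AFFECTED_VERSIONS.indexOf(v) !== -1;
}
```

      In the mongo shell this could be called as isAffectedVersion(db.version()) while connected directly to each shard member, since it is the shard version that matters. A matching version is only an issue if chunk migrations also take place.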

      During a chunk migration, insert and update operations affecting documents in the migrating chunk are not reflected in the recipient shard, leading to data loss.

      Users who haven’t disabled the moveParanoia option should be able to recover this data manually.

      WORKAROUNDS
      MongoDB 3.2 and MongoDB 3.0.8 and earlier are not affected by this issue. Users on affected versions should upgrade to 3.0.11 or newer, or to 3.2.4 or newer, as soon as possible.

      Alternatively, users should disable the balancer and ensure no manual chunk migrations occur in order to avoid this issue. The balancer can be disabled cluster-wide or on a per-collection basis. See the Documentation section below for more information.
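      As a rough sketch of the balancer workaround, the function below wraps the mongo shell helpers `sh.stopBalancer()` and `sh.disableBalancing(namespace)`. The function name `pauseMigrations` and its return values are hypothetical, and `sh` is passed in as a parameter only to avoid relying on the shell global. Note that this does not prevent manual `moveChunk` commands, which still have to be avoided separately.

```javascript
// Hypothetical wrapper around the mongo shell's `sh` helper object.
// With no namespaces, the balancer is stopped cluster-wide; otherwise
// balancing is disabled only for the listed collections.
function pauseMigrations(sh, namespaces) {
    if (!namespaces || namespaces.length === 0) {
        // Cluster-wide: disables the balancer; the shell helper also waits
        // for any in-progress balancing round to finish.
        sh.stopBalancer();
        return "cluster";
    }
    // Per-collection: the balancer keeps running for other collections.
    for (var i = 0; i < namespaces.length; i++) {
        sh.disableBalancing(namespaces[i]);
    }
    return "collections";
}
```

      When connected to a mongos, pauseMigrations(sh) would stop the balancer for the whole cluster, and pauseMigrations(sh, ["test.foo"]) would disable balancing for that one collection only.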

      AFFECTED VERSIONS
      MongoDB versions 3.0.9 and 3.0.10, only.

      FIX VERSION
      The fix is included in the 3.0.11 production release.

      DOCUMENTATION

      Original description

      Similar to SERVER-22535, if I insert documents while a migration is happening, those documents seem to get lost.

      The script below inserts 20000 documents into a collection, then manually moves a chunk while inserting another 20000 documents. The final assertion expects 40000 documents in the collection, but in my testing 20-30 documents are missing.

      var LOG_FUNCTION = "function log(msg) {var date = new Date(); jsTest.log('MONGOISSUE - ' + date.getHours() + ':' + date.getMinutes() + ':' + date.getSeconds() + ':' + date.getMilliseconds() + ' - ' + msg);}";
      eval(LOG_FUNCTION);
      
      var numDocs = 20000;
      
      // Set up cluster.
      log('SETTING UP CLUSTER...');
      var st = new ShardingTest({shards: 2, other: {shardOptions: {storageEngine: 'mmapv1', verbose: 0}}});
      var s = st.s0;
      var d1 = st.shard1;
      var coll = s.getDB("test").foo;
      assert.commandWorked(s.adminCommand({enableSharding: coll.getDB().getName()}));
      assert.commandWorked(s.adminCommand({shardCollection: coll.getFullName(), key: {_id: "hashed"}}));
      log('INSERT START');
      for (var i = 0; i < numDocs; i++) {
          coll.insert({_id: i});
      }
      log('INSERT END');
      assert.commandWorked(coll.ensureIndex({a: 1}));
      
      // Check document count.
      var count = coll.find().itcount();
      log("DOC COUNT: " + count);
      assert.eq(numDocs, count);
      
      // Configure server to increase reproducibility.
      assert.commandWorked(d1.adminCommand({setParameter: 1, internalQueryExecYieldIterations: 2}));
      
      function logChunk(chunk) { log('chunk ' + chunk['_id'] + ' shard ' + chunk['shard']); }
      st.config.chunks.find().forEach(logChunk);
      
      // Initiate migration and add data in parallel.
      var shell = startParallelShell(LOG_FUNCTION + " log('INSERT START'); var coll = db.getSiblingDB('test').foo; for (var i=" + numDocs + "; i<" + (numDocs * 2) + "; i++) { coll.insert({_id: i}); }; log('INSERT END');", s.port);
      sleep(500);
      log('MOVECHUNK START');
      var res = s.adminCommand({moveChunk: coll.getFullName(), find: {_id: 0}, to: "shard0000", _waitForDelete: true});
      log('MOVECHUNK END');
      assert.commandWorked(res);
      st.config.chunks.find().forEach(logChunk);
      shell();
      
      // Re-check document count.
      count = coll.find().itcount();
      log("DOC COUNT: " + count);
      assert.eq(numDocs * 2, count);
      

      I've reproduced this both in mongo 3.0.9 and 3.0.10.
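      The script above only reports how many documents are missing. To see which ones were lost, a small helper along these lines could be added. This is a sketch, not part of the original reproduction: `findMissingIds` is a hypothetical name, and it assumes the `_id` values are the consecutive integers 0 through expectedCount - 1, as in the script.

```javascript
// Hypothetical helper: given the _id values actually found in the collection,
// return the integers in [0, expectedCount) that are absent, in ascending order.
function findMissingIds(expectedCount, foundIds) {
    // Record every _id that was seen.
    var seen = {};
    for (var i = 0; i < foundIds.length; i++) {
        seen[foundIds[i]] = true;
    }
    // Collect every expected _id that was never seen.
    var missing = [];
    for (var id = 0; id < expectedCount; id++) {
        if (!seen[id]) {
            missing.push(id);
        }
    }
    return missing;
}
```

      At the end of the script it could be called as findMissingIds(numDocs * 2, coll.find({}, {_id: 1}).toArray().map(function (d) { return d._id; })), and the result logged before the final assertion.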

      Update

      MongoDB 3.2 is not affected by this bug.

            Assignee: Andy Schwerin (schwerin@mongodb.com)
            Reporter: David Andrade (dandrade@agoragames.com)
            Votes: 0
            Watchers: 28
