[SERVER-5742] bigMapReduce.js failing b/c of migrate deletion during m/r Created: 01/May/12  Updated: 11/Jul/16  Resolved: 06/Jun/12

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: None
Fix Version/s: 2.1.2

Type: Bug Priority: Major - P3
Reporter: Greg Studer Assignee: Greg Studer
Resolution: Done Votes: 0
Labels: buildbot
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Clearest failure is here:

http://buildbot.mongodb.org/builders/Linux%20RHEL%2032-bit/builds/375/steps/test/logs/stdio/text



 Comments   
Comment by Greg Studer [ 06/Jun/12 ]

Should be okay: batch size is not applicable, and the cursor won't do any work because it is used only as a placeholder.

Comment by Greg Studer [ 03/May/12 ]

Reopening because I want to verify that the cursor we open is actually very low-cost, and to change the batch size if necessary.

Comment by auto [ 02/May/12 ]

Author: gregs (gregstuder) <greg@10gen.com>

Message: SERVER-5742 don't clean up holding cursor, needed to ensure data is not removed during m/r

Later cursor does not prevent cleanup.
Branch: master
https://github.com/mongodb/mongo/commit/a105cf0bb3b8bc8aac662441f53deae5e4e83462

Comment by Greg Studer [ 01/May/12 ]

The problem is that the cursor we open in the initial mapReduce step to prevent migration deletions is removed when the M/R cursor is opened. If a migration completes before the second cursor is initialized (which takes a long time on the RH build machines), the migrated data is not protected from cleanup.
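
To make the race concrete, here is a minimal toy model in plain JavaScript. It is only a sketch, not the server code: `ToyShard`, `runMapReduce`, and `keepHoldingCursor` are hypothetical names, and the exact moment at which the server released the first cursor is simplified to "before the M/R cursor exists". The document count of 289 is borrowed from the "moveChunk deleted" log line in this ticket.

```javascript
// Toy model of a shard: the post-migration cleanup may delete the migrated
// range only when no registered cursor is open on it.
class ToyShard {
  constructor(docCount) {
    this.docs = docCount;          // documents in the migrated range
    this.openCursors = new Set();  // ids of cursors that block cleanup
    this.nextId = 1;
  }
  openCursor() {
    const id = this.nextId++;
    this.openCursors.add(id);
    return id;
  }
  closeCursor(id) {
    this.openCursors.delete(id);
  }
  // Called when a migration has committed and cleanup gets a chance to run.
  tryCleanup() {
    if (this.openCursors.size > 0) return false; // still waiting on cursors
    this.docs = 0;                               // range deleted
    return true;
  }
}

// keepHoldingCursor = false models the failing order of events;
// keepHoldingCursor = true models holding the first cursor for the whole M/R.
function runMapReduce(shard, keepHoldingCursor) {
  const holding = shard.openCursor();                  // opened before the slow setup
  if (!keepHoldingCursor) shard.closeCursor(holding);  // released too early
  shard.tryCleanup();                                  // cleanup fires during the slow setup
  const mr = shard.openCursor();                       // the real M/R cursor, opened late
  const seen = shard.docs;                             // documents the map phase can still read
  shard.closeCursor(mr);
  if (keepHoldingCursor) shard.closeCursor(holding);
  return seen;
}

const buggy = runMapReduce(new ToyShard(289), false);
const fixed = runMapReduce(new ToyShard(289), true);
console.log({ buggy, fixed });
```

In this model the early release leaves a window in which no cursor is registered, so the cleanup succeeds and the map phase reads nothing from the range; holding the first cursor until the end closes that window.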

The initial cursor is opened before any of the potentially costly temporary collection creations and deletions, so that a StaleConfigException is thrown early and retries can occur if needed for a consistent view of data across the cluster.
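
The ordering described here (cheap version check first, costly work second, retry on a stale version) can be sketched as follows. This is an illustration under stated assumptions rather than the server's implementation: `StaleConfigError`, `makeShard`, and `mapReduceWithRetry` are hypothetical names, shard versions are reduced to plain integers, and "reloading" the version is a stand-in for whatever the real retry path does. The values 28 and 29 echo the major versions in the "received : 28|7, wanted : 29|0" log line.

```javascript
class StaleConfigError extends Error {}

// Hypothetical shard stub: versions are plain integers in this sketch.
function makeShard(wantedVersion) {
  return {
    wantedVersion,
    expensiveSetups: 0,
    // Opening the placeholder cursor is where the version check happens.
    openHoldingCursor(receivedVersion) {
      if (receivedVersion < this.wantedVersion) {
        throw new StaleConfigError(
          `received ${receivedVersion}, wanted ${this.wantedVersion}`);
      }
      return { close() {} };
    },
    // Stand-in for the costly temporary collection creations and deletions.
    createTempCollections() { this.expensiveSetups++; },
  };
}

function mapReduceWithRetry(shard, routerVersion, maxAttempts = 3) {
  let version = routerVersion;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const holding = shard.openHoldingCursor(version); // cheap, fails fast if stale
      shard.createTempCollections();                    // costly, runs only after the check
      holding.close();                                  // released once all work is done
      return { attempts: attempt, version };
    } catch (err) {
      if (!(err instanceof StaleConfigError)) throw err;
      version = shard.wantedVersion; // stand-in for refreshing the config metadata
    }
  }
  throw new Error("gave up after repeated stale-config errors");
}

const shard = makeShard(29);
const result = mapReduceWithRetry(shard, 28);
console.log(result, shard.expensiveSetups);
```

Because the check comes first, the stale attempt fails before any temporary collections are touched, so the costly setup runs only once, on the attempt that succeeds.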

Comment by Andy Schwerin [ 01/May/12 ]

More details, for posterity, Greg?

Comment by Greg Studer [ 01/May/12 ]

Core issue is this:

      1  m31200| Mon Apr 30 21:34:11 [conn16] CMD: drop test.tmp.mr.foo_14_inc
      1  m31100| Mon Apr 30 21:34:11 [conn16] CMD: drop test.tmp.mr.foo_20_inc
 
 
      1  m31200| Mon Apr 30 21:34:17 [conn26] forking for cleaning up chunk data
      1  m31200| Mon Apr 30 21:34:17 [cleanupOldData]  (start) waiting to cleanup test.foo from { _id: ObjectId('4f9f3ce08372bc428f69f5df') } -> { _id: ObjectId('4f9f3ce28372bc428f69f700') }  # cursors:2
 
// Good, there are two cursors open, so the data should be protected
      
      1  m31200| Mon Apr 30 21:34:17 [conn16] Count with ns: test.foo and query: { i: { $lt: 25600.0 } } failed with exception: 13388 [test.foo] shard version not ok in Client::Context: your version is too old ( ns : test.foo, received : 28|7, wanted : 29|0, send )      
   2038  m31200| Mon Apr 30 21:34:17 [conn16] warning: hit an inactive ProgressMeter
 
      1  m31200| Mon Apr 30 21:34:18 [cleanupOldData] moveChunk deleted: 289
      
// Bad, there should still be a cursor open?
 
      1  m31200| Mon Apr 30 21:34:29 [conn16] command test.$cmd command: { mapreduce: "foo", map: function () {
      1  m31200|     emit(this.val, 1);
      1  m31200| }, reduce: function (key, values) {
      1  m31200|     return Array.sum(values);
      1  m31200| }, query: { i: { $lt: 25600.0 } }, out: "tmp.mrs.foo_1335836051_11", shardedFirstPass: true } ntoreturn:1 keyUpdates:0 reslen:149 18089ms
 
      
      1  m31100| Mon Apr 30 21:34:32 [conn16] command test.$cmd command: { mapreduce: "foo", map: function () {
      1  m31100|     emit(this.val, 1);
      1  m31100| }, reduce: function (key, values) {
      1  m31100|     return Array.sum(values);
      1  m31100| }, query: { i: { $lt: 25600.0 } }, out: "tmp.mrs.foo_1335836051_11", shardedFirstPass: true } ntoreturn:1 keyUpdates:0 reslen:149 20823ms
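
To check that the deletion on m31200 really fell inside the mapreduce run, the start time can be worked back from the log. This small sketch assumes the command line is written when the command finishes, so that start = logged time minus the reported duration; the timestamps have one-second resolution, so the result is approximate. All values are copied from the excerpt above, and the variable names are illustrative only.

```javascript
// Convert an HH:MM:SS timestamp into seconds since midnight.
const toSeconds = (hms) => {
  const [h, m, s] = hms.split(":").map(Number);
  return h * 3600 + m * 60 + s;
};

const finished = toSeconds("21:34:29"); // m31200 mapreduce command line
const durationMs = 18089;               // duration reported on that line
const deleted = toSeconds("21:34:18");  // "moveChunk deleted: 289" on m31200

// Assumed: the command line is logged at completion.
const started = finished - durationMs / 1000;
const deletedDuringMapReduce = started < deleted && deleted < finished;

console.log(started.toFixed(3), deletedDuringMapReduce);
```

Under that assumption the command started at roughly 21:34:11, matching the temporary collection drops at the top of the excerpt, and the chunk data was deleted about seven seconds into the run.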
