[SERVER-4987] new bigMapReduce.js issue Created: 16/Feb/12 Updated: 11/Jul/16 Resolved: 09/Mar/12 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce, Sharding |
| Affects Version/s: | 2.1.0 |
| Fix Version/s: | 2.1.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Greg Studer | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 0 |
| Labels: | buildbot |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
|
Docs are not counted correctly in the mapreduce output, as opposed to the mapreduce input. http://buildbot.mongodb.org/builders/OS%20X%2010.5%2032-bit/builds/3364/steps/test_3/logs/stdio/text |
| Comments |
| Comment by auto [ 09/Mar/12 ] |
|
Author: Randolph Tan <randolph@10gen.com> Message: Modified |
| Comment by auto [ 03/Mar/12 ] |
|
Author: Randolph Tan <randolph@10gen.com> Message: Test for |
| Comment by Randolph Tan [ 24/Feb/12 ] |
|
Investigation result: The test failed because the getLastError call that is supposed to synchronize and ensure that prior inserts reached the shards allowed some failed inserts to slip through, so the map reduce jobs were not able to get all of their inputs. The write failures occurred because of shard version changes during background chunk migrations by the balancer. Normally, these failures are handled via writebacks, which are fetched either by a background thread or when getLastError is called. The bug happens when an insert triggers a chunk split during a migration and fails. In that case an uncaught StaleConfigException causes the connection not to be put back into the pool. When the getLastError command is then called, it uses a new connection and does not catch the pending writebacks, since that information is kept in a thread-local variable of the previous connection. |
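The failure mode described in the investigation can be illustrated with a toy model. This is only a sketch, not MongoDB code: `Connection`, `Pool`, `insert`, `get_last_error`, `StaleConfigError` and the `leak_on_error` flag are all hypothetical names, and the pending-writeback state is modeled as a plain per-connection list rather than a real thread-local variable.

```python
class StaleConfigError(Exception):
    """Hypothetical stand-in for the StaleConfigException in the comment."""


class Connection:
    def __init__(self, conn_id):
        self.conn_id = conn_id
        # Assumption: writeback bookkeeping lives with the connection.
        self.pending_writebacks = []


class Pool:
    """Minimal pool: reuse an idle connection, else create a new one."""

    def __init__(self):
        self._idle = []
        self._next_id = 0

    def get(self):
        if self._idle:
            return self._idle.pop()
        self._next_id += 1
        return Connection(self._next_id)

    def release(self, conn):
        self._idle.append(conn)


def insert(pool, doc, fail, leak_on_error):
    """Simulate an insert; on failure the doc is left as a pending writeback."""
    conn = pool.get()
    try:
        if fail:
            conn.pending_writebacks.append(doc)
            raise StaleConfigError("shard version changed during migration")
    except StaleConfigError:
        # Buggy path: the connection is never handed back to the pool.
        if not leak_on_error:
            pool.release(conn)
        return False
    pool.release(conn)
    return True


def get_last_error(pool):
    """Return the writebacks visible on whichever connection the pool hands out."""
    conn = pool.get()
    seen = list(conn.pending_writebacks)
    conn.pending_writebacks.clear()
    pool.release(conn)
    return seen


def gle_sees_writebacks(leak_on_error):
    pool = Pool()
    insert(pool, {"_id": 1}, fail=True, leak_on_error=leak_on_error)
    return get_last_error(pool)


print(gle_sees_writebacks(leak_on_error=True))   # [] : writeback missed
print(gle_sees_writebacks(leak_on_error=False))  # [{'_id': 1}] : writeback seen
```

With the connection leaked, the getLastError stand-in gets a fresh connection with no pending state, which mirrors how the failed inserts could slip through in the test. Returning the connection on the error path is just one way to make the toy model behave; the actual fix is in the referenced commits.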
| Comment by auto [ 24/Feb/12 ] |
|
Author: Greg Studer (gregstuder) <greg@10gen.com> Message: |
| Comment by Greg Studer [ 22/Feb/12 ] |
|
New failure - http://buildbot.mongodb.org/builders/Linux%2064-bit%20v8/builds/3032/steps/test_3/logs/stdio/text The issue is that writebacks (46 of them) are not being processed during the GLE; they return much later (after two m/r jobs have completely finished). We suspect the writes are still in flight when the GLE is called, and are therefore not registered as queued at that point, which we suspect can happen if there are multiple connections to a shard. |
| Comment by Greg Studer [ 16/Feb/12 ] |
|
... i.e. not the right number of docs after mapreduce. Previous failures were on the raw inserts to the collection, before any mapreduce processing.