[SERVER-15334] Map/Reduce jobs stall when database is taking writes Created: 19/Sep/14 Updated: 01/Apr/15 Resolved: 01/Apr/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce |
| Affects Version/s: | 2.4.10 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Alex Piggott | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | map_reduce, mapreduce | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Steps To Reproduce: | Appears to be: start a long-running indexed map/reduce job, then delete documents matched by the job's query while it is still running |
| Description |
|
Because of a race condition in my code, I can delete a set of docs over which I am in the middle of running a map/reduce job (precise details below; I don't believe they are relevant). The map/reduce query is indexed. My expectation was that the map/reduce job would simply "ignore" records that were deleted. Instead something odder seems to happen: the jobs last far longer than they should, and we see performance degradation. For example, here's the relevant entry from db.currentOp():
Check out the emit progress... There were a few of these; they all ran for 20 minutes or so (the number of docs being deleted was small, in the few-thousand range) before eventually cleaning themselves up. Bonus worry: I have a similar case in which I run a map/reduce over the entire collection (several tens of millions of documents) while documents are being written to it. (Details: …) |
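For reference, here is a minimal shell sketch of the scenario described above; the actual jobs are not included in this ticket, so the collection and field names ("events", "batchId", "grp") and the map/reduce functions are all hypothetical stand-ins:

```javascript
// Hypothetical reproduction sketch (mongo shell, 2.4-era API).
// All names below are illustrative, not taken from the real jobs.
db.events.ensureIndex({ batchId: 1 });

// Start a longish-running map/reduce restricted by an indexed query...
db.events.mapReduce(
    function () { emit(this.grp, 1); },                    // map
    function (key, values) { return Array.sum(values); },  // reduce
    {
        query: { batchId: "batch-42" },   // indexed query
        out: { reduce: "event_counts" }   // incremental output collection
    }
);

// ...and, from a second connection while the job is still running,
// delete the documents the query matches:
db.events.remove({ batchId: "batch-42" });
```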
| Comments |
| Comment by Alex Piggott [ 01/Apr/15 ] |
|
Thanks for the update! I haven't seen it, or at least noticed it, since I put in some workarounds to minimize the probability of the bulk delete/MR happening at the same time - so I mainly wanted to make sure you were aware of it and could look at fixing it in a later version. Sounds like mission accomplished! |
| Comment by Ramon Fernandez Marina [ 01/Apr/15 ] |
|
Hi apiggott@ikanow.com, it seems we let this ticket fall through the cracks – very sorry about that. After testing on my end, I think the root of the issue is the following: the total emit work is calculated at the beginning of the job, so if more documents are inserted before the job completes, the reported percentage will go above 100%. I'm also able to see the apparent performance problem: if documents are being inserted, the write lock will prevent readers from making progress, so the mapReduce job will appear to be stuck. Note that newer versions of MongoDB have finer locking granularity, so if you need to be able to write to one collection while doing a mapReduce on another, you may want to consider upgrading to MongoDB 3.0. Regards, |
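For anyone wanting to observe this, a rough way to watch map/reduce progress from the shell is sketched below; the exact progress-message wording varies by version, so the pattern matched here is an assumption:

```javascript
// Sketch: list in-progress map/reduce operations via db.currentOp()
// and print their progress messages. If documents are inserted after
// the job computes its total emit work, the reported emit progress
// can exceed 100%.
db.currentOp().inprog.forEach(function (op) {
    // Matching on "m/r" in the message is an assumption; adjust the
    // pattern to whatever your server version reports.
    if (op.msg && /m\/r/i.test(op.msg)) {
        print(op.opid + "  " + op.msg + "  (running " + op.secs_running + "s)");
    }
});
```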
| Comment by Alex Piggott [ 14/Nov/14 ] |
|
Is anyone planning to look at this? I've personally worked around it, but it seems like a reasonably serious bug that doesn't require a particularly outlandish scenario to occur... |
| Comment by Alex Piggott [ 19/Sep/14 ] |
|
Here are the map/reduce jobs being run via: … Hmm, one other potential race condition is that I have multiple jobs running against the same output collection (and, as mentioned above, with the same query). I couldn't find anything in the documentation that explicitly states that would be a problem. |
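For illustration, the general shape of two such jobs sharing a query and an output collection might look like the sketch below; the actual job code was not captured in this ticket, so every name here is a stand-in:

```javascript
// Illustrative only: two concurrent jobs writing into the same output
// collection using the "reduce" out mode, which folds new results into
// existing keys instead of replacing the collection.
function runJob(sourceColl) {
    db.getCollection(sourceColl).mapReduce(
        function () { emit(this.key, this.count); },           // map
        function (key, values) { return Array.sum(values); },  // reduce
        {
            query: { batchId: "batch-42" },    // same query in both jobs
            out: { reduce: "shared_output" }   // same output collection
        }
    );
}

// Kicked off from two separate connections at roughly the same time:
runJob("eventsA");
runJob("eventsB");
```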
| Comment by Alex Piggott [ 19/Sep/14 ] |
|
Someone else reported something similar on Stack Overflow: http://stackoverflow.com/questions/24312785/mongodb-mapreduce-causes-error-error-during-query |
| Comment by Alex Piggott [ 19/Sep/14 ] |
|
The title should read "degrade performance", not "kill performance" - sorry, got carried away! |