[SERVER-15004] MR program silently drops records bug or some configuration missing Created: 22/Aug/14 Updated: 11/Jan/15 Resolved: 10/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce |
| Affects Version/s: | 2.6.1 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Critical - P2 |
| Reporter: | Vipul | Assignee: | Ramon Fernandez Marina |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Description |
|
I am running a mapReduce (MR) job on MongoDB. It silently drops records when I try to do a denormalize operation on 10,000 records; this happens somewhere in the middle of the collection (around record 5,000). I tried removing nearby records (when sorted), and there is nothing wrong with the data. Here is my code:
I am running this on a machine with a very low-end configuration: 512 MB of RAM and 1 GB of swap. Whatever the case, it should not silently (and seemingly randomly) drop elements of groups. The objects do not exceed the BSON object size limit (each key has an array of just 10 to 15 objects). Any suggestions as to what could be causing this issue?
My code otherwise works as designed. The only issue is that records for the same emit key are dropped at a particular point after about 5,500 records. For example, productid=553 (the emit key) has 12 elements, the last of which is the 5502nd record in the input collection order_10000. The 5501st and 5502nd records are dropped from the output of the MR job.
Input Order_10000:
Output of MR:
The next group starts fine from here. The same thing happens if a particular group crosses the next threshold. |
| Comments |
| Comment by Ramon Fernandez Marina [ 11/Jan/15 ] |
|
vipulmehta13, as per my message above, from a practical standpoint not all elements/values of the key necessarily come in the same record.
My understanding of the internals is that the first call to reduce may contain all mapped values for a given key, but the reduce function may need to yield. If that's the case, it will be called a second time; this time, the results from the first reduce call will be passed as an additional value. This yielding can happen multiple times, so the reduce function may be called multiple times for one mapReduce() operation. Each additional time reduce is called, the results from the previous call will be passed as an additional value.
You may want to read
If you have further questions, please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience. A question like this, which involves more discussion, is best posted on the mongodb-user group, as the SERVER project is for reporting bugs and improvement suggestions against the MongoDB kernel.
Regards, |
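The re-reduce behaviour described in this comment can be illustrated outside the server. The following is only a minimal sketch in plain JavaScript, not the reporter's code (which is not reproduced in this ticket) and not the server's actual batching logic. The helper runReduceInChunks, the two reduce functions, the field names orderId and orders, and the chunk size of 5 are all hypothetical, chosen to mimic a key with 12 values such as productid=553.

```javascript
// Hypothetical stand-in for the server: reduce the values in chunks and pass
// each partial result back in as an extra value on the next call.
// The chunk size is arbitrary; real batching is internal to MongoDB.
function runReduceInChunks(reduceFn, key, values, chunkSize) {
  let partial = null;
  for (let i = 0; i < values.length; i += chunkSize) {
    const chunk = values.slice(i, i + chunkSize);
    const input = partial === null ? chunk : [partial].concat(chunk);
    partial = reduceFn(key, input);
  }
  return partial;
}

// Not re-reduce safe: assumes every value is a raw emitted document
// ({ orderId: n }), so a partial result ({ orders: [...] }) is misread
// and the elements it carried are lost.
function unsafeReduce(key, values) {
  return { orders: values.map(function (v) { return v.orderId; }) };
}

// Re-reduce safe: map emits { orders: [n] }, and reduce returns the same
// shape, so a partial result can be merged like any other value.
function safeReduce(key, values) {
  const merged = { orders: [] };
  values.forEach(function (v) {
    merged.orders = merged.orders.concat(v.orders);
  });
  return merged;
}

const ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12];

const unsafeResult = runReduceInChunks(
  unsafeReduce, 553, ids.map(function (id) { return { orderId: id }; }), 5);
const safeResult = runReduceInChunks(
  safeReduce, 553, ids.map(function (id) { return { orders: [id] }; }), 5);

// Count the elements that survived in each case.
const unsafeKept = unsafeResult.orders.filter(function (id) {
  return id !== undefined;
}).length;
console.log(unsafeKept);               // 2
console.log(safeResult.orders.length); // 12
```

In this sketch the unsafe variant gives the right answer only when all values happen to arrive in a single call, which would be consistent with a group looking fine until it crosses some internal threshold. The safe variant returns the same result however the values are split.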
| Comment by Vipul [ 11/Jan/15 ] |
|
Maybe I am not understanding this clearly. Even if the reduce function has been invoked more than once, all the elements of the key should come in the same record.
Thanks, |
| Comment by Ramon Fernandez Marina [ 10/Jan/15 ] |
|
vipulmehta13, as pointed out in the Stack Overflow thread where you first posted this, you need to make sure you meet the requirements for the reduce function as described in the documentation:
Regards, |
| Comment by Vipul [ 23/Aug/14 ] |
|
Update: The alternate solution (code below) worked fine for me. However, my previous code is still a mystery to me, as I don't know what is wrong with it.
|
| Comment by Vipul [ 23/Aug/14 ] |
|
Attaching detailed code and data. input_filter_12_elemets_of_553: filter on the input data. output_gives_only_11_elements_of_553.txt: filter on the output data. I hope this helps in replicating the issue. |