[SERVER-13442] mapReduce nonAtomic output option Created: 01/Apr/14 Updated: 10/Dec/14 Resolved: 24/Jul/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Minor - P4 |
| Reporter: | Garrett Kolpin | Assignee: | Mathias Stearn |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Participants: | |
| Description |
|
I'm running some map reduce jobs, specifying the out parameter as: {reduce: <outputCollection>, db: <outputDB>}. The output step takes many minutes to complete and seems to lock the entire mongod process while it's running. I've read that the nonAtomic option could perhaps prevent this locking behavior. I'm running multiple map reduce jobs in parallel, all of which specify the same output collection and db in the reduce step. Does specifying {nonAtomic: true} ensure that the individual documents in the output collection are updated atomically? Could there be race conditions between the parallel MR reduce operations, since they are reducing to the same collection? |
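| A minimal sketch of the kind of mapReduce call described above; the collection names, database name, and map/reduce functions are hypothetical placeholders: |
|
    db.events.mapReduce(
        function () { emit(this.userId, 1); },                     // map function (hypothetical)
        function (key, values) { return Array.sum(values); },      // reduce function (hypothetical)
        { out: { reduce: "userCounts", db: "analytics", nonAtomic: true } }
    );
|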
| Comments |
| Comment by Ramon Fernandez Marina [ 24/Jul/14 ] |
|
gkolpin, there are no race conditions when running multiple MR jobs that use reduce output (with merge output the behavior is undefined, but it looks like you're not doing this, so no need to worry about it). As for the global lock, we have SERVER-13552 to improve the situation, so we're closing this ticket as a duplicate of SERVER-13552. Regards, |
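| For illustration, the two output modes contrasted above look like this in the shell (assuming mapFn and reduceFn are already defined, and using hypothetical collection names); with reduce output, existing documents for a key are combined with the new results via the reduce function, while with merge output the new results simply overwrite them: |
|
    // reduce output mode: existing documents for a key are re-reduced together with the new results
    db.src.mapReduce(mapFn, reduceFn, { out: { reduce: "totals", db: "analytics" } });

    // merge output mode: new results overwrite existing documents with the same key
    db.src.mapReduce(mapFn, reduceFn, { out: { merge: "totals", db: "analytics" } });
|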
| Comment by Garrett Kolpin [ 24/Jul/14 ] |
|
Hi Thomas, We're running MongoDB 2.4.9 on Linux 2.6.32-431.3.1.el6.x86_64 (CentOS 6.5). It's still an issue in the sense that we see global locking when running our map reduce jobs; however, we've mitigated the problem by running the map reduce jobs on a separate mongod instance, so they don't affect our other operations while they're running. Since I opened this issue, I've spoken with Andre Spiegel, who confirmed that mongod does indeed grab a global lock during map-reduce operations; it was in consultation with him that we decided on the approach we're now taking. It would be nice to have this fixed, but since we've been able to work around the locking issue it's no longer as critical for us as it was. Thanks, |
| Comment by Thomas Rueckstiess [ 24/Jul/14 ] |
|
Hi Garrett, Is this still an issue for you? If so, can you let me know what OS you were running and what version of MongoDB you were using? Regards, |
| Comment by Thomas Rueckstiess [ 07/Apr/14 ] |
|
Hi Garrett, Further testing indicates that this might mostly be an issue on the Mac OS X platform, and we couldn't reproduce such a significant performance impact on Linux. Can you confirm what OS you ran your tests on? Can you also let me know what version of MongoDB you were using? Regarding your questions about the behavior of multiple concurrent map/reduce jobs reducing to the same collection, I'll have to follow up on this and will get back to you when I know more. Thanks, |
| Comment by Garrett Kolpin [ 04/Apr/14 ] |
|
Thomas, thanks for the reply. Do you have any info regarding how the reduce phase works when multiple concurrent map-reduces are reducing to the same output collection? Is the reduce phase atomic? Could data get dropped if one job's reduce phase overwrites a document written by another job whose data it hasn't seen? |
| Comment by Thomas Rueckstiess [ 03/Apr/14 ] |
|
Hi Garrett, Thanks for reporting this issue. I've done some tests and can reproduce the behavior you're seeing: Running the same map/reduce job multiple times concurrently (output to the same collection) significantly slows the jobs down. In my tests, a single job took 9 seconds and 4 concurrent jobs took 90 seconds to complete. The nonAtomic option did not make a difference in my tests. Until we have determined the reason for this behavior, I recommend you run only a single job at a time, as this seems to be the fastest option. Regards, |
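| One way such a concurrency test could be reproduced from the mongo shell is with startParallelShell (the source collection, field names, and output collection below are hypothetical): |
|
    // launch 4 copies of the same map/reduce job against the same output collection
    var joins = [];
    for (var i = 0; i < 4; i++) {
        joins.push(startParallelShell(
            'db.getSiblingDB("test").src.mapReduce(' +
            '    function () { emit(this.k, this.v); },' +
            '    function (key, values) { return Array.sum(values); },' +
            '    { out: { reduce: "totals", nonAtomic: true } });'
        ));
    }
    joins.forEach(function (join) { join(); });   // each join() waits for that parallel shell to exit
|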