[SERVER-30692] mapReduce's postProcessCollection() should take db write lock instead of global write lock Created: 16/Aug/17  Updated: 30/Oct/23  Resolved: 09/Mar/20

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.5.11
Fix Version/s: 4.4.0

Type: Improvement Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Backlog - Query Team (Inactive)
Resolution: Fixed Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Query
Backwards Compatibility: Fully Compatible
Participants:

 Description   

mapReduce's postProcessCollection() on mongod may take a global write lock. The comment above it says it's because renaming the temp collection to the final output collection may be a rename across databases.

However, I believe it's enough to just hold a database lock (not global lock) here. The temp collection is always created in the same database as the final output collection, so the rename should never actually be across databases.

The online docs suggest that if the 'nonAtomic' option to mapReduce is true, only a database (of the final output collection) will be locked.

Further, it looks like the comment about renaming across databases was introduced in this commit, and prior to that, Eliot had commented that a global lock probably did not need to be taken.

This is important because renaming a collection across databases does not preserve the collection's UUID. For the UUIDs project, I am assuming that the global lock taken here is errant, and that the rename can never be across databases.

CC marko.vojvodic



 Comments   
Comment by Charlie Swanson [ 09/Mar/20 ]

This is no longer a problem after completing a recent project where we created a new implementation of mapReduce backed by the aggregation framework.

Comment by Esha Maharishi (Inactive) [ 21/Aug/17 ]

james.wahlin, it's not critically important to fix - it's just incorrect and misleading in the code, especially since now some behavior (UUID consistency) depends on the rename not being across databases. Some future reader may think there's a bug in preserving UUIDs when really it's a misleading comment and a bug in taking the wrong lock.

asya, yes, however if a different output db is specified, the temp collection is created in that output db (see https://github.com/mongodb/mongo/blob/r3.5.11/src/mongo/db/commands/mr.cpp?#L312).

Comment by Asya Kamsky [ 21/Aug/17 ]

https://docs.mongodb.com/manual/reference/command/mapReduce/#mapreduce-out-cmd

In fact, mapReduce allows specifying a different DB than the one the command is running on. If not specified then it will output in the same DB.

Comment by James Wahlin [ 18/Aug/17 ]

esha.maharishi - can you provide more background on why this change is important? Does it impact Cloud/Atlas? Or is there a performance problem related to this behavior?

Generated at Thu Feb 08 04:24:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.