[SERVER-30692] mapReduce's postProcessCollection() should take db write lock instead of global write lock Created: 16/Aug/17 Updated: 30/Oct/23 Resolved: 09/Mar/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Querying |
| Affects Version/s: | 3.5.11 |
| Fix Version/s: | 4.4.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Esha Maharishi (Inactive) | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | query-44-grooming | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: |
Query
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Participants: | |||||
| Description |
|
mapReduce's postProcessCollection() on mongod may take a global write lock. The comment above it says it's because renaming the temp collection to the final output collection may be a rename across databases. However, I believe it's enough to just hold a database lock (not global lock) here. The temp collection is always created in the same database as the final output collection, so the rename should never actually be across databases. The online docs suggest that if the 'nonAtomic' option to mapReduce is true, only a database (of the final output collection) will be locked. Further, it looks like the comment about renaming across databases was introduced in this commit, and prior to that, Eliot had commented that a global lock probably did not need to be taken. This is important because renaming a collection across databases does not preserve the collection's UUID. For the UUIDs project, I am assuming that the global lock taken here is errant, and that the rename can never be across databases. |
| Comments |
| Comment by Charlie Swanson [ 09/Mar/20 ] |
|
This is no longer a problem after completing a recent project where we created a new implementation of mapReduce backed by the aggregation framework. |
| Comment by Esha Maharishi (Inactive) [ 21/Aug/17 ] |
|
james.wahlin, it's not critically important to fix - it's just incorrect and misleading in the code, especially since now some behavior (UUID consistency) depends on the rename not being across databases. Some future reader may think there's a bug in preserving UUIDs when really it's a misleading comment and a bug in taking the wrong lock. asya, yes, however if a different output db is specified, the temp collection is created in that output db (see https://github.com/mongodb/mongo/blob/r3.5.11/src/mongo/db/commands/mr.cpp?#L312). |
| Comment by Asya Kamsky [ 21/Aug/17 ] |
|
https://docs.mongodb.com/manual/reference/command/mapReduce/#mapreduce-out-cmd In fact, mapReduce allows specifying a different DB than the one the command is running on. If not specified then it will output in the same DB. |
| Comment by James Wahlin [ 18/Aug/17 ] |
|
esha.maharishi - can you provide more background on why this change is important? Does it impact Cloud/Atlas? Or is there a performance problem related to this behavior? |