[SERVER-27239] Remove the usage of distributed collection lock during map/reduce Created: 30/Nov/16 Updated: 06/Dec/22 Resolved: 24/Jul/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | remove-distributed-lock-fallout |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Sharding |
| Participants: |
| Description |
|
Map/reduce with a sharded output collection performs an optimization in which the final reduce step is omitted and the preceding reduce step runs in parallel on all shards. To do this, it creates an empty sharded collection and distributes its chunks so that they are co-located, by shard key, with the data to be reduced. All output is written locally to an empty temporary collection, which is then renamed to the name of the output collection. This process only works if no chunks of the output collection move while the output is being written, which is currently guaranteed by holding the collection distributed lock. This task is to remove that reliance on the collection distributed lock. |
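The dependency described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's implementation: `RoutingTable`, `write_output`, `readable_after_rename`, and the `migrate_midway` parameter are hypothetical names invented for illustration, and a "rename" is modeled simply as exposing each shard's temporary collection in place.

```python
from bisect import bisect_right


class RoutingTable:
    """Hypothetical stand-in for the chunk map: split points plus one owner per chunk."""

    def __init__(self, split_points, owners):
        # Chunk i covers keys in [split_points[i-1], split_points[i]).
        assert len(owners) == len(split_points) + 1
        self.split_points = list(split_points)
        self.owners = list(owners)

    def owner_of(self, key):
        return self.owners[bisect_right(self.split_points, key)]

    def move_chunk(self, chunk_index, to_shard):
        self.owners[chunk_index] = to_shard


def write_output(routing, reduced, temp, migrate_midway=None):
    """Write each reduced document to the temp collection of the shard that owns
    its key at write time. If migrate_midway=(chunk_index, shard) is given, that
    chunk moves halfway through, as if nothing prevented the migration."""
    for i, (key, value) in enumerate(sorted(reduced.items())):
        if migrate_midway is not None and i == len(reduced) // 2:
            routing.move_chunk(*migrate_midway)
        temp.setdefault(routing.owner_of(key), {})[key] = value


def readable_after_rename(routing, temp):
    """Keys that a routed read finds once every temp collection is renamed in place:
    a document is visible only if it sits on the shard that now owns its key."""
    return {
        key
        for shard, docs in temp.items()
        for key in docs
        if routing.owner_of(key) == shard
    }


reduced = {1: "x", 2: "y", 11: "z", 12: "w"}

# No chunk movement: every document lands on, and stays on, the owning shard.
stable = RoutingTable([10], ["A", "B"])
stable_temp = {}
write_output(stable, reduced, stable_temp)

# Chunk 0 moves from A to B mid-write: keys 1 and 2 are stranded on shard A.
moving = RoutingTable([10], ["A", "B"])
moving_temp = {}
write_output(moving, reduced, moving_temp, migrate_midway=(0, "B"))
```

In this model, `readable_after_rename(stable, stable_temp)` returns all four keys, while `readable_after_rename(moving, moving_temp)` returns only `{11, 12}`. Whatever replaces the distributed lock would still need to rule out the second case.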
| Comments |
| Comment by Kaloian Manassiev [ 24/Jul/21 ] |
|
asya, this is correct, though not because it's now aggregation, but because it never creates a new sharded collection. Closing as Gone Away. |
| Comment by Asya Kamsky [ 23/Jul/21 ] |
|
kaloian.manassiev this can probably be closed since MR is now aggregation? |
| Comment by Randolph Tan [ 11/Dec/18 ] |
|
I also realized that the dist lock serves another purpose besides making sure that chunks don't move around: it serializes concurrent mapReduce operations that output to the same sharded collection. Because merge uses a temp collection to perform the reduce and then drops the original collection, mapReduce operations outputting to the same collection will step on each other's toes if they are allowed to run at the same time. |
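The serialization concern in the comment above can be sketched as a lost-update scenario. This is a deliberately simplified model and an assumption about the step ordering (build temp from the current output, drop the original, rename temp), not a description of the actual server code. `merge_job`, `run_serialized`, and `run_interleaved` are hypothetical helpers; generators are used only so that two jobs can be interleaved deterministically.

```python
def merge_job(db, out_name, new_docs):
    """One merge-mode job, yielding between steps so a caller can interleave jobs."""
    temp = dict(db.get(out_name, {}))  # step 1: build temp from the current output
    temp.update(new_docs)
    yield
    db.pop(out_name, None)             # step 2: drop the original collection
    yield
    db[out_name] = temp                # step 3: rename temp to the output name


def run_serialized(jobs):
    """Run each job to completion before starting the next (what a lock enforces)."""
    for job in jobs:
        for _ in job:
            pass


def run_interleaved(jobs):
    """Advance every job one step at a time, round-robin (no lock held)."""
    pending = list(jobs)
    while pending:
        for job in list(pending):
            try:
                next(job)
            except StopIteration:
                pending.remove(job)


# Serialized: the second job sees the first job's output and both survive.
locked_db = {"out": {"a": 1}}
run_serialized([merge_job(locked_db, "out", {"b": 2}),
                merge_job(locked_db, "out", {"c": 3})])

# Interleaved: both jobs snapshot the same original, so the last rename wins.
racy_db = {"out": {"a": 1}}
run_interleaved([merge_job(racy_db, "out", {"b": 2}),
                 merge_job(racy_db, "out", {"c": 3})])
```

After the serialized run, `locked_db["out"]` is `{"a": 1, "b": 2, "c": 3}`. After the interleaved run, `racy_db["out"]` is `{"a": 1, "c": 3}`, and the first job's `"b"` has been lost.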