[SERVER-12261] Map Reduce with sharded output collection creates orphan documents Created: 06/Jan/14 Updated: 28/Jan/19 Resolved: 28/Jan/19 |
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jon Rangel (Inactive) | Assignee: | Randolph Tan |
| Resolution: | Done | Votes: | 1 |
| Labels: | gm-ack |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Sprint: | Sharding 2019-01-28, Sharding 2019-02-11 |
| Description |
During the post-processing phase of a map reduce run, each shard pulls from the other shard(s) the documents belonging to the output-collection chunks it owns, but those documents are not deleted from the source shard(s). This can leave a large number of orphan documents, which greatly increases the storage size of the output collection. Documents migrated between shards during post-processing should be removed from the source shard. |
| Comments |
| Comment by Randolph Tan [ 28/Jan/19 ] |
My finding from years ago was wrong. This is not an issue, because the collection created in the first phase of map reduce is a temporary collection. Mongos also makes a best-effort attempt to clean it up after the command finishes, whether the command succeeded or errored out. |