[SERVER-12261] Map Reduce with sharded output collection creates orphan documents Created: 06/Jan/14  Updated: 28/Jan/19  Resolved: 28/Jan/19

Status: Closed
Project: Core Server
Component/s: MapReduce, Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jon Rangel (Inactive) Assignee: Randolph Tan
Resolution: Done Votes: 1
Labels: gm-ack
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-14324 MapReduce does not respect existing s... Closed
Operating System: ALL
Sprint: Sharding 2019-01-28, Sharding 2019-02-11
Participants:

 Description   

During the post-processing phase of a map reduce run, each shard pulls from the other shard(s) the documents for the chunks of the output collection that it owns, but those documents are not deleted from the source shard(s). This may leave a large number of orphan documents, which greatly increases the storage size of the output collection.

When documents are migrated across shards during post-processing, they should be removed from the source shard.
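
The scenario described above can be sketched with a toy model. This is only an illustration, not MongoDB's implementation: shards are modelled as plain Python dicts, and the names `owner_of`, `pull_owned_docs`, `count_orphans`, and `make_shards`, as well as the chunk ranges, are hypothetical.

```python
from typing import Dict, List, Tuple

# Toy model only: a "shard" is a dict of _id -> value, not a real MongoDB shard.
Shard = Dict[int, str]
# Each chunk is (inclusive lower bound, exclusive upper bound, owning shard name).
Chunk = Tuple[int, int, str]


def owner_of(key: int, chunks: List[Chunk]) -> str:
    """Return the name of the shard that owns the chunk containing key."""
    for lo, hi, shard_name in chunks:
        if lo <= key < hi:
            return shard_name
    raise KeyError(key)


def pull_owned_docs(shards: Dict[str, Shard], chunks: List[Chunk],
                    delete_from_source: bool) -> None:
    """Each shard copies the documents it owns from every other shard."""
    for dest_name, dest in shards.items():
        for src_name, src in shards.items():
            if src_name == dest_name:
                continue
            # Iterate over a snapshot of the keys because src may be modified.
            for key in list(src):
                if owner_of(key, chunks) == dest_name:
                    dest[key] = src[key]
                    if delete_from_source:
                        del src[key]


def count_orphans(shards: Dict[str, Shard], chunks: List[Chunk]) -> int:
    """Count documents stored on a shard that does not own their chunk."""
    return sum(
        1
        for name, docs in shards.items()
        for key in docs
        if owner_of(key, chunks) != name
    )


def make_shards() -> Dict[str, Shard]:
    """Hypothetical starting state: each shard holds one document it does not own."""
    return {"A": {1: "a", 15: "b"}, "B": {3: "c", 12: "d"}}


CHUNKS: List[Chunk] = [(0, 10, "A"), (10, 20, "B")]

if __name__ == "__main__":
    for delete in (False, True):
        shards = make_shards()
        pull_owned_docs(shards, CHUNKS, delete_from_source=delete)
        print(delete, count_orphans(shards, CHUNKS))
```

In this model, copying without deleting leaves each pulled document behind on its source shard as an orphan, while deleting after the copy leaves none.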



 Comments   
Comment by Randolph Tan [ 28/Jan/19 ]

My finding years ago was actually wrong. This is not an issue because the collection created in the first phase of map reduce is a temporary collection. Mongos also does a best-effort cleanup after the command finishes (regardless of whether it succeeded or errored out).
