[SERVER-16605] Mapreduce into sharded collection with hashed index fails Created: 19/Dec/14 Updated: 06/Dec/22 Resolved: 13/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce, Sharding |
| Affects Version/s: | 2.6.5, 2.7.8 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | D.H.J. Takken | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Duplicate | Votes: | 1 |
| Labels: | open_todo_in_code, sharding, todo_in_code | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Ubuntu 14.10 |
||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Assigned Teams: |
Sharding
|
||||||||||||||||
| Operating System: | Linux | ||||||||||||||||
| Steps To Reproduce: | 1. Instantiate a new, clean MongoDB cluster, featuring a single shard server, config server and mongos. (Python implementation of this test case is attached) |
||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
When outputting from a map reduce job into a sharded output collection which features a hashed index on the _id field, no output is produced. The _id field is also the sharding key, so this issue Extensive testing shows that this happens only for the first map reduce that is ever run on a MongoDB cluster. It fails to produce output and in the process, the name of the output collection appears to become 'cursed' somehow: Any subsequent map-reduce job runs fail if that same output collection name is used. Even if the collection is re-created or the entire database is dropped and re-created, or if a different database is used. The name of the output collection can never be used again. Only when outputting into a collection with a different name, the exact same map reduce job processing the exact same data will succeed. The problem emerges on sharded clusters only, and only when the output collection uses a hashed index. It is possible to work around this problem by running a dummy map reduce job on newly setup MongoDB clusters, using an output collection that will never be used in regular operations. |
| Comments |
| Comment by Asya Kamsky [ 08/May/18 ] |
|
I think this is just an instance of |
| Comment by Asya Kamsky [ 08/May/18 ] |
|
I just verified that this is an issue if you are trying to output into a collection with shard key {_id:"hashed"} If the sharding of collection is changed to {_id:1} then it works as expected.
|
| Comment by Ramon Fernandez Marina [ 19/Feb/16 ] |
|
MosheKaplan, please take a look at |
| Comment by Moshe Kaplan [X] [ 19/Feb/16 ] |
|
We suffer from the same case at 3.0.3. Thanks |
| Comment by Ramon Fernandez Marina [ 19/May/15 ] |
|
dtakken, looks like we let this ticket fall through the cracks – very sorry about that. Thanks for the concise script, I'm able to reproduce the behavior we describe and we're investigating. Once thing I've noticed is that it seems to be the "OutputCollectionA" name the only one that doesn't work, not just the first one that's used. There's room for improvement in mapReduce operations with output to sharded collections, but this looks like a very strange bug and I wonder if there are other names that fail in the same manner. Will post updates to this ticket as they become available. Thanks, EDIT |
| Comment by D.H.J. Takken [ 22/Dec/14 ] |
|
Uploaded a new version of the test case, removing an index creation statement that is not relevant to the testcase. Also, I added the logging output of the testcase run on MongoDB 2.8 RC3. |
| Comment by D.H.J. Takken [ 22/Dec/14 ] |
|
I just tested on version 2.8 RC3 and the problem reproduces there as well. |
| Comment by D.H.J. Takken [ 19/Dec/14 ] |
|
That is correct. I use the mongodb-org and mongodb-org-unstable packages by Ernie Hershey. Packages for the 2.8 RC series have not appeared in the repository yet. |
| Comment by Asya Kamsky [ 19/Dec/14 ] |
|
You'd mentioned on mongodb-dev group that you can reproduce this on 2.6.5 and 2.7.x builds (2.8.0-rc?) is that correct? I'm double-checking since the 10gen packages are all 2.4 I believe. |