[SERVER-33538] mapReduce "replace" on a sharded output collection can lead to UUIDCatalog inconsistencies Created: 28/Feb/18 Updated: 29/Oct/23 Resolved: 22/May/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce, Sharding |
| Affects Version/s: | 3.6.4 |
| Fix Version/s: | 3.6.6, 4.0.0-rc1, 4.1.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Maria van Keulen | Assignee: | Janna Golden |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Minor Change |
| Operating System: | ALL |
| Backport Requested: | v4.0 |
| Sprint: | Storage NYC 2018-03-26, Sharding 2018-05-21, Sharding 2018-06-04 |
| Participants: | |
| Linked BF Score: | 60 |
| Description |
mapReduce with a sharded output collection assigns the UUID obtained from the config server to the final output collection. mapReduce "replace" will drop the existing output collection, which has the same UUID as the new output collection. Because the drop is two-phase, the dropCollection may finish after the renameCollection finishes, erroneously removing the UUIDCatalog entry for the output collection.
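For illustration only (not from the original report), a minimal mongo shell sketch of the command shape that exercises this path; the database, collection names, and map/reduce functions are placeholders, and it assumes the sharded output collection already exists from a previous run:

```js
// Assumes "mr_out" already exists as a sharded collection from a previous run.
// The "replace" action drops and recreates it, and the new collection re-uses
// the UUID obtained from the config server, which is what sets up the race.
db.runCommand({
    mapReduce: "source",
    map: function() { emit(this._id, 1); },
    reduce: function(key, values) { return Array.sum(values); },
    out: { replace: "mr_out", sharded: true }
});
```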
| Comments |
| Comment by Githook User [ 20/Jun/18 ] |
Author: {'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'} Message:
| Comment by Esha Maharishi (Inactive) [ 11/Jun/18 ] |
rribeiro, I looked at the logs you attached on
| Comment by Rui Ribeiro [ 11/Jun/18 ] |
I understand your solution, but I am not using the option out: "replace"; I use the option out: "reduce". In my situation I do: pick up 3 collections (sharded) -> Map Reduce (out: reduce, sharded: true) -> output collection (MR 1) -> Map Reduce (out: reduce, sharded: true) -> output collection (MR 2).
Maybe my issue is a variation of this bug:
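A rough shell sketch of the workflow described in this comment (the collection names and the map/reduce functions are placeholders, not taken from the reporter's deployment):

```js
// Placeholder map/reduce functions; the reporter's actual functions are not shown in this ticket.
var mapFn = function() { emit(this.key, this.value); };
var reduceFn = function(key, values) { return Array.sum(values); };

// First stage: reduce three sharded source collections into one sharded output collection.
db.coll1.mapReduce(mapFn, reduceFn, { out: { reduce: "mr_out_1", sharded: true } });
db.coll2.mapReduce(mapFn, reduceFn, { out: { reduce: "mr_out_1", sharded: true } });
db.coll3.mapReduce(mapFn, reduceFn, { out: { reduce: "mr_out_1", sharded: true } });

// Second stage: reduce the intermediate output into a second sharded output collection.
db.mr_out_1.mapReduce(mapFn, reduceFn, { out: { reduce: "mr_out_2", sharded: true } });
```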
| Comment by Esha Maharishi (Inactive) [ 11/Jun/18 ] |
rribeiro, I am not sure (kelsey.schubert?), but if you are running into this problem, you could run a collection drop on the sharded output collection before each mapReduce. This way, the mapReduce will create a new sharded output collection with a new UUID, rather than re-using the UUID from the existing sharded output collection, so you should not hit this bug.
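A minimal shell sketch of this suggested workaround, assuming placeholder names ("mr_out" for the output collection, "source" for the input) and stand-in map/reduce functions:

```js
// Drop the existing sharded output collection so the next mapReduce
// creates it with a brand-new UUID instead of adopting the old one.
db.mr_out.drop();

var mapFn = function() { emit(this.key, this.value); };             // placeholder
var reduceFn = function(key, values) { return Array.sum(values); }; // placeholder

// Run the mapReduce against the freshly-dropped output collection.
db.source.mapReduce(mapFn, reduceFn, { out: { replace: "mr_out", sharded: true } });
```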
| Comment by Rui Ribeiro [ 11/Jun/18 ] |
Thank you for the quick answer. Do you have any idea when the next 3.6.x will be released? Right now I am facing this problem, and without this fix I can't use 3.6.4. Cheers
| Comment by Esha Maharishi (Inactive) [ 11/Jun/18 ] |
rribeiro, the plan is to backport the fix to the next 3.6 dot release.
| Comment by Rui Ribeiro [ 11/Jun/18 ] |
Hi, is there a way to apply this fix in version 3.6.4, or will it just be released in version 4.0.0?
Thank you
| Comment by Esha Maharishi (Inactive) [ 06/Jun/18 ] |
janna.golden, could you please review this table? My comments on the CR for the backport are based on this understanding:
| Comment by Githook User [ 25/May/18 ] |
Author: {'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'} Message: (cherry picked from commit ff092947da81890ff92c427f50623d36d084e58c)
| Comment by Githook User [ 23/May/18 ] |
Author: {'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'} Message:
| Comment by Githook User [ 23/May/18 ] |
Author: {'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'} Message: (cherry picked from commit b69e6725325aaaae4fcca7563bf6428837ab7767)
| Comment by Janna Golden [ 23/May/18 ] |
This was committed with the wrong server ticket number; the commit is above.
| Comment by Janna Golden [ 23/May/18 ] |
Author: {'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'} Message:
| Comment by Esha Maharishi (Inactive) [ 08/May/18 ] |
Thanks asya. Also, I just discussed with Asya that dropping the output collection from the cluster before starting the second phase adds a "window" where queries on the output collection can see a mix of
(because each shard will return data from one of these categories). Without this change, seeing "empty results" from any shard was not possible. We discussed that this is preferable to leaving the crash in the UUIDCatalog.
| Comment by Asya Kamsky [ 08/May/18 ] |
> unless mapReduce with sharded out can only have _id as the shard key?
Yes, we only allow it (and it works correctly) when the output collection is sharded by _id.
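A hedged illustration of that constraint, with placeholder database and collection names: per the comment above, a sharded mapReduce output collection is sharded on _id, so an explicitly pre-created output collection would look like this.

```js
// Placeholder names; sketch only. The sharded mapReduce output collection
// uses _id as its shard key, as noted in the comment above.
sh.enableSharding("mydb");
sh.shardCollection("mydb.mr_out", { _id: 1 });
```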
| Comment by Esha Maharishi (Inactive) [ 20/Apr/18 ] |
Though, the shardCollection logic in dropAndShardCollection would need to not create the collection on the primary shard, which may be problematic because it won't create the shard key index... unless mapReduce with sharded out can only have _id as the shard key?
| Comment by Esha Maharishi (Inactive) [ 20/Apr/18 ] |
One sharding solution could be:
We could take this one step further
As schwerin noted, doing either of these and backporting them does not guarantee that the crash will not occur in a mixed-version v3.6/v4.0 cluster (unless all the v3.6 nodes have been upgraded to the 3.6 dot release that has the fix).
| Comment by Maria van Keulen [ 19/Apr/18 ] |
I am assigning this to Sharding so they can investigate potential sharding-level fixes for this bug.
| Comment by Maria van Keulen [ 28/Feb/18 ] |
One fix for this problem is to use immediate (not two-phase) collection drops during the "replace" stage and to restore the UUIDCatalog entry for the output collection at the end of the "replace" stage, before releasing the lock.