[SERVER-37679] Concurrent sharded fuzzers should not run mapReduces with replace Created: 19/Oct/18  Updated: 22/Mar/19  Resolved: 22/Mar/19

Status: Closed
Project: Core Server
Component/s: MapReduce, Sharding
Affects Version/s: 3.6.8, 4.0.3, 4.1.4
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Blake Oler Assignee: Blake Oler
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
is documented by DOCS-12566 Indicate unsupported mapReduce scenarios Closed
Duplicate
is duplicated by SERVER-38471 Concurrent mapReduce to same output s... Closed
Related
related to SERVER-38471 Concurrent mapReduce to same output s... Closed
Sprint: Sharding 2019-02-25, Sharding 2019-03-11, Sharding 2019-03-25
Participants:
Linked BF Score: 53

 Description   

Concurrent mapReduces with replace are not supported. As part of the sharded cluster's mapReduce logic, we will always drop the output collection and then shard it again. Since these two operations are not atomic, this can lead to a race condition where two mapReduces simultaneously output to the same collection with the same UUID.

With mapReduces A and B running concurrently, the race unfolds in order:

  1. mapReduce A drops the output collection
  2. mapReduce B drops the output collection
  3. mapReduce A creates the sharded output collection with a generated UUID
  4. mapReduce B attempts to shard the output collection, but when doing so is told that the collection is already sharded and receives the UUID that mapReduce A generated.

In order to avoid this undefined behavior, no sharded fuzzer suites (specifically: jstestfuzz_sharded_causal_consistency) should run mapReduce with replace.
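The interleaving above can be reproduced deterministically with an in-memory sketch. This is illustrative only: the `catalog` dict, `drop_output`, and `shard_output` are hypothetical stand-ins for the config server metadata and the drop/shardCollection steps, not MongoDB internals.

```python
import threading
import uuid

# Hypothetical stand-in for the config server's collection metadata.
catalog = {}
catalog_lock = threading.Lock()  # protects the dict, NOT the drop+shard pair

def drop_output(ns):
    with catalog_lock:
        catalog.pop(ns, None)

def shard_output(ns):
    # Like shardCollection, this is a no-op if the collection already
    # exists as sharded: the caller is handed back the existing UUID.
    with catalog_lock:
        if ns not in catalog:
            catalog[ns] = uuid.uuid4()
        return catalog[ns]

def map_reduce_replace(ns, barrier, results, idx):
    drop_output(ns)        # steps 1 and 2: each job drops the collection
    barrier.wait()         # force both drops to complete before either shard
    results[idx] = shard_output(ns)  # steps 3 and 4: re-shard it

barrier = threading.Barrier(2)
results = [None, None]
threads = [threading.Thread(target=map_reduce_replace,
                            args=("test.out", barrier, results, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both mapReduces now write into the same collection with the same UUID.
assert results[0] == results[1]
```

Because the drop and the shard are two separate critical sections, nothing prevents B's drop from landing between A's drop and A's shard, which is exactly the undefined behavior described above.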



 Comments   
Comment by Blake Oler [ 22/Mar/19 ]

Closing as Won't Fix. The issue has been prevented from happening in our testing infrastructure and a DOCS ticket has been filed to warn users not to run mapReduces in this fashion.

Comment by Kaloian Manassiev [ 15/Feb/19 ]

Assigning to blake.oler to file a DOCS ticket with what needs to be updated in the documentation and close this ticket, possibly blacklisting the failing suites.

Comment by Alyson Cabral (Inactive) [ 07/Feb/19 ]

This should be documented.

Comment by Esha Maharishi (Inactive) [ 07/Feb/19 ]

Note that we can't really "ban" the concurrent mapReduces without just doing the fix (holding the distlock across the dropCollection and shardCollection), since the mapReduces can be sent to different routers.

Comment by Asya Kamsky [ 17/Jan/19 ]

Should this limitation be in our documentation?

Comment by Esha Maharishi (Inactive) [ 29/Nov/18 ]

max.hirschhorn, I believe the mapReduce will fail early on mongos if 'out' is specified without 'sharded: true' but mongos believes the collection is sharded. A test for this was added recently here.

I think the fact that mongos may be stale won't be an issue, since the fuzzer suites use only one mongos.

Comment by Max Hirschhorn [ 29/Nov/18 ]

esha.maharishi, the (concurrent) fuzzer cannot know if the collection is sharded when it goes to run the mapReduce command.

Comment by Esha Maharishi (Inactive) [ 28/Nov/18 ]

max.hirschhorn, we would ban mapReduces with output to a sharded collection. We could still leave in mapReduces with sharded input and with output to an unsharded collection with any of replace, merge, or reduce modes.

Comment by Esha Maharishi (Inactive) [ 29/Oct/18 ]

Note, we may have to blacklist mapReduce with reduce and merge (not just replace) to an output collection as well, since if the output collection does not exist or is empty, both of these modes drop and re-shard the output collection, which leads to the same race.

I saw this in a BF in a sharded concurrent fuzzer recently and hadn't finished investigating it, but that was the likely cause.

Comment by Esha Maharishi (Inactive) [ 19/Oct/18 ]

On further thought, we may be able to solve this by creating an atomic "dropAndShardCollection" command that holds distlocks across the drop and shardCollection operations.

However, I am not sure we want to invest further time into sharded mapReduce with replace, so blacklisting mapReduce with replace from the sharded concurrent fuzzers may be the only alternative.
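For illustration, the atomic approach suggested above can be sketched in-memory as follows. All names here are hypothetical (`dist_lock` stands in for the sharding distlock, `catalog` for the config metadata); this is not the actual distlock API.

```python
import threading
import uuid

catalog = {}
dist_lock = threading.Lock()  # stand-in for the distributed lock

def drop_and_shard_collection(ns):
    # Holding the lock across BOTH operations makes the pair atomic:
    # a concurrent caller cannot interleave its drop between our drop
    # and our shardCollection, so each replace gets a fresh UUID.
    with dist_lock:
        catalog.pop(ns, None)
        catalog[ns] = uuid.uuid4()
        return catalog[ns]

# Each replace regenerates the output collection under a new UUID.
first = drop_and_shard_collection("test.out")
second = drop_and_shard_collection("test.out")
assert first != second
```

With the pair serialized under one lock, the step-4 scenario from the description (B observing A's UUID) can no longer occur.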

Comment by Blake Oler [ 19/Oct/18 ]

CC esha.maharishi

Generated at Thu Feb 08 04:46:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.