[SERVER-45210] Accept and ignore the nonAtomic:true option in the mapReduce command Created: 17/Dec/19  Updated: 29/Oct/23  Resolved: 26/Feb/20

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: None
Fix Version/s: 4.3.4

Type: Improvement Priority: Major - P3
Reporter: Jeffrey Yemin Assignee: Katherine Wu (Inactive)
Resolution: Fixed Votes: 0
Labels: qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-46685 mapReduce with FCV 4.2 on 4.4 branch ... Closed
related to SERVER-45205 Consider removing 'out.sharded' optio... Closed
is related to JAVA-3555 Work around changes to mapReduce impl... Closed
Backwards Compatibility: Minor Change
Sprint: Query 2020-01-13, Query 2020-02-24, Query 2020-03-09
Participants:

 Description   

All released versions of the Java driver send the following output document by default for any mapReduce command that targets a collection:

{ 
  replace: <collection name>,
  db: <db name>,
  sharded: false,
  nonAtomic: false
}

I understand that those are problematic defaults, but they are the documented defaults for those values, so it's reasonable that the driver is sending them.

Consider whether the 4.4 server can continue to support these defaults.



 Comments   
Comment by Githook User [ 26/Feb/20 ]

Author:

{'name': 'Katherine Wu', 'username': 'kaywux', 'email': 'katherine.wu@mongodb.com'}

Message: SERVER-45210 Accept and ignore nonAtomic:true option in mapReduce
Branch: master
https://github.com/mongodb/mongo/commit/416e276e3c1f5b814a73efeb287256a747653ba1

Comment by James Wahlin [ 29/Jan/20 ]

After further discussion we have decided to change mapReduce to accept the 'nonAtomic:true' option. This gives users of older Java driver versions the choice of either upgrading their driver or explicitly setting "nonAtomic:true" and "sharded:true", both of which will be accepted and ignored by MongoDB. We will continue to reject "nonAtomic:false", as mapReduce in 4.4 will no longer allow atomic writes to the output collection.
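The accept-and-ignore rule described in this comment can be sketched as follows. This is only an illustration in Python, not the server's code: `normalize_out` and `RejectedOption` are hypothetical names, and because the comment does not say how "sharded:false" is handled, the sketch leaves that value untouched.

```python
class RejectedOption(ValueError):
    """Raised for an option value the sketch treats as rejected."""


def normalize_out(out: dict) -> dict:
    """Return a copy of a mapReduce `out` document with ignored options dropped.

    Hypothetical helper modelling the 29/Jan/20 decision only:
      - nonAtomic:false is rejected,
      - nonAtomic:true and sharded:true are accepted and ignored,
      - everything else (including sharded:false) is passed through unchanged.
    """
    if out.get("nonAtomic") is False:
        raise RejectedOption("nonAtomic:false is not supported")

    ignored_when_true = ("nonAtomic", "sharded")
    return {
        key: value
        for key, value in out.items()
        if not (key in ignored_when_true and value is True)
    }


if __name__ == "__main__":
    # Explicitly opting in to the accepted values: both flags are dropped.
    print(normalize_out({"replace": "results", "nonAtomic": True, "sharded": True}))

    # The older driver default from the description carries nonAtomic:false.
    try:
        normalize_out({"replace": "results", "sharded": False, "nonAtomic": False})
    except RejectedOption as exc:
        print("rejected:", exc)
```

Dropping the ignored keys rather than interpreting them mirrors the "accepted and ignored" wording: the flags have no effect on what is written.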

Comment by James Wahlin [ 13/Jan/20 ]

We have decided to allow and ignore both nonAtomic:false and sharded:false for MongoDB 4.4 mapReduce. We will document in our release notes and in the mapReduce command documentation that they will be ignored. Allowing these options will prevent mapReduce invocations from failing, because of either option, for users of the 4.2 (prior to the recent minor release) and earlier Java driver, who get these settings by default.

Comment by James Wahlin [ 20/Dec/19 ]

behackett, we plan to reject nonAtomic: false and sharded: false because they will no longer apply to the mapReduce command in 4.4.

nonAtomic: false is a request for mapReduce output to a collection to be performed atomically. In the legacy implementation, this meant taking a global write lock, blocking all other user traffic on mongod. With an aggregation-based mapReduce we are removing the ability to do this and will write to output collections using the $out and $merge aggregation stages. We could consider allowing nonAtomic:false, but we would not support the atomic writes that it requests and would instead ignore the option.
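As a rough illustration of writing output through $out and $merge, the sketch below maps a mapReduce `out` document onto a final aggregation stage. It assumes the documented shapes of the $out and $merge stages; `out_stage` is a hypothetical helper, the choice of whenMatched: "replace" for merge output is an assumption about the closest equivalent, and none of this is taken from the server's internal translation. The reduce output mode is left out because it needs a custom whenMatched pipeline.

```python
def out_stage(out: dict) -> dict:
    """Build a final aggregation stage for a mapReduce `out` document (sketch).

    Hypothetical helper: covers only the replace and merge output modes.
    """
    db = out.get("db")

    def target(coll: str):
        # Both stages accept either a bare collection name or a {db, coll} document.
        return {"db": db, "coll": coll} if db else coll

    if "replace" in out:
        # Replace output: $out replaces the contents of the target collection.
        return {"$out": target(out["replace"])}

    if "merge" in out:
        # Merge output: assumed here to overwrite documents with a matching _id
        # and insert the rest.
        return {
            "$merge": {
                "into": target(out["merge"]),
                "on": "_id",
                "whenMatched": "replace",
                "whenNotMatched": "insert",
            }
        }

    raise NotImplementedError("reduce output is omitted from this sketch")


if __name__ == "__main__":
    print(out_stage({"replace": "results"}))
    print(out_stage({"merge": "results", "db": "reports"}))
```

Neither stage takes a global lock, which is consistent with the point above that atomic output can no longer be requested.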

sharded:false in the legacy implementation stated that any write to the output collection would only be to an unsharded collection. Under the new implementation, we no longer drop/recreate output collections, so if the output collection exists and is sharded, we will write to a sharded collection (for reduce and merge output options). So we could consider accepting and ignoring this option as well, but doing so would not prevent writes to a sharded output collection if one is specified.

We would be happy to consider either accepting and ignoring or rejecting these options, depending on downstream impact. We already accept and ignore sharded:true, as in the legacy implementation it was required in order to write to a sharded collection, and we needed to allow for an upgrade path.

Comment by Bernie Hackett [ 19/Dec/19 ]

Since the defaults for sharded and nonAtomic are false in the server, can you explain what harm accepting and ignoring false values would do? Can we just throw an error if either of them is set to true?

Comment by James Wahlin [ 18/Dec/19 ]

These defaults were removed from mapReduce because, as of 4.4, they describe behavior that is no longer supported.

The legacy mapReduce implementation would take a global write lock during output to the target collection when nonAtomic: false was set. This blocked all other user traffic on the mongod process for the duration. With an aggregation-based implementation we no longer block in this manner. This means that we also do not provide the means to write all results from a mapReduce job atomically. If we were to provide this behavior in the future, it would be by adding support for running the mapReduce command or an equivalent aggregation pipeline in a transaction.

For the sharded: false option, we no longer restrict sharded vs unsharded collection writes to a mapReduce target collection by command flag.

For both of these options, we do have the option to allow and to ignore. While I expect this could lead to unexpected behavior for some users, I suspect breaking users who did not explicitly set these options would be worse.
