[SERVER-80363] server default writeConcern is not honored when wtimeout is set Created: 23/Aug/23  Updated: 05/Feb/24  Resolved: 22/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.0.9
Fix Version/s: 7.2.1, 7.3.0-rc0, 7.0.5

Type: Bug Priority: Major - P3
Reporter: Asya Kamsky Assignee: Moustafa Maher
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-83193 Remove write concern from the batched... Open
is depended on by SERVER-76351 Improve readability of write concern ... Backlog
Documented
is documented by DOCS-16510 Investigate changes in SERVER-80363: ... Closed
Problem/Incident
Related
related to SERVER-86116 CreateCollectionCoordinator may fail ... Closed
related to SERVER-86201 Cluster upserts performed through the... Closed
is related to DRIVERS-2074 Should we parse wtimeoutMS when w <= 1? Backlog
Assigned Teams:
Replication
Backwards Compatibility: Minor Change
Operating System: ALL
Backport Requested:
v7.2, v7.0, v6.0, v5.0
Sprint: Repl 2023-09-18, Repl 2023-10-02, Repl 2023-10-16, Repl 2023-10-30, Repl 2023-11-13, Repl 2023-11-27
Participants:
Case:
Linked BF Score: 9

 Description   

Sending wtimeout:x without specifying a w value causes the server to use w:1, wtimeout:x instead of combining wtimeout with the server default (which should be majority in 5.0+).

Reproduced with the Java driver and the mongo shell against 6.0. The logs show that the server received only wtimeout:xxx for writeConcern, yet it uses w:1 with the passed wtimeout and reports the write concern as provided by the client.
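
A minimal sketch of the reported behaviour, written as a plain Python model rather than server code. The function names and the two constant tables are hypothetical; the only facts taken from this ticket are that a missing w ends up as 1 and that the server default is expected to be majority in 5.0+.

```python
# Hypothetical model of write concern resolution; not MongoDB server code.
IMPLICIT_DEFAULT_WC = {"w": "majority"}  # server default in 5.0+, per the description
PARSER_FIELD_DEFAULTS = {"w": 1}         # assumption: value used when w is absent


def observed_write_concern(client_wc):
    """Reported behaviour: a client-supplied document replaces the default wholesale."""
    if client_wc is None:
        return dict(IMPLICIT_DEFAULT_WC)
    # Missing fields come from the parser defaults, never from the server default.
    return {**PARSER_FIELD_DEFAULTS, **client_wc}


def expected_write_concern(client_wc):
    """Reporter's expectation: a missing w is taken from the server default."""
    if client_wc is None:
        return dict(IMPLICIT_DEFAULT_WC)
    return {**IMPLICIT_DEFAULT_WC, **client_wc}


print(observed_write_concern({"wtimeout": 10000}))
print(expected_write_concern({"wtimeout": 10000}))
```

The two functions differ only in which table fills the gap, which is the whole disagreement in this ticket.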



 Comments   
Comment by Githook User [ 13/Jan/24 ]

Author:

{'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}

Message: SERVER-80363 server default writeConcern is not honored when wtimeout is set
Branch: v7.2
https://github.com/mongodb/mongo/commit/6ebac6e589d809ba3425e66e37c767f14ac16f95

Comment by Githook User [ 05/Dec/23 ]

Author:

{'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}

Message: SERVER-80363 server default writeConcern is not honored when wtimeout is set

(cherry picked from commit 8ecb0d8535cddd51f38e262ed2ff95105b4bd60a)

GitOrigin-RevId: 9b61e09bb101b9992bfce99f98c6622f4c892057
Branch: v7.0
https://github.com/mongodb/mongo/commit/1684d02c4620f557ab9c721bad5ebbadb22d5337

Comment by Githook User [ 21/Nov/23 ]

Author:

{'name': 'Moustafa Maher Khalil', 'email': 'm.maher@mongodb.com', 'username': 'moustafamaher'}

Message: SERVER-80363 server default writeConcern is not honored when wtimeout is set
Branch: master
https://github.com/mongodb/mongo/commit/8ecb0d8535cddd51f38e262ed2ff95105b4bd60a

Comment by Samyukta Lanka [ 13/Sep/23 ]

I agree with Max. I think that educating customers about the missing fields makes more sense and will lead to a situation that will be easier to understand for customers (and for ourselves). 

Comment by Moustafa Maher [ 11/Sep/23 ]

I discussed my last suggestion, enforcing the w field, offline with judah.schvimer@mongodb.com, and we agreed that it would be a breaking change to the stable API.

judah.schvimer@mongodb.com & samy.lanka@mongodb.com & asya.kamsky@mongodb.com do you agree with max.hirschhorn@mongodb.com that we don't need to fix this, and that we can instead add some documentation to educate customers about the missing fields?

Comment by Moustafa Maher [ 08/Sep/23 ]

But I was thinking of an intermediate solution:
What do you think about enforcing that the writeConcern object must have a w field, either in the commands or in the CWWC? That can easily be added through our IDL parsers.
In my opinion that is more obvious and removes all ambiguity, combined with an explanation of the default values of missing fields.
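
To make the proposal concrete, here is a small Python sketch of what "the writeConcern object must have a w field" could look like as a validation step. This is an illustration only: `require_w` and `MissingWFieldError` are hypothetical names, and the real check would live in the IDL-generated parser, not in Python.

```python
# Hypothetical validator; illustrates the proposal, not the server's IDL parser.
class MissingWFieldError(ValueError):
    """Raised when a write concern document omits the w field."""


def require_w(write_concern):
    """Return the document unchanged if it names w, otherwise reject it."""
    if "w" not in write_concern:
        raise MissingWFieldError("writeConcern must specify w explicitly")
    return write_concern


# A document with only wtimeout would be rejected instead of silently getting w:1.
try:
    require_w({"wtimeout": 10000})
except MissingWFieldError as err:
    print("rejected:", err)

print(require_w({"w": "majority", "wtimeout": 10000}))
```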

Comment by Moustafa Maher [ 08/Sep/23 ]

I've been pondering the best way to tackle this issue, but every idea I come up with (to stitch WCs together) seems to make things more complicated, which would make our documentation even more ambiguous for users.
So I agree with max.hirschhorn@mongodb.com, but we still need to document the default values for fields missing from a write concern that is supplied by the client or set in the CWWC, which in this case is w:1.

Comment by Max Hirschhorn [ 08/Sep/23 ]

Hello! I wanted to chime in here and make sure I understand which type of application developer we are aiming to solve a problem for.

For example, I could imagine wanting to advertise - application developer sees MongoDB 5.0 provides a safe default write concern automatically and believes upon upgrade their application can immediately benefit. Is this user expectation reasonable? Possibly, but we also wrote text to say the new default won't override any explicit write concerns configured in the application itself. (Quoted inline below.)

Upgrade considerations
For users who do not set any write concern and instead rely on the defaults that MongoDB provides, w:majority will become the default write concern starting in MongoDB 5.0. This new default will take effect without any user action.

The new default does not override any explicit write concerns that developers have configured previously. Developers who want their application to continue to use the write concern w:1 should explicitly set their default write concern to w:1 prior to upgrade, whether in their driver or server-side with a global write concern to maintain the previous behavior.

So one question here may be - do application developers consider wtimeout something different from an explicit write concern? In my mind, no, if the application constructs a WriteConcern object then it's an explicit write concern. In most programming languages, auditing for usages of such a WriteConcern object is feasible with simple code search. I'm open to hearing from others on this ticket that there is actual ambiguity.

If there is ambiguity, then is it specific to how wtimeout combines with a cluster-wide default w setting? My answer is yes.

Let's say the situation was reversed slightly - The application code has explicitly constructed an operation with {writeConcern: {w: "majority"}}. And for one reason or another the cluster-wide default write concern is changed to be configured as {w: 2, wtimeout: 100} (yes, contrived I know). It would be a shock to me if the WriteConcern object expressed in the code was not the one which actually took effect. That is, if the resulting write concern applied by the server was from combining the two settings into {writeConcern: {w: "majority", wtimeout: 100}}. This would lead to a WriteConcernFailed error response being propagated if the secondaries are slightly lagged and not only when an election had occurred. This would result in application errors which, while already possible now, could happen at a much higher frequency. Needing to type {writeConcern: {w: "majority", wtimeout: 0}} to defensively guard against an operator configuring a bizarre cluster-wide default write concern feels very silly.
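
The reversed scenario can be sketched in a few lines of Python. Both functions are hypothetical models of the two candidate rules, not server code; the input documents are the ones from the paragraph above.

```python
# Hypothetical models of the two candidate rules; not MongoDB server code.
def fill_missing_fields(client_wc, cluster_default_wc):
    """Field-wise merge: client fields win, gaps are filled from the cluster-wide default."""
    return {**cluster_default_wc, **client_wc}


def explicit_wins(client_wc, cluster_default_wc):
    """An explicit write concern is used exactly as written."""
    if client_wc is not None:
        return dict(client_wc)
    return dict(cluster_default_wc)


client = {"w": "majority"}
cluster_default = {"w": 2, "wtimeout": 100}

# The merge silently attaches the operator's 100 ms timeout to the client's majority write.
print(fill_missing_fields(client, cluster_default))  # {'w': 'majority', 'wtimeout': 100}
print(explicit_wins(client, cluster_default))        # {'w': 'majority'}
```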

We change that behaviour and try to fill all the missing fields of supplied writeConcern with the default write concern (or CWWC if exists)

Filling in all of the missing fields of the supplied write concern is much more problematic to me. I'm also not fully convinced layering client wtimeout on top of cluster-wide w is solving a common surprise in expectations. To me, the rule of "if you type 'writeConcern' in your application code then the cluster-wide default won't take effect" satisfies the property of being simple to describe and doesn't sound unnatural.

Comment by Moustafa Maher [ 25/Aug/23 ]

The complication is coming from multiple points:

We used to identify that scenario as a client-provided case, and within the code, there are several instances where the execution relies on the flag that specifies whether we utilize the default setting or not. Therefore, it's necessary for us to assess whether we should consider an incompletely defined writeConcern as coming from the client or as a default setting. This evaluation will have implications for the remaining code logic and the metrics that we have incorporated. Additionally, we must decide which absent fields we should populate using the default values.

This decision-making process extends to determining whether to address only the 'w' parameter or should we also include the 'wTimeout' and 'Journal' options, among others. Furthermore, if we decide to allow incomplete write concerns, this implies permitting incomplete 'CWWC' as well. This could potentially result in a hybrid writeConcern configuration, an example being:

{
   w: from implicit default,
   wTimeout: from the client supplied,
   J: from CWWC
}
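
A rough Python sketch of how such a hybrid could arise if each field were resolved independently, with the source of every value recorded. The function name, the source labels, and the precedence order (client, then CWWC, then implicit default) are assumptions made for illustration; an incomplete CWWC is itself hypothetical here.

```python
# Hypothetical per-field resolution with provenance tracking; not server code.
def merge_with_provenance(client, cwwc, implicit_default):
    """Resolve each field from the first source that defines it and record that source."""
    sources = [
        ("client", client or {}),
        ("CWWC", cwwc or {}),
        ("implicit default", implicit_default),
    ]
    merged = {}
    for field in ("w", "wtimeout", "j"):
        for name, doc in sources:
            if field in doc:
                merged[field] = (doc[field], name)
                break
    return merged


result = merge_with_provenance(
    client={"wtimeout": 10000},
    cwwc={"j": True},  # assumed incomplete CWWC
    implicit_default={"w": "majority"},
)
for field, (value, source) in result.items():
    print(f"{field}: {value!r} (from {source})")
```

With three possible sources per field, a single "client-provided or default" flag no longer describes the result, which is the difficulty raised above.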

Comment by Asya Kamsky [ 25/Aug/23 ]

I think the current behavior is wrong aka bug for 5.0+ so I don’t think (1) is an option. I’m a little surprised that (2) is complicated given we are already setting an arbitrary default but I suppose just changing it to majority starting in 5.0 is technically incorrect for clusters that changed their default WC to something else.

Given how long this has been broken I guess it’s ok not to hold up 7.1 for it, but I feel strongly we should fix and backport this soon.

Comment by Moustafa Maher [ 24/Aug/23 ]

I agree with you, judah.schvimer@mongodb.com. So I should put this ticket back into scheduling and not block 7.1 on it. Do you agree?

Comment by Judah Schvimer [ 24/Aug/23 ]

I think (2) is a lot more intuitive to me, but we can defer the backport decision to after we've completed the fix on master.

Comment by Moustafa Maher [ 24/Aug/23 ]

I have investigated this, and I see that it has been working as designed since v4.2: if the client specifies any write concern (as in this ticket, "writeConcern": {"wtimeout":10000}), we don't use the cluster-wide write concern (CWWC). Instead we treat it as a client-supplied write concern and let the IDL parser parse it and set all missing fields to their default values, which is 1 for the field w.
So when we introduced the implicitMajorityDefaultWC project to set the default WC to majority if a CWWC doesn't exist, we honoured the existing behaviour of the client-supplied write concern.

That being said, we have two options: 

  1. We edit our documentation to reflect that behaviour.
    • Easier solution.
  2. We change that behaviour and try to fill all the missing fields of supplied writeConcern with the default write concern (or CWWC if exists), but that would be a somewhat complex and risky change to block 7.1 on.
    • In my opinion this area of the code is complex, and we need to fix it and remove this ambiguity.
    • But I think backporting this to 5.0 will be complex, as I see a lot of changes have been made around that area since. (SERVER-62609)

judah.schvimer@mongodb.com & samy.lanka@mongodb.com what do you think?

Generated at Thu Feb 08 06:43:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.