[SERVER-36478] The setFCV command should respect a user-provided wtimeout Created: 06/Aug/18  Updated: 29/Oct/23  Resolved: 10/Oct/18

Status: Closed
Project: Core Server
Component/s: Replication, Storage, Upgrade/Downgrade
Affects Version/s: None
Fix Version/s: 4.1.4

Type: Improvement Priority: Major - P3
Reporter: Neha Khatri Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: nyc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-34839 Add a test to ensure writeConcern=maj... Backlog
Related
is related to DOCS-14888 [SERVER] Document that setFeatureComp... Closed
Backwards Compatibility: Fully Compatible
Sprint: Storage NYC 2018-10-08, Storage NYC 2018-10-22
Participants:

 Description   

In 4.2, the setFeatureCompatibilityVersion command is a chain of operations, including the FCV document update and the collMod operations for the unique index upgrade. Each of these individual operations within setFCV has its own writeConcern, with no dependency on the others. A clear dependency needs to be defined between setFCV and the operations within it.

For example, when setFCV is run with a user-supplied writeConcern timeout, that timeout should propagate to the writeConcern of the collMod operation for the unique index upgrade.
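
A minimal sketch of the propagation described above, written as a standalone Python model rather than server code. The WriteConcern class, the inherit_wtimeout helper, and the NO_TIMEOUT_MS constant are all hypothetical names introduced for illustration; NO_TIMEOUT_MS stands in for the effectively unbounded collMod timeout mentioned in the discussion.

```python
from dataclasses import dataclass, replace

# Hypothetical stand-in for an effectively unbounded timeout (INT_MAX).
NO_TIMEOUT_MS = 2**31 - 1


@dataclass(frozen=True)
class WriteConcern:
    """Illustrative model of a write concern, not the server's type."""
    w: str = "majority"
    wtimeout_ms: int = NO_TIMEOUT_MS


def inherit_wtimeout(sub_op_wc: WriteConcern, user_wc: WriteConcern) -> WriteConcern:
    """Return the sub-operation's write concern with the user's wtimeout applied."""
    return replace(sub_op_wc, wtimeout_ms=user_wc.wtimeout_ms)


# The user runs setFCV with a 5-second wtimeout.
user_wc = WriteConcern(w="majority", wtimeout_ms=5000)

# The collMod step starts out with its own, unbounded write concern.
collmod_wc = WriteConcern(w="majority")

# After propagation, the collMod wait is bounded by the user's wtimeout.
effective = inherit_wtimeout(collmod_wc, user_wc)
```

In this model only the timeout is inherited; the sub-operation keeps its own w value, which is one possible reading of the request and not something the ticket specifies.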



 Comments   
Comment by Githook User [ 10/Oct/18 ]

Author: Louis Williams <louis.williams@mongodb.com> (louiswilliams)

Message: SERVER-36478 Use user-provided wtimeout for setFCV command
Branch: master
https://github.com/mongodb/mongo/commit/796d1d15226e546485d356f1c41d94e8d11281ca

Comment by Eric Milkie [ 29/Aug/18 ]

My decision is that we should use the user-provided timeout for all write concern waits throughout a setFCV command, and that we should continue to honor the user's requested write concern at the conclusion of the setFCV command, as we currently do today. We should also change the behavior of the setFCV command so that it returns ok:false if a write concern timeout occurs during the run of the command.
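
The decided behavior can be sketched as a small simulation. This is an assumption-laden model, not the committed implementation: run_set_fcv, the step names, and the simulated delays are hypothetical, and the sketch assumes the wtimeout is applied to each write concern wait individually rather than as a shared budget.

```python
def run_set_fcv(step_delays_ms, wtimeout_ms):
    """Simulate setFCV where every write concern wait uses the same wtimeout.

    step_delays_ms maps a hypothetical step name to the simulated time (ms)
    that replication takes to satisfy that step's write concern.
    """
    for step, delay_ms in step_delays_ms.items():
        if delay_ms > wtimeout_ms:
            # A timeout during the command is reported as a command failure.
            return {"ok": 0, "errmsg": f"write concern timed out waiting on {step}"}
    # All waits, including the final one, completed within the timeout.
    return {"ok": 1}


steps = {"update FCV document": 20, "collMod unique index upgrade": 800}

fast = run_set_fcv(steps, wtimeout_ms=1000)  # every wait fits within 1000 ms
slow = run_set_fcv(steps, wtimeout_ms=500)   # the collMod wait exceeds 500 ms
```

With these inputs the first call succeeds and the second returns ok: 0, naming the collMod step as the wait that timed out.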

Comment by Ian Whalen (Inactive) [ 24/Aug/18 ]

Assigning to Eric in the next sprint to discuss with others and decide whether we want to stop setFCV from taking a write concern timeout or propagate the write concern error as Judah suggests.

Comment by Ian Whalen (Inactive) [ 17/Aug/18 ]

Discussed in NYC triage meeting but there's still some discussion about what's involved here. Leaving for further discussion when milkie is back.

Comment by Alexander Gorrod [ 17/Aug/18 ]

The storage engines team did triage this, but decided to leave it for New York. We believe we know what work is required - so it's a matter of deciding about scheduling now.

Comment by Judah Schvimer [ 08/Aug/18 ]

We've had different versions of this discussion over the years. Other relevant tickets include SERVER-31528, when the replication team decided setFCV should not accept a write concern, and SERVER-31866, when the sharding team made the opposite decision for retryability and used the user-provided wtimeout at the end of the command, but not throughout.

With that context, it seems reasonable to me that setFCV would propagate the user-provided wtimeout throughout all write concern waits. That being said, right now WriteConcernErrors are not the same as command errors. Since we currently claim that once setFCV returns success the upgrade is durable, it feels important to me that any WriteConcernError is upgraded to a command error (there may currently be a bug here in SERVER-31866 for setFCV retryability).
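
To illustrate the distinction between a WriteConcernError and a command error, here is a hedged client-side sketch. The is_durable_success helper is hypothetical, and the reply documents are illustrative shapes rather than captured server output.

```python
def is_durable_success(reply: dict) -> bool:
    """Treat a reply as successful only if the command succeeded AND
    no write concern error was reported alongside it."""
    return reply.get("ok") == 1 and "writeConcernError" not in reply


# Illustrative reply shapes; the field values are made up for this sketch.
clean = {"ok": 1}
wc_failed = {
    "ok": 1,  # the command itself reports success...
    "writeConcernError": {  # ...but the write concern was not satisfied
        "code": 64,
        "codeName": "WriteConcernFailed",
        "errmsg": "waiting for replication timed out",
    },
}
cmd_failed = {"ok": 0, "errmsg": "illustrative command error"}
```

A caller that checks only the ok field would treat wc_failed as a durable upgrade, which is the gap that upgrading the WriteConcernError to a command error would close.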

Comment by Benety Goh [ 08/Aug/18 ]

We should get some input from tess.avitabile and judah.schvimer.

Comment by Neha Khatri [ 08/Aug/18 ]

Conversation from email:

neha.khatri said:

I am working on SERVER-34839, which is about ensuring that collMod's writeConcern=majority waits are honored during FCV upgrade. So it is expected that if the upgrade collMod command fails due to a writeConcern timeout, then setFCV should also time out.
 
I had a discussion about this test with Maria and Alex. During that discussion we discovered that the upgrade collMod command can never time out because its writeConcern timeout is INT_MAX, whereas the writeConcern of setFCV is configurable. Also, the FCV document update has its own writeConcern. So we started wondering whether that is correct. setFCV and the commands within it do not maintain any consistency across their respective writeConcerns.
 
With your experience in replication, we would like you to help us draw out this dependency chain and define the ideal writeConcern behaviour for setFCV.

benety.goh replied:

This sounds like the same class of problems raised in:

https://jira.mongodb.org/browse/SERVER-34776

Perhaps this is best discussed in a Needs Triage SERVER ticket.
