[SERVER-84439] appendMajorityWriteConcern should only override wtimeout if it is non-zero Created: 28/Dec/23  Updated: 05/Jan/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Randolph Tan Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-84356 Make Sharding and User Command Write ... Backlog
is related to SERVER-84443 Bring back no timeout write concern f... Needs Scheduling
is related to SERVER-70192 Rewrite uses of sendCommandToShards i... Closed
Assigned Teams:
Service Arch
Operating System: ALL
Sprint: Service Arch Prioritized List
Participants:

 Description   

wtimeout of zero has a special meaning that it's no timeout. So, logically it has a greater value than anything, but we use numerical comparisons in the places below:

https://github.com/10gen/mongo/blob/231d6367ea3c7670a458ce00252d970979d7f5be/src/mongo/executor/async_rpc_util.h#L74
https://github.com/10gen/mongo/blob/231d6367ea3c7670a458ce00252d970979d7f5be/src/mongo/db/commands.cpp#L509

So, it ends up being treated as a smaller number. As a result, a no timeout setting can be overwritten with a smaller timeout.



 Comments   
Comment by Alex Li [ 05/Jan/24 ]

To summarize context: The recent changes of the appendMajorityWriteConcern method itself preserved the logic that it had previously. The sendCommandToShards refactor changed using write concern without a timeout field ({ w: "majority" }) to using the file local kMajorityWriteConcern ({ w: "majority", wtimeout: 0 }). But because the appendMajorityWriteConcern method is used, the resharding local kMajorityWriteConcern is used, but the timeout field is rewritten to a default minimum ({ w: "majority", wtimeout: 60000 }). The appendMajorityWriteConcern method was not used before the refactor.

To restate fixes:

1) Add to the minimum timeout if condition that if the timeout is 0, do not overwrite.

2) It was discussed that creating a new Timeout type may not fit other areas and could be overkill, but considering the wTimeout has both kNoTimeout{0} and kNoWaiting{-1}, it may be worth creating a WriteConcernOptions nested type to make no timeout and no waiting more obvious.

3) May be more work but IMO it could be worth changing this function's logic to be more simple (refactor other current uses as well).

Comment by Randolph Tan [ 29/Dec/23 ]

What if we create a new type Timeout<Duration> that has the method hasTimeout and overloads the comparison operators so we won't make mistakes like this in the future?

Generated at Thu Feb 08 06:55:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.