[SERVER-46187] MongoS doesn't pass maxTimeMS to shards for write commands Created: 14/Feb/20 Updated: 29/Oct/23 Resolved: 31/Aug/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Spencer Brody (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||||||||||||||
| Steps To Reproduce: |
|
||||||||||||||||||||||||||||||||||||||||
| Sprint: | Service arch 2020-05-04, Service arch 2020-05-18, Service arch 2020-06-01, Service arch 2020-06-15, Service arch 2020-06-29, Service arch 2020-07-13, Service Arch 2020-07-27, Service Arch 2020-08-10, Service Arch 2020-08-24, Service arch 2020-09-07 | ||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||
| Linked BF Score: | 0 | ||||||||||||||||||||||||||||||||||||||||
| Description |
|
When running an insert through a mongoS it appears that a MaxTimeMSExpired error can be returned to the client even if the operation is still running on a shard mongod. This behavior seems unexpected and undesirable if a client expects that a maxTimeMS timeout error implies an operation has failed with no effect. |
| Comments |
| Comment by Githook User [ 15/Sep/20 ] |
|
Author: {'name': 'Spencer T Brody', 'email': 'spencer@mongodb.com', 'username': 'stbrody'}Message: |
| Comment by Githook User [ 30/Aug/20 ] |
|
Author: {'name': 'Spencer T Brody', 'email': 'spencer@mongodb.com', 'username': 'stbrody'}Message: |
| Comment by Githook User [ 30/Aug/20 ] |
|
Author: {'name': 'Spencer T Brody', 'email': 'spencer@mongodb.com', 'username': 'stbrody'}Message: This reverts commit 3e8fea7b03c7cb7fbed5e68714b0f3cfd663016c |
| Comment by Githook User [ 24/Apr/20 ] |
|
Author: {'name': 'Spencer T Brody', 'email': 'spencer@mongodb.com', 'username': 'stbrody'}Message: |
| Comment by Mira Carey [ 20/Apr/20 ] |
|
I've got a two part proposal here: 1. Map the deadline passed in via the RemoteCommandRequest object to a maxTimeMSOpOnly field on the actual request. This will ensure that clients that aren't intending on waiting don't start unlimited duration external requests This still leaves a hole around manual calls to scheduleRemoteCommand, but most sync command paths in mongos already use the ARS, including all writes. Separately, this does solidly change the contract of the ARS, because it makes the ARS unsuitable for fire and forget dispatch, but I'm already a little skeptical of it for that use case. Cancellation causes remote killop already hurts that use case, as does the reality that retries require an active thread calling done(). The one callsite we know of that uses that flow is: https://github.com/mongodb/mongo/blob/06f26fac35c7a3b82effbb9815ad9f7aedaf4dfb/src/mongo/s/transaction_router.cpp#L1295-L1311 We should investigate that use case to see if we can serve it better separately, and also to understand if the weakened contract is still good enough. |
| Comment by Mira Carey [ 18/Feb/20 ] |
Interesting. If that's the case, that's definitely a bug (and one we've attempted to stamp out in the past) |
| Comment by William Schultz (Inactive) [ 18/Feb/20 ] |
|
When this is fixed, we can unblacklist the tests from this commit. |
| Comment by William Schultz (Inactive) [ 14/Feb/20 ] |
In my simple insert test, it doesn't appear that we actually forward the maxTimeMS value through from a mongoS to a shard node. I would need to look around the code more to verify this but from looking at the command objects run on both the mongoS and mongoD this appears be the case. I'd agree that a guarantee of "if maxTimeMS has expired, then your operation will eventually fail (on the shard it's running on)" is somewhat reasonable, and better than "if maxTimeMS has expired, then your operation may or may not have failed." |
| Comment by Mira Carey [ 14/Feb/20 ] |
|
I think that this is expected/known (at least internally). We propagate maxtimems, but also enforce it on both mongos and mongod. Failing to do so would mean that would hang in scenarios where today we successfully return a timeout error. Looking at our docs for maxtimems()
I think we're still technically correct. We do actually pass maxtimems down and do interrupt the operation at the next interrupt point. It's just that there can be a small window where we've hit that point on a mongos, but not yet on a shard server. I don't see any language forming a causal edge between the error return and operation death. And in a world with retryable reads and writes, I think that more prompt enforcement of maxtimems is more valuable than guaranteeing operation cancellation, but I can understand the opposing view. |
| Comment by William Schultz (Inactive) [ 14/Feb/20 ] |
|
Maybe this is expected/known behavior but it seemed odd to me when I observed this behavior in a test. |