[SERVER-55802] Mongos should respect the client Op timeout without relying on mongod to do so Created: 05/Apr/21 Updated: 06/Dec/22 Resolved: 11/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.0.23 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Andrew Shuvalov (Inactive) | Assignee: | Backlog - Service Architecture |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Service Arch
|
||||||||
| Participants: | |||||||||
| Description |
|
In the HELP ticket repro, artificial fault injection was used to simulate a disk error on mongod. Mongod could not respect the operation timeout because the thread was blocked indefinitely on the disk operation. Additional fault injection was then used to simulate the operation timeout at mongos, and that resulted in a much slower connection buildup than without timing out the operations. Interrupting a thread stuck waiting on a socket reply should be much easier than interrupting a thread stuck on faulty disk I/O, because the TCP connection is still perfectly healthy. In Enterprise binaries, the problem of faulty disk on mongos is solved by Watchdog. |
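The claim that a wait on a healthy socket can be bounded is easy to illustrate outside the server. The following is a minimal sketch, not mongos code: a local socket pair stands in for the mongos-to-mongod connection, the peer never replies, and `wait_for_reply` is a hypothetical helper that gives up once its deadline passes.

```python
import socket
import time


def wait_for_reply(sock, deadline_s):
    """Wait for a reply on sock; return None if nothing arrives within deadline_s seconds."""
    sock.settimeout(deadline_s)
    try:
        return sock.recv(4096)
    except socket.timeout:
        # The connection itself is still healthy; only the wait is abandoned.
        return None


# The "server" end stands in for a mongod whose thread is stuck on disk I/O: it never writes.
client, server = socket.socketpair()
start = time.monotonic()
reply = wait_for_reply(client, 0.2)
elapsed = time.monotonic() - start
client.close()
server.close()

print("reply:", reply, "gave up after about", round(elapsed, 2), "seconds")
```

The waiting side regains control after the deadline even though the peer never answers, which is the behavior the ticket asks mongos to provide on its own.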
| Comments |
| Comment by Lamont Nelson [ 11/May/21 ] | ||||||||
|
I'm going to close this ticket. If there is a particular use case where maxTimeMS doesn't work, we can provide a test case and reopen. | ||||||||
| Comment by Lamont Nelson [ 11/May/21 ] | ||||||||
|
I'm not sure that I understand this statement. The BSON API is the interface to run commands on MongoDB. Everything else is just syntactic sugar provided (or not) by the drivers in their host language. | ||||||||
| Comment by Andrew Shuvalov (Inactive) [ 13/Apr/21 ] | ||||||||
|
Yes, maxTimeMS works in raw runCommands, but I really don't want to steer users in that direction because it may create compatibility hurdles. At least, we need to wait to see what design comes out of DRIVERS-555 to have a more consistent long-term strategy. In the medium term, this problem should be partially mitigated by the mongod-side implementation of my thread liveness monitor proposal, PM-2248. | ||||||||
| Comment by Lamont Nelson [ 13/Apr/21 ] | ||||||||
|
Have you actually tried attaching maxTimeMS to the raw command and found that it didn't work? I mean the BSON representation, not a command issued through the strongly typed API. | ||||||||
| Comment by Lamont Nelson [ 10/Apr/21 ] | ||||||||
|
I think that in order to enforce maxTimeMS for writes using the Java driver, we need to use the interface that lets you submit a raw BSON command (MongoDatabase.runCommand; equivalent to what this jstest is doing) rather than the strongly typed API. I'm not sure why this is, but I've verified with Jeff Yemin that this is the case. | ||||||||
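For illustration, a raw `update` command carrying an overall deadline might be assembled as below. This is a sketch only: the helper `build_update_command`, the collection name `orders`, and the filter and update values are hypothetical, and the document is built as a plain dictionary rather than sent to a server.

```python
def build_update_command(collection, query, update, max_time_ms):
    """Build a raw `update` command document with an overall operation deadline."""
    return {
        "update": collection,  # the command name has to be the first key
        "updates": [{"q": query, "u": update}],
        "maxTimeMS": max_time_ms,  # deadline for the whole operation, in milliseconds
    }


# Hypothetical collection and documents, chosen only for the example.
cmd = build_update_command("orders", {"_id": 1}, {"$set": {"status": "shipped"}}, 1000)
print(cmd)
```

A driver would submit such a document through its raw-command entry point, for example `MongoDatabase.runCommand` in the Java driver as mentioned above, or `Database.command` in PyMongo.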
| Comment by Andrew Shuvalov (Inactive) [ 08/Apr/21 ] | ||||||||
|
Yes, I was able to reproduce that the timeout on a find() operation is respected by mongos, which then releases the thread and both connections. However, as discussed above, most other operations don't have a timeout mongos can handle. | ||||||||
| Comment by Andrew Shuvalov (Inactive) [ 08/Apr/21 ] | ||||||||
|
There is also DRIVERS-555 "Client side operations Timeout" with the notion that maxTimeMS will be deprecated and replaced with a unified timeoutMS. So I'm changing this to blocked on DRIVERS-555. | ||||||||
| Comment by Andrew Shuvalov (Inactive) [ 07/Apr/21 ] | ||||||||
|
There is an open | ||||||||
| Comment by Andrew Shuvalov (Inactive) [ 07/Apr/21 ] | ||||||||
|
I don't see any way to set maxTimeMS for updateOne(), updateMany() and similar operations in the Java driver, maybe because | ||||||||
| Comment by Matthew Saltz (Inactive) [ 07/Apr/21 ] | ||||||||
|
There's a difference between wtimeout and maxTimeMS. The way I understand it, wtimeout only applies to the portion of the query that waits for write concern, and does not work as an overall operation timeout. So the query you're issuing doesn't actually have an overall operation timeout on it. I think if you set maxTimeMS then you'll see the behavior you're requesting, where mongos will stop waiting for the query to complete after the operation deadline. (On 4.0.23 and later, that is.) | ||||||||
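The distinction can be seen in where each setting lives in the command document: `wtimeout` sits inside `writeConcern`, while `maxTimeMS` is a top-level field. The sketch below uses a hypothetical helper, `describe_timeouts`, and made-up collection and field values to show that a command with only `wtimeout` carries no overall deadline.

```python
def describe_timeouts(cmd):
    """Report which timeouts a raw command document carries (None means not set)."""
    write_concern = cmd.get("writeConcern", {})
    return {
        # Bounds only the wait for write concern acknowledgement.
        "write_concern_wait_ms": write_concern.get("wtimeout"),
        # Bounds the operation as a whole.
        "operation_deadline_ms": cmd.get("maxTimeMS"),
    }


# Hypothetical update command that sets only a write concern timeout.
only_wtimeout = {
    "update": "orders",
    "updates": [{"q": {"_id": 1}, "u": {"$set": {"status": "shipped"}}}],
    "writeConcern": {"w": "majority", "wtimeout": 5000},
}
# The same command with an overall operation deadline added.
with_deadline = dict(only_wtimeout, maxTimeMS=1000)

print(describe_timeouts(only_wtimeout))
print(describe_timeouts(with_deadline))
```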
| Comment by Andrew Shuvalov (Inactive) [ 06/Apr/21 ] | ||||||||
|
I don't think the fixes above address the update operation timeout. However, please correct me if I'm wrong to expect that the way I'm setting the write timeout does what I want:
I just repeated the experiment with the head, the future v5.0 release: Here the fault injection blocks all disk ops on the primary at A. The connection buildup starts immediately and none of the 'updateOne' commands ever times out. |