[SERVER-79888] find/getMore timeouts triggered by Fetcher always reported as NetworkInterfaceExceededTimeLimit Created: 09/Aug/23 Updated: 05/Feb/24 |
|
| Status: | In Progress |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Patrick Freed | Assignee: | Janna Golden |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | sharding-nyc-subteam2 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Cluster Scalability |
| Operating System: | ALL |
| Sprint: | Cluster Scalability 2023-12-25, Cluster Scalability 2024-1-8, Cluster Scalability 2024-1-22, Cluster Scalability 2024-2-5, Cluster Scalability 2024-2-19 |
| Participants: | |
| Story Points: | 5 |
| Description |
|
The constructor of Fetcher accepts two timeout values: findNetworkTimeout and getMoreNetworkTimeout. Despite their names, these timeouts govern the entire find/getMore command invocations, not just the network portions.

In at least a few cases, these timeouts are derived from a maxTimeMS value, but because they are passed to the Fetcher on their own and not attached to an opCtx, they lose their identity as maxTimeMS values. As a result, if either timeout is hit during the fetching process, a NetworkInterfaceExceededTimeLimit error is returned (the default defined on RemoteCommandRequest) rather than the expected MaxTimeMSExpired error. A potential negative side effect is that a process's replica set monitor marks the host as unreachable when, in reality, a fetching query just took longer than expected.

An example of this issue can be seen in ShardRemote::_runExhaustiveCursorCommand, which manually extracts the remaining maxTimeMS value from the opCtx and then passes it into a Fetcher:

https://github.com/mongodb/mongo/blob/r7.0.0/src/mongo/s/client/shard_remote.cpp#L303-L310
https://github.com/mongodb/mongo/blob/r7.0.0/src/mongo/client/fetcher.cpp#L200-L201
|
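The behavior described above can be modeled with a minimal, self-contained C++ sketch. It does not use the real server types: ErrorCode, SketchRequest, makeRequestFromBareTimeout, makeRequestFromMaxTimeMS, and runWithTimeout are all hypothetical stand-ins invented for illustration. The only detail taken from the ticket is that the request's timeout error defaults to NetworkInterfaceExceededTimeLimit. Having the caller set the timeout error explicitly is just one possible direction, assumed here, and not necessarily the fix that will be chosen.

```cpp
#include <chrono>

// Simplified stand-in for the server's error codes (not the real ErrorCodes enum).
enum class ErrorCode { OK, NetworkInterfaceExceededTimeLimit, MaxTimeMSExpired };

// Hypothetical, pared-down request type. The only behavior modeled is that the
// error reported on timeout defaults to NetworkInterfaceExceededTimeLimit.
struct SketchRequest {
    std::chrono::milliseconds timeout{0};
    ErrorCode timeoutCode = ErrorCode::NetworkInterfaceExceededTimeLimit;
};

// Models the situation in the ticket: the caller passes a bare duration, so
// nothing records that the value originally came from maxTimeMS.
SketchRequest makeRequestFromBareTimeout(std::chrono::milliseconds findNetworkTimeout) {
    SketchRequest request;
    request.timeout = findNetworkTimeout;
    return request;  // timeoutCode keeps its default
}

// Assumed remedy for illustration only: the caller states that the deadline
// is a maxTimeMS deadline, so the matching error code is reported on expiry.
SketchRequest makeRequestFromMaxTimeMS(std::chrono::milliseconds remainingMaxTimeMS) {
    SketchRequest request;
    request.timeout = remainingMaxTimeMS;
    request.timeoutCode = ErrorCode::MaxTimeMSExpired;
    return request;
}

// Returns the error a command would report after running for `elapsed`.
ErrorCode runWithTimeout(const SketchRequest& request, std::chrono::milliseconds elapsed) {
    return elapsed > request.timeout ? request.timeoutCode : ErrorCode::OK;
}
```

In this model, a query that runs for 150 ms against a 100 ms budget reports NetworkInterfaceExceededTimeLimit when the budget was passed as a bare timeout, and MaxTimeMSExpired when the request carries the maxTimeMS identity along with the duration.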