[SERVER-82550] MongoDB 7.0: socketTimeout not being honored
Created: 29/Oct/23 Updated: 02/Feb/24
| Status: | Investigating |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug |
| Priority: | Major - P3 |
| Reporter: | Johnny Shields |
| Assignee: | Rushan Chen |
| Resolution: | Unresolved |
| Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Assigned Teams: | Query Execution |
| Operating System: | ALL |
| Sprint: | QE 2024-01-22, QE 2024-02-05, QE 2024-02-19 |
| Case: | (copied to CRM) |
| Description |
(This ticket is related to support case https://support.mongodb.com/case/01207868.) In our Ruby/Mongoid-based application, we set an application-wide socketTimeout of 30 seconds. In MongoDB 6.x and earlier, this was a reliable way to get slow queries killed: once the client's socket timed out and closed, the server terminated the query. Since upgrading from MongoDB 6.0.11 to 7.0.2, we have noticed that many long-running queries are no longer killed this way and instead run indefinitely.

The MongoDB team has suggested that this may be because the new slot-based execution engine (SBE) in MongoDB 7.x is “yielding less frequently”, and therefore evaluating the timeout condition less often. We also suspect a “death spiral”: long-running queries that escape the kill cause fewer yields, which lets more queries escape the timeout kill, and so on in a positive feedback loop. (This is only a suspicion; we have no hard evidence for it.)
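To make the suspected mechanism concrete, here is a toy model in Python. It is not MongoDB's actual executor and not the reporter's Ruby stack. It assumes, hypothetically, that an operation does fixed-cost units of work and only notices a kill request (such as the closed socket after a 30-second socketTimeout) at interrupt checks taken every `units_per_check` units. The function name and parameters are illustrative only; the point is that the kill latency grows with the interval between checks, and the kill is never observed if no check happens before the operation finishes.

```python
from typing import Optional


def kill_latency_ms(
    unit_cost_ms: int,
    units_per_check: int,
    kill_at_ms: int,
    total_units: int,
) -> Optional[int]:
    """Toy model: delay between a kill request and the first interrupt check that sees it.

    Assumes (hypothetically) that the operation only checks for interruption
    after every `units_per_check` units of work, each costing `unit_cost_ms`.
    Returns None if no check happens before the operation finishes on its own.
    """
    check_period_ms = unit_cost_ms * units_per_check
    # Index of the first check at or after the kill (integer ceiling division);
    # there is no check at time 0, so the earliest possible one is check number 1.
    first_check = max(1, -(-kill_at_ms // check_period_ms))
    if first_check * units_per_check >= total_units:
        return None  # the operation completes before any check observes the kill
    return first_check * check_period_ms - kill_at_ms


# A 30-second socketTimeout fires 30,000 ms into a 1,000,000 ms operation.
for interval in (128, 100_000, 2_000_000):
    print(interval, kill_latency_ms(1, interval, 30_000, 1_000_000))
```

With 1 ms units, checking every 128 units notices the kill 80 ms late, checking every 100,000 units notices it 70,000 ms late, and with no check before completion the function returns None, i.e. the query simply runs to the end.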
| Comments |
| Comment by Johnny Shields [ 30/Jan/24 ] |
Unfortunately not; it was 3-4 months ago. I would suggest trying to reproduce it by setting socketTimeoutMS=5000 and bombarding the database with long-running queries; they should all time out after 5 seconds. If any do not time out after 5 seconds, congratulations, you have reproduced the bug. I would also recommend testing this with the SBE fully enabled, e.g. version 7.0.0. (I think later 7.0.x releases disabled the SBE in some cases.)
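A minimal sketch of that reproduction using PyMongo (Python rather than the reporter's Ruby/Mongoid stack). It assumes a reachable server and a `slow_filter` you supply that runs well past 5 seconds, for example an unindexed predicate over a large collection that matches few or no documents, so the initial `find` itself is slow. The `COMMENT` tag, the function names, the 2-second grace period, and the extra 5-second wait are all hypothetical choices; tagging each query with `comment` is just a way to pick the test queries out of `$currentOp`.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tag attached to every test query so it can be found in $currentOp.
COMMENT = "socket-timeout-repro"


def escaped_ops(ops, timeout_secs, grace_secs=2):
    """Return $currentOp entries still running well past the client-side timeout."""
    limit = timeout_secs + grace_secs
    return [op for op in ops if op.get("secs_running", 0) > limit]


def run_repro(uri, db_name, coll_name, slow_filter,
              n_queries=50, timeout_ms=5000, max_time_ms=None):
    """Fire n_queries slow finds concurrently, then list server ops that outlived them."""
    # Imported lazily so escaped_ops() can be used without PyMongo installed.
    from pymongo import MongoClient
    from pymongo.errors import ExecutionTimeout, NetworkTimeout

    client = MongoClient(uri, socketTimeoutMS=timeout_ms)
    try:
        coll = client[db_name][coll_name]

        def one_query(_):
            kwargs = {"comment": COMMENT}
            if max_time_ms is not None:
                # Optional server-side limit, to compare against socketTimeoutMS alone.
                kwargs["max_time_ms"] = max_time_ms
            try:
                list(coll.find(slow_filter, **kwargs))
                return "completed"
            except NetworkTimeout:
                return "socket timeout"  # client gave up and closed the connection
            except ExecutionTimeout:
                return "maxTimeMS exceeded"  # server interrupted the operation itself

        with ThreadPoolExecutor(max_workers=n_queries) as pool:
            outcomes = list(pool.map(one_query, range(n_queries)))

        # Give the server time to notice the closed sockets, then inspect what is left.
        time.sleep(timeout_ms / 1000 + 5)
        ops = list(client.admin.aggregate([
            {"$currentOp": {"allUsers": True}},
            {"$match": {"command.comment": COMMENT}},
        ]))
        return outcomes, escaped_ops(ops, timeout_ms / 1000)
    finally:
        client.close()
```

For example, `outcomes, escaped = run_repro("mongodb://localhost:27017", "test", "big_collection", {"unindexed_field": "no-such-value"})`, where the database, collection, and field names are placeholders. A non-empty `escaped` list means some operations kept running after their client gave up, which is the behavior reported here. Passing `max_time_ms=5000` runs the same load with a server-side limit for comparison. Listing other users' operations with `allUsers: True` requires the `inprog` privilege.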
| Comment by Rushan Chen [ 30/Jan/24 ] |
Hi shields@tablecheck.com, sorry for getting to this late. Since the support case was opened some time ago, could you give us an update on this issue: (1) any query logs you have related to this problem, or (2) otherwise, more details (e.g. the mix of queries running) that would let us try to reproduce the issue internally?
| Comment by Johnny Shields [ 29/Oct/23 ] |
One more note: we are aware that maxTimeMS is probably more reliable than socketTimeout here; however, it isn't robustly supported in the driver/ODM yet (see MONGOID-5480 / MONGOID-5481).