[SERVER-82550] MongoDB 7.0: socketTimeout not being honored Created: 29/Oct/23  Updated: 02/Feb/24

Status: Investigating
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Johnny Shields Assignee: Rushan Chen
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Operating System: ALL
Sprint: QE 2024-01-22, QE 2024-02-05, QE 2024-02-19
Participants:
Case:

 Description   

(This ticket is related to support case https://support.mongodb.com/case/01207868)

In our Ruby/Mongoid-based application, we set an application-wide socketTimeout value of 30 seconds. In MongoDB 6.x and earlier, this reliably caused slow queries to be killed when the socket closed.

Since upgrading to MongoDB 7.0.2 from 6.0.11, we have noticed many long-running queries are not killed in this manner, and instead execute indefinitely.

The MongoDB team has suggested that this may be due to the new slot-based execution engine (SBE) in MongoDB 7.x “yielding less frequently” to evaluate the timeout condition.

Moreover, we suspect there is a “death spiral” effect whereby non-killed long-running queries cause fewer yields, causing more queries to escape the timeout kill, and so on in a positive feedback loop. (Just a suspicion; no hard evidence for this.)



 Comments   
Comment by Johnny Shields [ 30/Jan/24 ]

Unfortunately no, it was 3-4 months ago. I would suggest you try to reproduce by setting socketTimeoutMS=5000 and bombarding the database with long-running queries; they should all time out after 5 sec. If any do not time out after 5 sec, congratulations, you have reproduced the bug.

I would also recommend testing this with the SBE fully enabled, e.g. version 7.0.0. (I think later 7.0.x disabled the SBE in some cases.)
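
A minimal sketch of how the server-side half of that reproduction could be checked, assuming the `inprog` array of a `currentOp` result is available as an array of hashes. The helper name `overrunning_ops` and the `grace_s` parameter are hypothetical and not part of any driver; the commented driver calls are an assumed usage and are not exercised here.

```ruby
# Hypothetical helper, not part of any MongoDB driver.
# Given the `inprog` array from a currentOp result, return the active
# operations that have run longer than the socket timeout plus a grace period.
def overrunning_ops(inprog, timeout_s:, grace_s: 1)
  inprog.select do |op|
    op['active'] && op.fetch('secs_running', 0) > timeout_s + grace_s
  end
end

# Assumed usage against a live deployment (requires the `mongo` gem; not run here):
#   client = Mongo::Client.new('mongodb://localhost:27017/admin?socketTimeoutMS=5000')
#   inprog = client.database.command(currentOp: 1).first['inprog']
#   overrunning_ops(inprog, timeout_s: 5).each { |op| puts op['opid'] }

# Stand-in data so the sketch runs without a server.
sample = [
  { 'opid' => 1, 'active' => true,  'secs_running' => 2 },
  { 'opid' => 2, 'active' => true,  'secs_running' => 42 },
  { 'opid' => 3, 'active' => false }
]
p overrunning_ops(sample, timeout_s: 5).map { |op| op['opid'] }
```

Any operation this returns while clients are using a 5-second socket timeout would be a candidate instance of the behaviour described in this ticket. The grace period is there only to avoid flagging operations that are about to be cleaned up.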

Comment by Rushan Chen [ 30/Jan/24 ]

Hi shields@tablecheck.com, sorry for getting to this late. As the support case was opened some time ago, could you give us an update on this issue: (1) any query logs you have related to this problem, or (2) otherwise more details (e.g. the mix of queries running) that would allow us to try to reproduce the issue internally?
We might have to close this ticket if we do not have details to further investigate.

Comment by Johnny Shields [ 29/Oct/23 ]

One more note: we are aware that maxTimeMS is probably more reliable than socketTimeout here; however, it isn't robustly supported in the driver/ODM yet (see MONGOID-5480 / MONGOID-5481).
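
For illustration, a minimal sketch of attaching a server-side limit to a single query by setting `maxTimeMS` on a raw `find` command document. The helper name `find_with_time_limit` and the collection and filter values are hypothetical; the commented driver calls are an assumed usage and are not exercised here.

```ruby
# Hypothetical helper: builds a raw `find` command document that carries a
# server-side time limit. The command name must be the first key, which
# Ruby's insertion-ordered hashes preserve.
def find_with_time_limit(collection, filter, max_time_ms)
  unless max_time_ms.is_a?(Integer) && max_time_ms.positive?
    raise ArgumentError, 'max_time_ms must be a positive integer'
  end

  { find: collection, filter: filter, maxTimeMS: max_time_ms }
end

# Assumed usage with the Ruby driver (requires the `mongo` gem; not run here):
#   client.database.command(find_with_time_limit('orders', { status: 'open' }, 30_000))
# or, through the driver's query builder:
#   client[:orders].find(status: 'open').max_time_ms(30_000).to_a

p find_with_time_limit('orders', { status: 'open' }, 30_000)
```

Unlike a socket timeout, which only closes the client's connection, this limit is enforced by the server for that one operation, so it has to be applied to every query that needs it.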

Generated at Thu Feb 08 06:49:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.