[GODRIVER-2525] Occasional handshake error when using mongodb+srv with mongos pool Created: 16/Aug/22 Updated: 27/Oct/23 Resolved: 23/Dec/22 |
|
| Status: | Closed |
| Project: | Go Driver |
| Component/s: | Connections, Error Handling |
| Affects Version/s: | 1.9.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Unknown |
| Reporter: | Peter Ivanov | Assignee: | Benji Rewis (Inactive) |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
| Comments |
| Comment by PM Bot [ 23/Dec/22 ] | |
|
There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information. | |
| Comment by Benji Rewis (Inactive) [ 08/Dec/22 ] | |
|
The Go driver team does not feel that allowing a configurable rescanSRVInterval is a great fix for this situation. While we believe that "knob" does reduce the number of SRV lookup errors, we also think that GODRIVER-2579 will almost entirely remove the possibility of errors like the ones you're seeing being raised to users. We'd rather not expose new API to users to help avoid odd driver behavior, as that API will likely become irrelevant and permanent (removing it post hoc would be backward-breaking) after we've fixed the odd driver behavior. If you're intent on using a reduced SRV rescan interval, we may ask you to rely on your fork of v1.9.4 for the time being. | |
| Comment by Artem Navrotskiy [ 07/Dec/22 ] | |
I don't see this as a problem. If you don't explicitly need to change this parameter, just leave the default value. Currently, you have to read the code even to find out what the interval is.
After reducing the interval from 60 seconds to 30, the error still occurred, but roughly 5 times less often (I don't remember the exact numbers).
We now use version 1.9.4 with the changes from the PR. | |
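To illustrate what a configurable rescan interval means in practice, here is a minimal sketch of a polling loop with the interval passed in as a parameter. It is not the Go Driver's implementation: `pollSRV`, `lookupFunc`, and the fake lookup are hypothetical names, and the lookup is injected so the example runs without DNS.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// lookupFunc is a hypothetical stand-in for an SRV lookup. It returns the
// host list currently published for the cluster.
type lookupFunc func() ([]string, error)

// pollSRV calls lookup once per interval and sends each successful result on
// the returned channel. Failed lookups are skipped. Calling the returned stop
// function ends the polling and closes the channel.
func pollSRV(interval time.Duration, lookup lookupFunc) (<-chan []string, func()) {
	ctx, cancel := context.WithCancel(context.Background())
	out := make(chan []string)
	go func() {
		defer close(out)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
				hosts, err := lookup()
				if err != nil {
					continue // keep the previous host list and try again next tick
				}
				select {
				case out <- hosts:
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return out, cancel
}

func main() {
	// Assumption: a fake lookup that always returns one mongos address.
	fake := func() ([]string, error) {
		return []string{"mongos-0.example.com:27017"}, nil
	}
	// A short interval keeps the demo quick; the thread discusses 60s vs 30s.
	updates, stop := pollSRV(10*time.Millisecond, fake)
	defer stop()
	for i := 0; i < 3; i++ {
		fmt.Println(<-updates)
	}
}
```

A shorter interval narrows the window in which the client still holds a stale host list, which fits the reduced (but not eliminated) error rate reported above.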
| Comment by Benji Rewis (Inactive) [ 05/Dec/22 ] | |
|
Hello again, petr.ivanov.s@gmail.com. I'm following up on this ticket, as there seems to be an open PR related to this issue. Is the author someone from your team? While making the SRV rescan interval configurable may feasibly solve this issue for you all, we're hesitant to introduce a new "knob" to the driver: adding a URI option/client option for rescanSRVIntervalMS would be a cross-drivers change, and it may be difficult for most users to reason about which value to use for rescanSRVIntervalMS. We have an upcoming change GODRIVER-2579/GODRIVER-2191 (retrying operations if the connection handshake fails) that would probably stop these SRV lookup errors from bubbling up to your application. We would simply retry the handshake, and the retry would probably succeed given the sequence of events matt.dale@mongodb.com describes in his comment. I have three questions:
| |
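The retry idea mentioned above (run the operation again if the connection handshake fails) can be approximated at the application level. The following is only a sketch under stated assumptions: `errHandshake` is a hypothetical sentinel error, not a Go Driver type, and `withHandshakeRetry` is an illustrative helper, not the behavior planned in GODRIVER-2579.

```go
package main

import (
	"errors"
	"fmt"
)

// errHandshake is a hypothetical sentinel standing in for whatever error the
// application sees when a connection handshake fails.
var errHandshake = errors.New("connection handshake failed")

// withHandshakeRetry runs op and retries it up to maxRetries more times if it
// fails with a handshake error. Any other error is returned immediately.
func withHandshakeRetry(maxRetries int, op func() error) error {
	var err error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		err = op()
		if err == nil || !errors.Is(err, errHandshake) {
			return err
		}
	}
	return err
}

func main() {
	calls := 0
	err := withHandshakeRetry(1, func() error {
		calls++
		if calls == 1 {
			// Simulate a first attempt that hits a server which is shutting down.
			return fmt.Errorf("find: %w", errHandshake)
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

A single retry is usually enough in the scenario described in this ticket, because the second attempt is likely to select a different, healthy mongos.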
| Comment by PM Bot [ 16/Nov/22 ] | |
|
There hasn't been any recent activity on this ticket, so we're resolving it. Thanks for reaching out! Please feel free to comment on this if you're able to provide more information. | |
| Comment by Matt Dale [ 01/Nov/22 ] | |
|
petr.ivanov.s@gmail.com we recently discovered a bug in the SRV polling behavior of the Go Driver that may prevent changes in SRV records from updating the servers that the Go Driver attempts to connect to when the MongoDB connection string includes a username and password (see the linked ticket). Do you use a username and password in your MongoDB connection string? If so, please update to one of the fix versions listed above as soon as they are available and see if that prevents or reduces the handshake errors you see. Since you're using 1.9.1, I recommend updating to version 1.9.3 since it will be the least risky change. As for server behavior, MongoDB 5.0 added a "quiesce" mode that's used during shutdown to allow connected drivers to gracefully remove the shutting-down servers (read more about quiesce mode here). If updating to a patched Go Driver version doesn't help, updating to MongoDB 5.0 may help. | |
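One quick way to answer the question above for a given deployment is to check whether the connection string carries userinfo. This is a minimal sketch using only the standard library; `hasCredentials` and the example URIs are hypothetical, and it assumes a single-host `mongodb+srv://` style URI (multi-host `mongodb://` seed lists may not parse with `net/url`).

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// hasCredentials reports whether a MongoDB connection string includes a
// username (optionally with a password) before the host.
func hasCredentials(uri string) (bool, error) {
	if !strings.HasPrefix(uri, "mongodb://") && !strings.HasPrefix(uri, "mongodb+srv://") {
		// The URI is not echoed in the error, since it may contain a password.
		return false, errors.New("not a MongoDB connection string")
	}
	u, err := url.Parse(uri)
	if err != nil {
		return false, err
	}
	return u.User != nil && u.User.Username() != "", nil
}

func main() {
	for _, uri := range []string{
		"mongodb+srv://appuser:secret@cluster.example.com/app",
		"mongodb+srv://cluster.example.com/app",
	} {
		ok, err := hasCredentials(uri)
		fmt.Println(ok, err)
	}
}
```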
| Comment by Peter Ivanov [ 25/Oct/22 ] | |
|
Question 1: no, we use a pretty much bare-bones cluster on AWS EC2 instances. The mongos instances run in Kubernetes and scale according to load. For question 2, I'll ask a colleague to answer, but it's worth noting that we run MongoDB 4.4, and shutdown handling may have improved since then. But the issue may not be with graceful shutdown alone. | |
| Comment by Matt Dale [ 13/Oct/22 ] | |
|
Hey petr.ivanov.s@gmail.com, sorry about the slow reply. I've been attempting to reproduce the error you described but have so far been unsuccessful. However, I have a possible sequence of events that could lead to the error:
Based on that, I have a few more questions:
| |
| Comment by Peter Ivanov [ 29/Aug/22 ] | |
| |
| Comment by Matt Dale [ 26/Aug/22 ] | |
|
Hey petr.ivanov.s@gmail.com thanks for the ticket, we're looking into it! I've got a few questions to help me troubleshoot the issue:
|