[SERVER-34120] scoped connection not being returned to the pool Created: 24/Mar/18 Updated: 08/Jan/24 Resolved: 05/Sep/18
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Networking |
| Affects Version/s: | 3.6.3 |
| Fix Version/s: | 3.6.8, 4.0.3, 4.1.3 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Mouhcine | Assignee: | Mira Carey |
| Resolution: | Fixed | Votes: | 3 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.0, v3.6 |
| Sprint: | Platforms 2018-07-02, Platforms 2018-07-16, Platforms 2018-07-30, Platforms 2018-08-13, Platforms 2018-09-10 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |

I have a replica set with two servers, and since upgrading to 3.6.3 I have been getting this "scoped connection not being returned to the pool" message every 5 minutes. My servers are set up with NTP time sync and I am sure their clocks are correct. Any idea what could cause this? Thank you.
| Comments |
| Comment by Githook User [ 05/Sep/18 ] |

Author: Jason Carey <jcarey@argv.me> (hanumantmk)
Message: We make connections from the secondaries to the primary that we fail to
(cherry picked from commit fac4037c57431e3318e04cc850b9ea391ff5f19c)
| Comment by Githook User [ 05/Sep/18 ] |

Author: Jason Carey <jcarey@argv.me> (hanumantmk)
Message: We make connections from the secondaries to the primary that we fail to
(cherry picked from commit fac4037c57431e3318e04cc850b9ea391ff5f19c)
| Comment by Githook User [ 05/Sep/18 ] |

Author: Jason Carey <jcarey@argv.me> (hanumantmk)
Message: We make connections from the secondaries to the primary that we fail to
| Comment by Mira Carey [ 28/Aug/18 ] |

After looking at the solution we pursued in
| Comment by Mira Carey [ 17/Aug/18 ] |

As part of the fix for
| Comment by RANJEET [ 02/Aug/18 ] |

I am using mongo version 3.6.5 and am seeing similar messages:

2018-08-02T09:55:39.808-0400 I NETWORK [thread64] Starting new replica set monitor for ReplicaSet/host1:27017,host2:27017,host3:27017
| Comment by Madhu N [ 02/Aug/18 ] |

I have the same issue in my environment.

I stopped the services, added an additional disk, copied the files to the new disk, and restarted the services with the new path. Some time after starting, the server shuts down. Log excerpt:

Thu Aug 2 11:38:57.257 I NETWORK [thread10] Primary-one:27017 (88 connections now open to primary-one:27017 with a 0 second timeout)

Did we find any fix for this? Best Regards
| Comment by Mira Carey [ 21/Jun/18 ] |

After looking a bit, I believe this is actually related to some of the Logical Sessions code introduced in 3.6. Following the repro (spinning up a minimal 3-node replica set) and watching the logs, threads named "threadN" pop up with similar warnings. Attaching gdb showed these to be PeriodicRunnerASIO threads running background tasks for the logical sessions refresh. The relevant code lives in sessions_collection_rs.cpp.

The mistake we're making is not calling done() on the connection before letting the ScopedDbConnection go out of scope. When connections are returned that way, we don't know their status, so we have to toss them. Calling done() fixes the problem and lets these operations use pooled connections.

I don't believe this is a critical operational issue. That codepath does clean up those connections (we don't leak them); we just lose track of some bookkeeping. Definitely worth fixing, but probably not a cause of production instability.
| Comment by Erik Wramner [ 16/May/18 ] |

The servers haven't crashed since I wrote the comment. I will create a new ticket on the next crash. I upgraded all packages (yum upgrade) and rebooted the servers last time, so perhaps that helped? However, the noise in the logs about connections not being returned to the pool, with an ever-increasing number of connections "now open", is still there.
| Comment by Kelsey Schubert [ 11/May/18 ] |

erik.wramner, would you please open a new SERVER ticket for the issue that you describe? It would be helpful for us to see the mongod logs and an archive of the diagnostic.data directory of the mongod that is crashing. Thank you,
| Comment by Erik Wramner [ 11/May/18 ] |

We are getting this on 3.6.4-1 on CentOS 7 (kernel 3.10.0-693.21.1.el7.x86_64) in Azure. As far as I can determine we had it with 3.6.2 as well. Recently we have also started seeing regular server crashes, at least daily, where the mongod process dies without any clear reason. I don't know if it is related in any way, but apart from the sudden death this is all I can find in the logs. We have three servers in the replica set and the logs look just like the ones above. I would be very interested in getting this fixed.
| Comment by Mouhcine [ 26/Mar/18 ] |

Hello Ramon,
| Comment by Ramon Fernandez Marina [ 26/Mar/18 ] |

mmakroume, these messages are informational; they indicate that a connection cannot be reused and is being terminated. What version did you upgrade from? It would also be good to see more details from both nodes, to see if they provide a clue as to why this is happening. Can you please upload logs for both nodes in your replica set, spanning enough time to show multiple incidents of these messages? Thanks,