[SERVER-48387] DB connection issue Created: 22/May/20  Updated: 27/Oct/23  Resolved: 30/Jun/20

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Alberto Llamas
Assignee: Dmitry Agranat
Resolution: Community Answered
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

We have a MongoDB cluster running version 4.2.1. On our arbiter node we are currently seeing the following errors/warnings in the logs:

 

2020-05-22T14:57:02.573+0000 I  CONTROL  [LogicalSessionCacheReap] Failed to reap transaction table: HostUnreachable: network error while attempting to run command 'ismaster' on host 'mongo-node-4:27017' 
2020-05-22T14:57:02.592+0000 I  NETWORK  [LogicalSessionCacheReap] Successfully connected to mongo-node-4:27017 (2 connections now open to mongo-node-4:27017 with a 0 second timeout)
2020-05-22T14:57:08.717+0000 I  NETWORK  [LogicalSessionCacheRefresh] DBClientConnection failed to receive message from -mongo-node-4:27017 - SocketException: Connection timed out
2020-05-22T14:57:08.717+0000 I  NETWORK  [LogicalSessionCacheRefresh] Detected bad connection created at 1590158176370506 microSec, clearing pool for mongo-node-4:27017 of 1 connections
2020-05-22T14:57:08.717+0000 I  NETWORK  [LogicalSessionCacheRefresh] Dropping all pooled connections to mongo-node-4:27017(with timeout of 0 seconds)
2020-05-22T14:57:08.717+0000 I  NETWORK  [LogicalSessionCacheRefresh] Ending connection to host -mongo-node-4:27017(with timeout of 0 seconds) due to bad connection status; 0 connections to that host remain open
2020-05-22T14:57:08.717+0000 I  CONTROL  [LogicalSessionCacheRefresh] Sessions collection is not set up; waiting until next sessions refresh interval: network error while attempting to run command 'ismaster' on host '-mongo-node-4:27017' 
2020-05-22T14:57:08.718+0000 I  NETWORK  [LogicalSessionCacheRefresh] Starting new replica set monitor for -rs/-mongo-node-3:27017,-mongo-node-4:27017,-mongo-node-5.k12tech.eu:27017,-mongo-node-6.k12tech.eu:27017

 

We are wondering whether this could be caused by connectivity issues between the primary and arbiter nodes.
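
One quick way to probe the connectivity question from the arbiter host is to attempt a plain TCP connection to each member on the MongoDB port. The following is only a minimal sketch, not a MongoDB tool: the function name `check_tcp` and the 5-second timeout are assumptions made for illustration, and a successful TCP connect does not prove that the `ismaster` command itself would succeed.

```python
import socket
import time


def check_tcp(host, port, timeout=5.0):
    """Try to open a TCP connection and return (ok, detail).

    This only tests reachability at the TCP level; it does not speak
    the MongoDB wire protocol.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed = time.monotonic() - start
            return True, f"connected in {elapsed:.3f}s"
    except OSError as exc:
        # Covers DNS failures, refused connections, and timeouts.
        return False, f"{type(exc).__name__}: {exc}"
```

For example, running `check_tcp("mongo-node-4", 27017)` on the arbiter (host name taken from the log excerpt above) would show whether that member is reachable at all, and repeating the call over time could reveal intermittent failures.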

 

 

 



 Comments   
Comment by Dmitry Agranat [ 30/Jun/20 ]

Hi albertollamaso@gmail.com,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,

Comment by Dmitry Agranat [ 07/Jun/20 ]

Hi albertollamaso@gmail.com, thank you for providing the requested information.

Based on the "TCPAbortOnTimeout" errors we see, I suspect there is some kind of network issue or misconfiguration in this deployment. However, given that the diagnostic.data uploaded for node-5 and node-6 contains only 1 minute of captured data, this is just an assumption based on the limited data from the Arbiter.

I recommend upgrading MongoDB to the latest 4.2 release (4.2.7 as of today), collecting the same data again (this time with the full archive of diagnostic.data), and then revisiting this analysis.
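
For anyone who wants to watch the same counter directly on a host, here is a rough sketch. It assumes the members run on Linux and that "TCPAbortOnTimeout" refers to the `TcpExt` statistic exposed in `/proc/net/netstat`; the ticket itself does not state where the value was read from. The helper names `parse_netstat` and `tcp_abort_on_timeout` are hypothetical.

```python
def parse_netstat(text):
    """Parse /proc/net/netstat-style text into {prefix: {counter: value}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    counters = {}
    # The file alternates a header line of counter names with a line of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        counters[prefix] = dict(zip(names.split(), map(int, nums.split())))
    return counters


def tcp_abort_on_timeout(path="/proc/net/netstat"):
    """Return the current TCPAbortOnTimeout value, or None if absent."""
    with open(path) as fh:
        return parse_netstat(fh.read()).get("TcpExt", {}).get("TCPAbortOnTimeout")
```

Sampling `tcp_abort_on_timeout()` at intervals and comparing the values would show whether the counter keeps growing while the log messages appear.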

Thanks,
Dima

Comment by Alberto Llamas [ 01/Jun/20 ]

Thanks Dmitry. I've uploaded a file: Archive 2.zip with the diagnostic.data directory.

 

Alberto,

Comment by Dmitry Agranat [ 31/May/20 ]

Thanks albertollamaso@gmail.com, the uploaded archive does not include the diagnostic.data directory. Could you upload it so that we'll be able to investigate?

Comment by Alberto Llamas [ 27/May/20 ]

Hi Dmitry,

 

Thanks for your response. I've uploaded the mongodb logs for the 3 replica sets we have in the cluster (primary+secondary+arbiter). File:
Archive.zip
 
Please let me know your thoughts.

Comment by Dmitry Agranat [ 27/May/20 ]

Hi albertollamaso@gmail.com, thank you for the report.

Apart from this informational log line, is there any impact from the end-user perspective? If there is (please describe it), would you archive (tar or zip) the mongod.log files and the $dbpath/diagnostic.data directory (the contents are described here) from all members of this replica set and upload them to this support uploader location?

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.
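
As an illustration of the archiving step requested above, the sketch below bundles a log file and the whole diagnostic.data directory into one tar.gz. It is not an official MongoDB procedure; the function name `archive_diagnostics` and all paths passed to it are assumptions that must be adapted to each member.

```python
import tarfile
from pathlib import Path


def archive_diagnostics(dbpath, log_path, out_path):
    """Bundle the mongod log and the full diagnostic.data directory."""
    dbpath, log_path = Path(dbpath), Path(log_path)
    diag_dir = dbpath / "diagnostic.data"
    if not diag_dir.is_dir():
        raise FileNotFoundError(f"no diagnostic.data directory under {dbpath}")
    with tarfile.open(out_path, "w:gz") as tar:
        # Store entries under short names instead of absolute paths.
        tar.add(str(log_path), arcname=log_path.name)
        # Directories are added recursively, so every file is included.
        tar.add(str(diag_dir), arcname="diagnostic.data")
    return out_path
```

Checking the resulting archive for a `diagnostic.data/` entry before uploading helps avoid the round trip of sending an archive that lacks it.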

Thanks,
Dima

Generated at Thu Feb 08 05:16:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.