[SERVER-58114] mongos crashed after 180 days Created: 28/Jun/21  Updated: 27/Oct/23  Resolved: 04/Jul/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.11
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: li zhang Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: Bug
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: config_server_01.log, config_server_02.log, config_server_03.log, config_server_04.log, config_server_05.log, config_server_06.log, mongos01.log, mongos02.log
Operating System: ALL
Participants:

 Description   

crash log:

CONTROL [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends

 

Mongo version:

MongoDB shell version v4.2.11
git version: ea38428f0c6742c7c2c7f677e73d79e17a2aab96
OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
allocator: tcmalloc
modules: none
build environment:
distmod: rhel70
distarch: x86_64
target_arch: x86_64

 

Linux version:

Linux version 3.10.0-1160.6.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Tue Nov 17 13:59:11 UTC 2020



 Comments   
Comment by Dmitry Agranat [ 04/Jul/21 ]

This is not enough information to determine which external process issued the SIGTERM. I suggest checking syslog, messages, and dmesg to try to identify the sender.

As this does not seem to be an issue related to MongoDB, I will go ahead and close this case.

Regards,
Dima

Comment by li zhang [ 30/Jun/21 ]

Thanks Dima!

Maybe mongo sent signal 15 to itself. Here are some more details; I have uploaded new logs.

We have 6 config servers and 2 mongos. Some of them received signal 15:

server           got signal 15 (GMT+8)   ip
mongos02         2021-06-10 13:34:29     10.27.0.16
config_server05  2021-06-10 14:13:27     10.27.0.15
mongos01         2021-06-10 14:14:04     10.27.0.15
config_server06  2021-06-10 14:26:39     10.27.0.16
config_server01  2021-06-10 14:46:01     10.27.0.11
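
A table like the one above can be rebuilt from the attached logs by scanning for the shutdown message quoted in the description. The following is a minimal sketch, not a MongoDB tool: the function name `find_signal_lines` and the second sample line are hypothetical, and the regular expression assumes the message text matches the single line shown in this ticket.

```python
import re

# Matches the message quoted in the description, e.g.
# "CONTROL [signalProcessingThread] got signal 15 (Terminated), ..."
SIGNAL_RE = re.compile(r"\[signalProcessingThread\] got signal (\d+) \((\w+)\)")


def find_signal_lines(lines):
    """Return (line_number, signal_number, signal_name) for each matching line."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        match = SIGNAL_RE.search(line)
        if match:
            hits.append((lineno, int(match.group(1)), match.group(2)))
    return hits


sample = [
    "CONTROL [signalProcessingThread] got signal 15 (Terminated), "
    "will terminate after current cmd ends",
    "NETWORK [conn1] unrelated line",  # hypothetical filler line
]
print(find_signal_lines(sample))  # [(1, 15, 'Terminated')]
```

Because the function accepts any iterable of strings, an open file handle for one of the attached logs (for example `mongos02.log`) could be passed in directly.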

 

The other servers are OK. Our app was fine until 2021-06-10 19:08:46:666, when it reported:

Timed out after 30000 ms while waiting for a server that matches com.mongodb.client.internal.MongoClientDelegate$1@28afd302. Client view of cluster state is {type=SHARDED, servers=[{address=10.27.0.16:23001, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message}, caused by {java.net.SocketTimeoutException: Read timed out}}, {address=10.27.0.15:23001, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoSocketReadTimeoutException: Timeout while receiving message}, caused by {java.net.SocketTimeoutException: Read timed out}}].

 

I am sure the mongos process is still alive: we can connect to it, but it stopped serving requests after 2021-06-10 19:08:04 GMT+8 (the HMAC key's expiresAt is 1623323284).

 

configs:PRIMARY> db.system.keys.find()

{ "_id" : NumberLong("6905325084227928095"), "purpose" : "HMAC", "key" : BinData(0,"KkNA0g9Nkevn1T6KF4CRCfUNAfU="), "expiresAt" : Timestamp(1615547284, 0) }
{ "_id" : NumberLong("6905325084227928096"), "purpose" : "HMAC", "key" : BinData(0,"4Rx9aW2uxjfKwG3pztFPfw4HqVg="), "expiresAt" : Timestamp(1623323284, 0) }
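
To make the `expiresAt` values easier to compare with the wall-clock times quoted in this comment, the seconds part of each Timestamp can be converted to GMT+8. This is only a sketch of the arithmetic; the helper name `expires_at_local` is hypothetical, and nothing here queries the cluster.

```python
from datetime import datetime, timedelta, timezone

# The reporter's times are given in GMT+8.
GMT8 = timezone(timedelta(hours=8))


def expires_at_local(seconds, tz=GMT8):
    """Convert the seconds part of a Timestamp to an aware datetime."""
    return datetime.fromtimestamp(seconds, tz=tz)


# Seconds parts of the two expiresAt values shown in system.keys above.
keys = [1615547284, 1623323284]

for seconds in keys:
    print(seconds, expires_at_local(seconds).isoformat())
# 1615547284 2021-03-12T19:08:04+08:00
# 1623323284 2021-06-10T19:08:04+08:00

print((keys[1] - keys[0]) // 86400, "days between the two keys")
# 90 days between the two keys
```

The later value lands on 2021-06-10 19:08:04 GMT+8, the same moment the reporter says mongos stopped serving, and the two keys expire 90 days apart.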

 

Thanks,

li zhang

 

Comment by Dmitry Agranat [ 29/Jun/21 ]

Hi changein2013@sina.com,

This is not a crash - a SIGTERM (signal 15) means that something is killing the mongod process and it is exiting normally (you can do this manually by using the kill command from the console/command line). The real question is: what is sending the signal to the mongod process? Unfortunately, with the information you have provided, we cannot determine the source from the logs, and it can be quite difficult to do so in general. We suggest checking for any external process monitoring, or watchdog-type processes on this server that might be killing mongod.
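
The difference between a crash and a clean exit on SIGTERM can be shown with a small stand-alone sketch. It does not involve MongoDB at all: the child process, the `terminate_child` helper, and the message it prints are all hypothetical, and the code assumes a POSIX system such as the Linux host in this ticket.

```python
import signal
import subprocess
import sys

# Hypothetical child: installs a SIGTERM handler, then idles until signalled.
CHILD = """
import signal, sys, time

def on_term(signum, frame):
    print(f"got signal {signum}, exiting cleanly")
    sys.exit(0)

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def terminate_child():
    """Start the child, send it SIGTERM, return (exit_code, final_output)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    )
    # Wait until the handler is installed before sending the signal.
    assert proc.stdout.readline().strip() == "ready"
    proc.send_signal(signal.SIGTERM)  # same effect as `kill <pid>` from a shell
    out, _ = proc.communicate(timeout=10)
    return proc.returncode, out.strip()


print(terminate_child())  # (0, 'got signal 15, exiting cleanly')
```

The child exits with status 0 after logging the signal, which is the pattern described above: the process is asked to stop from outside and shuts itself down, so its own log cannot say who sent the request.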

Thanks,
Dima

Generated at Thu Feb 08 05:43:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.