[SERVER-46563] SIGUSR2 does not work with --fork Created: 03/Mar/20  Updated: 29/Oct/23  Resolved: 16/Mar/20

Status: Closed
Project: Core Server
Component/s: Diagnostics
Affects Version/s: None
Fix Version/s: 4.4.0-rc0, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
causes SERVER-47478 Missing "forked process: <pid>" messa... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2, v4.0
Sprint: Dev Tools 2020-03-09, Dev Tools 2020-03-23
Participants:

 Description   

Downloaded the latest Ubuntu 18.04 build this morning from https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu1804-latest.tgz, which unpacks to mongodb-linux-x86_64-ubuntu1804-4.3.3-713-g56655b0 and identifies itself in the log file as having been built from a commit made yesterday:

{"t":{"$date":"2020-03-03T07:56:09.297-0500"},"s":"I", "c":"CONTROL", "id":20719,"ctx":"initandlisten","msg":"{mongodVersion_vii}","attr":{"mongodVersion_vii":"db version v4.3.3-713-g56655b0"}}
{"t":{"$date":"2020-03-03T07:56:09.297-0500"},"s":"I", "c":"CONTROL", "id":23399,"ctx":"initandlisten","msg":"git version: {gitVersion}","attr":{"gitVersion":"56655b06ac46825c5937ccca5947dc84ccbca69c"}}
{"t":{"$date":"2020-03-03T07:56:09.297-0500"},"s":"I", "c":"CONTROL", "id":23400,"ctx":"initandlisten","msg":"{openSSLVersion_OpenSSL_version}","attr":{"openSSLVersion_OpenSSL_version":"OpenSSL version: OpenSSL 1.1.1  11 Sep 2018"}}
{"t":{"$date":"2020-03-03T07:56:09.297-0500"},"s":"I", "c":"CONTROL", "id":23401,"ctx":"initandlisten","msg":"allocator: {allocator}","attr":{"allocator":"tcmalloc"}}
{"t":{"$date":"2020-03-03T07:56:09.297-0500"},"s":"I", "c":"CONTROL", "id":23402,"ctx":"initandlisten","msg":"{ss_str}","attr":{"ss_str":"modules: none"}}
{"t":{"$date":"2020-03-03T07:56:09.297-0500"},"s":"I", "c":"CONTROL", "id":23403,"ctx":"initandlisten","msg":"build environment:"}
{"t":{"$date":"2020-03-03T07:56:09.297-0500"},"s":"I", "c":"CONTROL", "id":23404,"ctx":"initandlisten","msg":"    {std_get_0_envDataEntry}: {std_get_1_envDataEntry}","attr":{"std_get_0_envDataEntry":"distmod","std_get_1_envDataEntry":"ubuntu1804"}}
{"t":{"$date":"2020-03-03T07:56:09.297-0500"},"s":"I", "c":"CONTROL", "id":23404,"ctx":"initandlisten","msg":"    {std_get_0_envDataEntry}: {std_get_1_envDataEntry}","attr":{"std_get_0_envDataEntry":"distarch","std_get_1_envDataEntry":"x86_64"}}
{"t":{"$date":"2020-03-03T07:56:09.297-0500"},"s":"I", "c":"CONTROL", "id":23404,"ctx":"initandlisten","msg":"    {std_get_0_envDataEntry}: {std_get_1_envDataEntry}","attr":{"std_get_0_envDataEntry":"target_arch","std_get_1_envDataEntry":"x86_64"}}
{"t":{"$date":"2020-03-03T07:56:09.297-0500"},"s":"I", "c":"CONTROL", "id":51765,"ctx":"initandlisten","msg":"operating system: {name}, version: {version}","attr":{"name":"Ubuntu","version":"18.04"}}

Tried SIGUSR2 and then, 10 seconds later, SIGSEGV to confirm that I was actually sending a signal:

> date; killall -SIGUSR2 mongod; sleep 10; date; killall -SIGSEGV mongod
Tue Mar  3 08:15:37 EST 2020
Tue Mar  3 08:15:47 EST 2020

The log shows that the SIGSEGV at 08:15:47 worked, but the SIGUSR2 sent 10 seconds earlier at 08:15:37 had no effect:

{"t":{"$date":"2020-03-03T08:14:04.956-0500"},"s":"I", "c":"INDEX",   "id":20345,"ctx":"LogicalSessionCacheRefresh","msg":"index build: done building index {indexName} on ns {nss}","attr":{"indexName":"lsidTTLIndex","nss":"config.system.sessions"}}
{"t":{"$date":"2020-03-03T08:15:47.686-0500"},"s":"F", "c":"-",       "id":0,"ctx":"initandlisten","msg":"{}","attr":{"message":"Invalid access at address: 0x3e800006cb0"}}
{"t":{"$date":"2020-03-03T08:15:47.686-0500"},"s":"F", "c":"-",       "id":0,"ctx":"initandlisten","msg":"{}","attr":{"message":"Got signal: 11 (Segmentation fault)."}}

I've also tried on a recent macOS build and couldn't get SIGUSR2 to work there either.



 Comments   
Comment by Duncan Armstrong [ 11/Jun/20 ]

If you're backporting, don't forget to also backport this related bug fix, or you'll break Cloud automation: SERVER-47478

Comment by Githook User [ 06/Apr/20 ]

Author:

{'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}

Message: SERVER-46563 use pipe instead of SIGUSR2 for interfork comms

(cherry picked from commit 1ed541a15b22e3aea71e5c1efed421353762aa20)
Branch: v4.4
https://github.com/mongodb/mongo/commit/26cf6ae414662bc2e5af79c0c7cf4601849acbab

Comment by Githook User [ 16/Mar/20 ]

Author:

{'name': 'Billy Donahue', 'username': 'BillyDonahue', 'email': 'billy.donahue@mongodb.com'}

Message: SERVER-46563 use pipe instead of SIGUSR2 for interfork comms
Branch: master
https://github.com/mongodb/mongo/commit/1ed541a15b22e3aea71e5c1efed421353762aa20

Comment by Billy Donahue [ 09/Mar/20 ]

CR http://mongodbcr.appspot.com/584080001

Comment by Billy Donahue [ 04/Mar/20 ]

I looked into this a bit.

Looking at the mongoDbMain function in db/db.cpp:

...
setupSignalHandlers()
runGlobalInitializers() // ForkServer among them.
startSignalProcessingThread()
...

We double-fork the server in a mongo initializer, and part of that process involves sending a SIGUSR2 from child to parent to give the parent permission to die. Signal masks and handlers are modified to do this.
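
For readers unfamiliar with this kind of handshake, here is a minimal sketch of a child telling its parent "you may exit now" with SIGUSR2. It is not the MongoDB code: the function name `forkAndAwaitSigusr2` is hypothetical, it uses a single fork rather than a double fork, and it waits with `sigwait` on a blocked signal, which is only one of several ways to arrange the masks and handlers.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, for illustration only (not MongoDB's implementation).
// Forks a child that sends SIGUSR2 to the parent once it is "ready".
// Returns true in the parent if SIGUSR2 was received.
bool forkAndAwaitSigusr2() {
    sigset_t waitSet, oldMask;
    sigemptyset(&waitSet);
    sigaddset(&waitSet, SIGUSR2);

    // Block SIGUSR2 before forking so it stays pending until sigwait() runs,
    // even if the child signals before the parent starts waiting.
    if (sigprocmask(SIG_BLOCK, &waitSet, &oldMask) != 0)
        return false;

    pid_t child = fork();
    if (child < 0) {
        sigprocmask(SIG_SETMASK, &oldMask, nullptr);
        return false;
    }

    if (child == 0) {
        // Child: a real server would finish its startup here, then notify.
        kill(getppid(), SIGUSR2);
        _exit(0);
    }

    // Parent: wait for the child's notification. If the child died before
    // signaling, this would block forever, a weakness of the approach.
    int sig = 0;
    int rc = sigwait(&waitSet, &sig);

    // Restore the original mask so SIGUSR2 is available for other uses.
    sigprocmask(SIG_SETMASK, &oldMask, nullptr);

    int status = 0;
    waitpid(child, &status, 0);
    return rc == 0 && sig == SIGUSR2;
}
```

Note that the sketch has to touch process-wide signal state (the mask) and then restore it, which is the kind of state that also matters to whatever later wants to use SIGUSR2 for something else.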

I'm thinking there could be some SIGUSR2 interaction in this setup sequencing. It's unclear to me exactly what the problem is, but that use of SIGUSR2 can be replaced with a write on a pipe instead. I think that's a safer and more conventional way to communicate across fork.
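
A rough sketch of the pipe-based alternative, assuming a plain POSIX double fork. This is an illustration of the general technique, not the code from the eventual commit; the function name `doubleForkAndAwaitReady` and the one-byte "ready" message are made up for the example.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper, for illustration only (not MongoDB's implementation).
// Double-forks, and returns true in the original process once the
// grandchild has reported readiness by writing one byte to a pipe.
bool doubleForkAndAwaitReady() {
    int fds[2];  // fds[0] = read end, fds[1] = write end
    if (pipe(fds) != 0)
        return false;

    pid_t middle = fork();
    if (middle < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }

    if (middle == 0) {
        // Intermediate process: new session, fork again, then exit.
        close(fds[0]);
        setsid();
        pid_t daemon = fork();
        if (daemon == 0) {
            // Grandchild: a real server would finish startup here and keep
            // running; the sketch just reports readiness and exits.
            const char ready = 1;
            ssize_t n;
            do {
                n = write(fds[1], &ready, 1);
            } while (n < 0 && errno == EINTR);
            close(fds[1]);
            _exit(0);
        }
        _exit(daemon < 0 ? 1 : 0);
    }

    // Original process: close our copy of the write end so that read()
    // returns EOF if every descendant exits without writing.
    close(fds[1]);
    char byte = 0;
    ssize_t n;
    do {
        n = read(fds[0], &byte, 1);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    // Reap the intermediate process so it does not linger as a zombie.
    int status = 0;
    while (waitpid(middle, &status, 0) < 0 && errno == EINTR) {
    }
    return n == 1 && byte == 1;
}
```

With this shape the original process would call `doubleForkAndAwaitReady()` and exit with success or failure depending on the result. No signal masks or handlers are touched, so SIGUSR2 stays free for other purposes, and a descendant that dies early shows up as EOF on the pipe instead of a hang.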

Comment by Bruce Lucas (Inactive) [ 03/Mar/20 ]

Looks like it works if you don't specify --fork but does not work if you do.

The js test from that build is here. It succeeded, but the test does not use --fork.

 

Comment by Bruce Lucas (Inactive) [ 03/Mar/20 ]

I think it came from here: https://evergreen.mongodb.com/build/mongodb_mongo_master_ubuntu1804_56655b06ac46825c5937ccca5947dc84ccbca69c_20_03_02_03_34_34. Possibly a build issue?

Comment by Billy Donahue [ 03/Mar/20 ]

macOS is not expected to work at all.

I don't know where that fastdl package came from, but the builds of mongodb-mongo-master are working.
This sounds like a packaging/distribution issue.
