[SERVER-4949] SIGPIPE causing process termination on accept()ed connections (OSX) Created: 13/Feb/12  Updated: 11/Jul/16  Resolved: 02/Mar/12

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: 2.0.2
Fix Version/s: 2.1.1

Type: Bug Priority: Minor - P4
Reporter: Ben Becker Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Mac OS X 10.7.3


Attachments: File pinval_repro.php    
Operating System: OS X
Participants:

 Description   

It seems that when an inbound connection terminates with a broken pipe, the signal handler is not invoked and send() or recv() does not return EPIPE. Instead, the unhandled SIGPIPE kills the entire process. Socket::connect() sets the SO_NOSIGPIPE socket option, but Listener::initAndListen() does not appear to set it on the accept()ed socket. The following is reproducible (running mongos under gdb) with a simple PHP script that queries for all results, by severing the connection while the cursor is being exhausted:

Sun Feb 12 21:12:44 [conn1] end connection 127.0.0.1:52843
 
Program received signal SIGPIPE, Broken pipe.
0x00007fff81a7ddf2 in select$DARWIN_EXTSN ()
(gdb) bt
#0  0x00007fff81a7ddf2 in select$DARWIN_EXTSN ()
#1  0x00000001000fecd5 in mongo::Listener::initAndListen (this=0x1012022d8) at listen.cpp:253
#2  0x00000001001e2fbd in mongo::PortMessageServer::run (this=0x1012022d0) at message_server_port.cpp:188
#3  0x0000000100489be2 in mongo::start (opts=@0x7fff5fbff268) at server.cpp:165
#4  0x000000010048cb1e in _main () at server.cpp:362
#5  0x000000010048d8da in main (argc=5, argv=0x7fff5fbffb70) at server.cpp:370
...
(gdb) thread 20
[Switching to thread 20 (process 19295)]
0x00007fff80609be5 in std::ostream::sentry::sentry ()
Thread 20 (process 19295):
#0  0x00007fff80609be5 in std::ostream::sentry::sentry ()
#1  0x00007fff8060b457 in std::ostream::_M_insert<long> ()
#2  0x000000010000e77c in mongo::errnoWithDescription (x=32) at log.h:396
#3  0x00000001000f1276 in mongo::Socket::send (this=0x101401e60, data=0x1081f3000 "??!", len=2216385, context=0x10057fd2b "say") at sock.cpp:613
#4  0x00000001000f6d66 in mongo::Message::send (this=0x103d1b140, p=@0x101401e50, context=0x10057fd2b "say") at message.cpp:38
#5  0x00000001000f7239 in mongo::MessagingPort::say (this=0x101401e50, toSend=@0x103d1b140, responseTo=63) at message_port.cpp:271
#6  0x00000001000f73d3 in mongo::MessagingPort::reply (this=0x101401e50, received=@0x103d1bbc0, response=@0x103d1b140, responseTo={x = 63}) at message_port.cpp:221
#7  0x000000010023bd6c in mongo::replyToQuery (queryResultFlags=0, p=0x101401e50, requestMsg=@0x103d1bbc0, data=0x1053ff000, size=2216349, nReturned=17181, startingFrom=101, cursorId=0) at dbmessage.cpp:76
#8  0x0000000100483ff2 in mongo::ShardedClientCursor::sendNextBatch (this=0x10120b560, r=@0x103d1b7d0, ntoreturn=0) at cursors.cpp:117
#9  0x0000000100423aee in mongo::ShardStrategy::getMore (this=0x101106440, r=@0x103d1b7d0) at strategy_shard.cpp:114
#10 0x0000000100476401 in mongo::Request::process (this=0x103d1b7d0, attempt=0) at request.cpp:147
#11 0x000000010048e619 in mongo::ShardedMessageHandler::process (this=0x7fff5fbfef10, m=@0x103d1bbc0, p=0x101401e50, le=0x101207be0) at server.cpp:95
#12 0x00000001001e1147 in mongo::pms::threadRun (inPort=0x101401e50) at message_server_port.cpp:74
#13 0x00000001001e2940 in boost::_bi::list1<boost::_bi::value<mongo::MessagingPort*> >::operator()<void (*)(mongo::MessagingPort*), boost::_bi::list0> (this=0x101402980, f=@0x101402978, a=@0x103d1be90, unnamed_arg=0) at bind.hpp:253
#14 0x00000001001e29a2 in boost::_bi::bind_t<void, void (*)(mongo::MessagingPort*), boost::_bi::list1<boost::_bi::value<mongo::MessagingPort*> > >::operator() (this=0x101402978) at bind_template.hpp:20
#15 0x00000001001e29cd in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(mongo::MessagingPort*), boost::_bi::list1<boost::_bi::value<mongo::MessagingPort*> > > >::run (this=0x101402790) at thread.hpp:61
#16 0x0000000100f44460 in thread_proxy ()
#17 0x00007fff83d9d8bf in _pthread_start ()
#18 0x00007fff83da0b75 in thread_start ()
(gdb) c 
Continuing.
 
Program terminated with signal SIGPIPE, Broken pipe.
The program no longer exists.
(gdb) 

See the full backtrace of thread #20 for details.

Although I'm not sure why signal(SIGPIPE, pipeSigHandler) does not cause the handler to be invoked, a work-around may be to set SO_NOSIGPIPE on the accept()ed socket in Listener::initAndListen(). Socket::connect() already does this:

#ifdef SO_NOSIGPIPE
        // osx
        const int one = 1;
        setsockopt( _fd , SOL_SOCKET, SO_NOSIGPIPE, &one, sizeof(int));
#endif
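
A minimal sketch of how the same option could be applied to an accept()ed socket, assuming plain POSIX sockets. The helper names (disableSigpipe, sendFlags, acceptNoSigpipe) are hypothetical and are not taken from the MongoDB source or from the eventual fix. On platforms without SO_NOSIGPIPE (for example Linux), the sketch falls back to passing MSG_NOSIGNAL to send() where that flag exists.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: ask the kernel not to raise SIGPIPE for this socket.
// Returns true on success, or when the platform has no SO_NOSIGPIPE option.
inline bool disableSigpipe(int fd) {
    assert(fd >= 0);
#ifdef SO_NOSIGPIPE
    // OS X / BSD: per-socket option, same call as in Socket::connect()
    const int one = 1;
    return ::setsockopt(fd, SOL_SOCKET, SO_NOSIGPIPE, &one, sizeof(one)) == 0;
#else
    return true;  // nothing to set here; rely on sendFlags() below
#endif
}

// Hypothetical helper: per-call flag for send() on platforms that offer it.
inline int sendFlags() {
#ifdef MSG_NOSIGNAL
    return MSG_NOSIGNAL;  // e.g. Linux
#else
    return 0;
#endif
}

// Hypothetical wrapper: accept a connection and apply the option right away,
// so a later send() to a dead peer fails with EPIPE instead of raising SIGPIPE.
inline int acceptNoSigpipe(int listenFd) {
    const int fd = ::accept(listenFd, nullptr, nullptr);
    if (fd < 0) {
        return -1;
    }
    if (!disableSigpipe(fd)) {
        ::close(fd);
        return -1;
    }
    return fd;
}
```

With the option set (or the flag passed), a send() to a peer that has gone away is expected to return -1 with errno set to EPIPE, which the caller can handle as an ordinary socket error.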



 Comments   
Comment by auto [ 02/Mar/12 ]

Author: Eric Milkie (milkie) <milkie@10gen.com>

Message: SERVER-4949 avoid unhandled SIGPIPE causing process exit, on OS X
Branch: master
https://github.com/mongodb/mongo/commit/5acc61c6377e29ac25dbd105bf026be4eb8ee27a

Comment by Ben Becker [ 16/Feb/12 ]

Attaching the script I used to reproduce the issue. Simply killing the script while it is running triggers the unhandled SIGPIPE for me at least 50% of the time.

Comment by Eric Milkie [ 16/Feb/12 ]

Ben, can you send me your PHP script? The python script that I wrote isn't able to reproduce the issue on my Mac, even with mongos; I just get the "got pipe signal:" message.

Comment by Eric Milkie [ 16/Feb/12 ]

SIGPIPE appears to be a thread-directed signal (on Darwin). I think that means you have to call signal() on every thread for which you want masking.

Setting the NOSIGPIPE socket option seems like a decent way to fix this; I'll add it to the incoming sockets.
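
For comparison with the socket option, a per-thread alternative can be sketched with pthread_sigmask(), which blocks a signal for the calling thread only. This is an illustration under assumptions, not what the fix does: the helper names below are hypothetical, and each connection thread would need to call the blocking helper itself before writing to its socket.

```cpp
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: block SIGPIPE for the calling thread only.
inline bool blockSigpipeInThisThread() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGPIPE);
    return pthread_sigmask(SIG_BLOCK, &set, nullptr) == 0;
}

// Hypothetical helper: report whether SIGPIPE is blocked in the calling thread.
inline bool sigpipeBlockedInThisThread() {
    sigset_t current;
    sigemptyset(&current);
    // With a null new set, pthread_sigmask only reports the current mask.
    if (pthread_sigmask(SIG_BLOCK, nullptr, &current) != 0) {
        return false;
    }
    return sigismember(&current, SIGPIPE) == 1;
}

// Demonstration: write to a pipe whose read end is closed. With SIGPIPE
// blocked, write() fails with EPIPE and the process keeps running.
inline bool brokenPipeWriteReportsEpipe() {
    assert(sigpipeBlockedInThisThread());  // otherwise the write could kill us
    int fds[2];
    if (pipe(fds) != 0) {
        return false;
    }
    close(fds[0]);  // no reader left
    const char byte = 'x';
    errno = 0;
    const ssize_t n = write(fds[1], &byte, 1);
    const bool gotEpipe = (n == -1 && errno == EPIPE);
    close(fds[1]);
    return gotEpipe;
}
```

The socket option has the advantage that it travels with the descriptor, so it does not depend on which thread ends up performing the send().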
