Core Server / SERVER-4949

SIGPIPE causing process termination on accept()ed connections (OSX)

    • Type: Bug
    • Resolution: Done
    • Priority: Minor - P4
    • Fix Version/s: 2.1.1
    • Affects Version/s: 2.0.2
    • Component/s: Networking
    • Environment:
      Mac OS X 10.7.3

      It seems that when an inbound connection terminates with a broken pipe, the SIGPIPE signal handler is not invoked and send() or recv() does not return EPIPE. Instead, the entire process dies. Socket::connect() sets the SO_NOSIGPIPE socket option, but Listener::initAndListen() does not appear to set it on the accept()ed socket. The following is reproducible (running mongos under gdb) with a simple PHP script that queries for all results and then severs the connection while exhausting the cursor:

      Sun Feb 12 21:12:44 [conn1] end connection 127.0.0.1:52843
      
      Program received signal SIGPIPE, Broken pipe.
      0x00007fff81a7ddf2 in select$DARWIN_EXTSN ()
      (gdb) bt
      #0  0x00007fff81a7ddf2 in select$DARWIN_EXTSN ()
      #1  0x00000001000fecd5 in mongo::Listener::initAndListen (this=0x1012022d8) at listen.cpp:253
      #2  0x00000001001e2fbd in mongo::PortMessageServer::run (this=0x1012022d0) at message_server_port.cpp:188
      #3  0x0000000100489be2 in mongo::start (opts=@0x7fff5fbff268) at server.cpp:165
      #4  0x000000010048cb1e in _main () at server.cpp:362
      #5  0x000000010048d8da in main (argc=5, argv=0x7fff5fbffb70) at server.cpp:370
      ...
      (gdb) thread 20
      [Switching to thread 20 (process 19295)]
      0x00007fff80609be5 in std::ostream::sentry::sentry ()
      Thread 20 (process 19295):
      #0  0x00007fff80609be5 in std::ostream::sentry::sentry ()
      #1  0x00007fff8060b457 in std::ostream::_M_insert<long> ()
      #2  0x000000010000e77c in mongo::errnoWithDescription (x=32) at log.h:396
      #3  0x00000001000f1276 in mongo::Socket::send (this=0x101401e60, data=0x1081f3000 "??!", len=2216385, context=0x10057fd2b "say") at sock.cpp:613
      #4  0x00000001000f6d66 in mongo::Message::send (this=0x103d1b140, p=@0x101401e50, context=0x10057fd2b "say") at message.cpp:38
      #5  0x00000001000f7239 in mongo::MessagingPort::say (this=0x101401e50, toSend=@0x103d1b140, responseTo=63) at message_port.cpp:271
      #6  0x00000001000f73d3 in mongo::MessagingPort::reply (this=0x101401e50, received=@0x103d1bbc0, response=@0x103d1b140, responseTo={x = 63}) at message_port.cpp:221
      #7  0x000000010023bd6c in mongo::replyToQuery (queryResultFlags=0, p=0x101401e50, requestMsg=@0x103d1bbc0, data=0x1053ff000, size=2216349, nReturned=17181, startingFrom=101, cursorId=0) at dbmessage.cpp:76
      #8  0x0000000100483ff2 in mongo::ShardedClientCursor::sendNextBatch (this=0x10120b560, r=@0x103d1b7d0, ntoreturn=0) at cursors.cpp:117
      #9  0x0000000100423aee in mongo::ShardStrategy::getMore (this=0x101106440, r=@0x103d1b7d0) at strategy_shard.cpp:114
      #10 0x0000000100476401 in mongo::Request::process (this=0x103d1b7d0, attempt=0) at request.cpp:147
      #11 0x000000010048e619 in mongo::ShardedMessageHandler::process (this=0x7fff5fbfef10, m=@0x103d1bbc0, p=0x101401e50, le=0x101207be0) at server.cpp:95
      #12 0x00000001001e1147 in mongo::pms::threadRun (inPort=0x101401e50) at message_server_port.cpp:74
      #13 0x00000001001e2940 in boost::_bi::list1<boost::_bi::value<mongo::MessagingPort*> >::operator()<void (*)(mongo::MessagingPort*), boost::_bi::list0> (this=0x101402980, f=@0x101402978, a=@0x103d1be90, unnamed_arg=0) at bind.hpp:253
      #14 0x00000001001e29a2 in boost::_bi::bind_t<void, void (*)(mongo::MessagingPort*), boost::_bi::list1<boost::_bi::value<mongo::MessagingPort*> > >::operator() (this=0x101402978) at bind_template.hpp:20
      #15 0x00000001001e29cd in boost::detail::thread_data<boost::_bi::bind_t<void, void (*)(mongo::MessagingPort*), boost::_bi::list1<boost::_bi::value<mongo::MessagingPort*> > > >::run (this=0x101402790) at thread.hpp:61
      #16 0x0000000100f44460 in thread_proxy ()
      #17 0x00007fff83d9d8bf in _pthread_start ()
      #18 0x00007fff83da0b75 in thread_start ()
      (gdb) c 
      Continuing.
      
      Program terminated with signal SIGPIPE, Broken pipe.
      The program no longer exists.
      (gdb) 
      

      See the full backtrace of thread #20 for details.

      Although I'm not sure why signal(SIGPIPE, pipeSigHandler) does not cause the handler to be invoked, a work-around may be to set SO_NOSIGPIPE on the accept()ed socket in Listener::initAndListen(). Socket::connect() already does this:

      #ifdef SO_NOSIGPIPE
              // osx
              const int one = 1;
              setsockopt( _fd , SOL_SOCKET, SO_NOSIGPIPE, &one, sizeof(int));
      #endif
      

            Assignee: Eric Milkie (milkie@mongodb.com)
            Reporter: Ben Becker (benjamin.becker)