  1. Core Server
  2. SERVER-8105

SIGSEGV in db/clientcursor.cpp

    • Type: Bug
    • Resolution: Incomplete
    • Priority: Critical - P2
    • None
    • Affects Version/s: 2.0.7
    • Component/s: Stability
    • Labels:
      None
    • Environment:
      Solaris 10u9 amd64
    • Solaris
    • Cannot reproduce in the lab

      I have an issue with a Mongo 2.0.7 installation. Currently this DB is used only to cache data for communication between unrelated processes. The data is not sharded or replicated, and there is one instance running on each computer. There is a separate replicated sharded set on the same set of computers, but we cannot reproduce the issue in the sharded set, only in the standalone instance.

      From time to time (usually about once every 2 months), we detect that the local Mongo instance has gone down, interrupting IPC for our application suite and losing some client notifications. We have not yet been able to identify a set of environmental or data conditions that triggers this. We also could not get the SIGSEGV to produce a usable stack trace on Solaris, so we recompiled the 2.0.7 source with our own stack trace routines added, to determine the exact point of failure. I have attached all the stack traces we have to this ticket.
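
      For readers unfamiliar with the technique, a stack-dumping signal handler can be sketched as below. This is a minimal illustration, not the routine that was actually compiled into the 2.0.7 build; the names `crashTraceHandler`, `installCrashTraceHandler` and `g_lastSignal` are hypothetical. It assumes `printstack()` from `<ucontext.h>` on Solaris and falls back to glibc's `backtrace()` elsewhere. Note that `backtrace()` is not strictly async-signal-safe, so this is a diagnostic aid only.

```cpp
#include <csignal>
#include <cstring>
#include <unistd.h>
#if defined(__sun)
#include <ucontext.h>   // printstack(3C) on Solaris
#elif defined(__GLIBC__)
#include <execinfo.h>   // backtrace(3) on glibc
#endif

// Hypothetical: last signal number seen by the handler, so callers can inspect it.
static volatile sig_atomic_t g_lastSignal = 0;

// Hypothetical handler: writes a marker line and a stack dump to stderr.
extern "C" void crashTraceHandler(int sig, siginfo_t* info, void* /*context*/) {
    (void)info;  // info->si_addr holds the faulting address for a real SIGSEGV
    g_lastSignal = sig;
    const char msg[] = "caught signal, dumping stack\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
#if defined(__sun)
    printstack(STDERR_FILENO);
#elif defined(__GLIBC__)
    void* frames[64];
    int n = backtrace(frames, 64);
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
#endif
}

// Installs the handler for `sig`. SA_RESETHAND restores the default action
// after the first delivery, so a real fault that re-executes the faulting
// instruction terminates the process instead of looping in the handler.
bool installCrashTraceHandler(int sig) {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = crashTraceHandler;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, nullptr) == 0;
}
```

      In a server this would typically be installed once at startup, for example `installCrashTraceHandler(SIGSEGV);`.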

      The stack traces are along these lines:

      -----------------  lwp# 542 / thread# 542  --------------------
       fffffd7ffe56173a read     (39, 27b153a, 1e)
       0000000000c43bd4 _ZN4redi16basic_pstreambufIcSt11char_traitsIcEE11fill_bufferEb () + 164
       0000000000c449e7 _ZN4redi16basic_pstreambufIcSt11char_traitsIcEE9underflowEv () + 27
       0000000000c3d219 _ZNSs12_S_constructISt19istreambuf_iteratorIcSt11char_traitsIcEEEEPcT_S5_RKSaIcESt18input_iterator_tag () + 559
       0000000000e5f570 _ZN5mongo11sunosPstackERSob () + 480
       00000000008cb880 _ZN5mongo10abruptQuitEi () + 3e0
       00000000008cbdf0 _ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv () + 240
       fffffd7ffe55c2e6 __sighndlr () + 6
       fffffd7ffe550bc2 call_user_handler () + 252
       fffffd7ffe550dee sigacthandler (b, fffffd7ff9a07d78, fffffd7ff9a07a10) + ee
       --- called from signal handler with signal 11 (SIGSEGV) ---
       000000000082dec0 _ZN5mongo6Record5touchEb ()
       000000000079233b _ZN5mongo12ClientCursor5yieldEiPNS_6RecordE () + 6b
       000000000079241c _ZN5mongo12ClientCursor14yieldSometimesENS0_11RecordNeedsEPb () + 9c
       000000000086639f _ZN5mongo14_updateObjectsEbPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEPNS_11RemoveSaverEb () + 5df
       00000000008695a6 _ZN5mongo13updateObjectsEPKcRKNS_7BSONObjES2_bbbRNS_7OpDebugEb () + 116
       0000000000810af9 _ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE () + 359
       0000000000811d6b _ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE () + ecb
       0000000000e5cbec _ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE () + ec
       0000000000707b49 _ZN5mongo3pms9threadRunEPNS_13MessagingPortE () + 269
       fffffd7ffe851506 thread_proxy () + 66
       fffffd7ffe55bfbb _thr_setup () + 5b
       fffffd7ffe55c1e0 _lwp_start ()
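
      The frame names above are mangled C++ symbols. A small helper along the following lines can make them readable. It is a sketch that assumes a GCC- or Clang-style toolchain providing `abi::__cxa_demangle` in `<cxxabi.h>`; the function name `demangle` is hypothetical.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // __cxa_demangle allocates the buffer with malloc
    return result;
}
```

      For example, `demangle("_ZN5mongo6Record5touchEb")` yields `mongo::Record::touch(bool)`, the frame in which the signal was raised.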
      
      

      From what we can determine, the issue is on line 443 in db/clientcursor.cpp, where an invalid pointer (not NULL) is used:

      db/clientcursor.cpp-433-        }
      db/clientcursor.cpp-434-        else {
      db/clientcursor.cpp-435-            warning() << "don't understand RecordNeeds: " << (int)need << endl;
      db/clientcursor.cpp-436-            return 0;
      db/clientcursor.cpp-437-        }
      db/clientcursor.cpp-438-
      db/clientcursor.cpp-439-        DiskLoc l = currLoc();
      db/clientcursor.cpp-440-        if ( l.isNull() )
      db/clientcursor.cpp-441-            return 0;
      db/clientcursor.cpp-442-        
      db/clientcursor.cpp:443:        Record * rec = l.rec();
      db/clientcursor.cpp-444-        if ( rec->likelyInPhysicalMemory() ) 
      db/clientcursor.cpp-445-            return 0;
      db/clientcursor.cpp-446-        
      db/clientcursor.cpp-447-        return rec;
      db/clientcursor.cpp-448-    }
      db/clientcursor.cpp-449-
      db/clientcursor.cpp-450-    bool ClientCursor::yieldSometimes( RecordNeeds need, bool *yielded ) {
      db/clientcursor.cpp-451-        if ( yielded ) {
      db/clientcursor.cpp-452-            *yielded = false;   
      db/clientcursor.cpp-453-        }
      
      

      This subsequently causes a SIGSEGV at line 512 of db/clientcursor.cpp:

      db/clientcursor.cpp-502-                CurOp * c = cc().curop();
      db/clientcursor.cpp-503-                while ( c->parent() )
      db/clientcursor.cpp-504-                    c = c->parent();
      db/clientcursor.cpp-505-                warning() << "ClientCursor::yield can't unlock b/c of recursive lock"
      db/clientcursor.cpp-506-                          << " ns: " << ns 
      db/clientcursor.cpp-507-                          << " top: " << c->info()
      db/clientcursor.cpp-508-                          << endl;
      db/clientcursor.cpp-509-            }
      db/clientcursor.cpp-510-
      db/clientcursor.cpp-511-            if ( rec )
      db/clientcursor.cpp:512:                rec->touch();
      db/clientcursor.cpp-513-
      db/clientcursor.cpp-514-            lk.reset(0); // need to release this before dbtempreleasecond
      db/clientcursor.cpp-515-        }
      db/clientcursor.cpp-516-    }
      db/clientcursor.cpp-517-
      db/clientcursor.cpp-518-    bool ClientCursor::prepareToYield( YieldData &data ) {
      db/clientcursor.cpp-519-        if ( ! _c->supportYields() )
      db/clientcursor.cpp-520-            return false;
      db/clientcursor.cpp-521-        if ( ! _c->prepareToYield() ) {
      db/clientcursor.cpp-522-            return false;   
      
      

      Unfortunately we cannot determine why the pointer becomes invalid, or what conditions trigger it.
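
      Note that the `if ( rec )` guard at line 511 only rejects NULL, so a non-NULL but invalid pointer passes straight through to `rec->touch()`. The sketch below illustrates one kind of defensive check: validating an offset against the mapped region before forming the pointer. It assumes, purely for illustration, that the bad pointer comes from an out-of-range offset, which this report does not establish. `Record`, `MappedFile` and `checkedRecordAt` are hypothetical stand-ins, not MongoDB's actual types or API.

```cpp
#include <cstddef>

// Hypothetical stand-in for a record header; not MongoDB's Record class.
struct Record {
    int lengthWithHeaders;
};

// Hypothetical description of one memory-mapped data file.
struct MappedFile {
    char*       base;    // start of the mapping
    std::size_t length;  // number of mapped bytes
};

// Returns a Record pointer only if [ofs, ofs + sizeof(Record)) lies inside
// the mapping; otherwise returns nullptr, so that a later `if (rec)` test
// actually filters out unusable locations.
Record* checkedRecordAt(const MappedFile& f, long ofs) {
    if (f.base == nullptr || ofs < 0) {
        return nullptr;
    }
    const std::size_t uofs = static_cast<std::size_t>(ofs);
    if (uofs > f.length || f.length - uofs < sizeof(Record)) {
        return nullptr;
    }
    return reinterpret_cast<Record*>(f.base + uofs);
}
```

      A check of this kind would not explain the root cause, but it could turn a crash into a logged error that records the offending location.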

            Assignee:
            james.wahlin@mongodb.com James Wahlin
            Reporter:
            valerion Braam van Heerden
            Votes:
            0
            Watchers:
            7
