  1. Core Server
  2. SERVER-6763

Random mongod Segmentation faults after getting Primary in Replica Set (mongo::BSONElement::toString)

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.0.2, 2.0.7
    • Component/s: Replication, Stability
    • Environment:
      Replica Set with 3 servers, each MongoDB 2.0.7 Linux 64-bit binaries, Debian GNU/Linux 6.0, Kernel 2.6.32-5-amd64, 4 GB RAM
    • Linux

      Our MongoDB setup (a Replica Set with 3 servers) has suddenly started crashing at random.

      It's always the PRIMARY server that segfaults after some time (3-5 hours), for no obvious reason. Then another server becomes PRIMARY and also crashes after some time.

      Stack trace:

      Program terminated with signal 11, Segmentation fault.
      #0  0x0000000000509e1d in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
      #1  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
      #2  0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
      #3  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
      #4  0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
      #5  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
      #6  0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
      #7  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
      #8  0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
      #9  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
      #10 0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
      #11 0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
      #12 0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
      #13 0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
      #14 0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
      #15 0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
      #16 0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
      ...
      #2035 0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
      #2036 0x000000000089a439 in mongo::OpDebug::toString() const ()
      #2037 0x0000000000890cbd in mongo::LazyStringImpl<mongo::OpDebug>::val() const ()
      #2038 0x00000000005077da in mongo::Logstream::operator<<(mongo::LazyString const&) ()
      #2039 0x000000000088dfb4 in mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) ()
      #2040 0x0000000000aa0b38 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
      #2041 0x0000000000638767 in mongo::pms::threadRun(mongo::MessagingPort*) ()
      #2042 0x00007ffee045c8ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
      #2043 0x00007ffedfa1792d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
      #2044 0x0000000000000000 in ?? ()
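
      The trace above shows two functions calling each other for more than 2000 frames before the fault. This report does not confirm the root cause, but the pattern looks like mutual recursion over a deeply nested document while building a log line. The following is a minimal sketch of that pattern with a depth guard. The types `Obj` and `Element`, the helper `makeNested`, and the `maxDepth` parameter are hypothetical stand-ins for illustration only; they are not MongoDB's `BSONObj`/`BSONElement` code or API.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; NOT MongoDB's BSONObj/BSONElement.
struct Obj;

struct Element {
    std::string name;
    std::string scalar;          // used when child is null
    std::unique_ptr<Obj> child;  // non-null for an embedded object
};

struct Obj {
    std::vector<Element> fields;
};

void objToString(const Obj& o, std::string& out,
                 std::size_t depth, std::size_t maxDepth);

// Prints one field; recurses into objToString for embedded objects.
void elementToString(const Element& e, std::string& out,
                     std::size_t depth, std::size_t maxDepth) {
    out += e.name + ": ";
    if (e.child) {
        objToString(*e.child, out, depth + 1, maxDepth);
    } else {
        out += e.scalar;
    }
}

// Prints an object; stops descending once depth exceeds maxDepth,
// so the call stack stays bounded no matter how deep the data is.
void objToString(const Obj& o, std::string& out,
                 std::size_t depth, std::size_t maxDepth) {
    if (depth > maxDepth) {
        out += "{ ... }";
        return;
    }
    out += "{ ";
    for (std::size_t i = 0; i < o.fields.size(); ++i) {
        if (i > 0) out += ", ";
        elementToString(o.fields[i], out, depth, maxDepth);
    }
    out += " }";
}

// Builds { a: { a: ... { a: x } ... } } nested `levels` deep (levels >= 1).
Obj makeNested(std::size_t levels) {
    Obj inner;
    inner.fields.push_back(Element{"a", "x", nullptr});
    for (std::size_t i = 1; i < levels; ++i) {
        Obj outer;
        outer.fields.push_back(
            Element{"a", "", std::make_unique<Obj>(std::move(inner))});
        inner = std::move(outer);
    }
    return inner;
}
```

      Without the `depth > maxDepth` check, each nesting level adds two stack frames, which matches the alternating frames in the trace. Note that in this sketch the destructor of a very deep `Obj` is itself recursive, so the guard only bounds the printing path.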
      

      The last log message before crashing is

      Tue Aug 14 14:24:45 [conn16] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: profile.profiles top: { err: "unauthorized" }
      

      but the PRIMARY server logs many of these messages (and the setup has run well for several months despite them).

            Assignee: Randolph Tan (randolph@mongodb.com)
            Reporter: Nico Kaiser (nicokaiser)
            Votes: 0
            Watchers: 2

              Created:
              Updated:
              Resolved: