[SERVER-6763] Random mongod Segmentation faults after getting Primary in Replica Set (mongo::BSONElement::toString) Created: 14/Aug/12  Updated: 15/Aug/12  Resolved: 14/Aug/12

Status: Closed
Project: Core Server
Component/s: Replication, Stability
Affects Version/s: 2.0.2, 2.0.7
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Nico Kaiser Assignee: Randolph Tan
Resolution: Duplicate Votes: 0
Labels: bson, crash, replication
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Replica Set with 3 servers, each MongoDB 2.0.7 Linux 64-bit binaries, Debian GNU/Linux 6.0, Kernel 2.6.32-5-amd64, 4 GB RAM


Issue Links:
Duplicate
is duplicated by SERVER-5127 Limit recursion depth of BSONObj::toS... Closed
Operating System: Linux
Participants:

 Description   

Our MongoDB setup (a replica set with 3 servers) has suddenly started crashing at random.

It is always the PRIMARY server that segfaults after some time (3-5 hours), with no obvious trigger. Another server then becomes PRIMARY and also crashes after a while.

Stack trace:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000509e1d in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
#1  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
#2  0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
#3  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
#4  0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
#5  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
#6  0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
#7  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
#8  0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
#9  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
#10 0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
#11 0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
#12 0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
#13 0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
#14 0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
#15 0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
#16 0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
...
#2035 0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
#2036 0x000000000089a439 in mongo::OpDebug::toString() const ()
#2037 0x0000000000890cbd in mongo::LazyStringImpl<mongo::OpDebug>::val() const ()
#2038 0x00000000005077da in mongo::Logstream::operator<<(mongo::LazyString const&) ()
#2039 0x000000000088dfb4 in mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) ()
#2040 0x0000000000aa0b38 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
#2041 0x0000000000638767 in mongo::pms::threadRun(mongo::MessagingPort*) ()
#2042 0x00007ffee045c8ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#2043 0x00007ffedfa1792d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#2044 0x0000000000000000 in ?? ()

The last log message before crashing is

Tue Aug 14 14:24:45 [conn16] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: profile.profiles top: { err: "unauthorized" }

but the PRIMARY server logs many of these messages, and the setup had been running fine for several months despite them.
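The stack trace shows `BSONObj::toString` and `BSONElement::toString` calling each other ~2000 times before the segfault, which points at unbounded recursion over a deeply nested document while formatting a log message. A minimal illustrative sketch (not MongoDB source code) of that failure mode, using two mutually recursive stringifiers with no depth limit:

```python
# Illustrative sketch (not MongoDB code): two mutually recursive
# stringifiers, mirroring BSONObj::toString <-> BSONElement::toString,
# with no recursion-depth guard.

def obj_to_string(obj):
    # Mirrors BSONObj::toString: stringify each element in turn.
    return "{ " + ", ".join(elem_to_string(k, v) for k, v in obj.items()) + " }"

def elem_to_string(key, value):
    # Mirrors BSONElement::toString: recurse into embedded objects.
    if isinstance(value, dict):
        return f"{key}: {obj_to_string(value)}"
    return f"{key}: {value!r}"

# Build a document nested ~2000 levels deep, matching the ~2000
# stack frames in the trace above.
doc = {"leaf": 1}
for _ in range(2000):
    doc = {"child": doc}

try:
    obj_to_string(doc)
except RecursionError:
    # Python raises; the C++ code has no such guard, so the stack
    # simply overflows and the process dies with SIGSEGV.
    print("recursion limit exceeded")
```

In C++ there is no built-in recursion guard, so instead of a catchable error the thread's stack overflows, which is exactly the signal 11 seen here.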



 Comments   
Comment by Randolph Tan [ 14/Aug/12 ]

Not in 2.0.7 — it crashes when it tries to log that particular document. If you can run version 2.2, you will see the document being logged successfully (truncated). You will probably have to dig through the logs, though, since the server will keep running smoothly.
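The 2.2 behavior described above (fixed via SERVER-5127) amounts to capping the recursion depth and truncating the output instead of recursing forever. A sketch of that idea, with a purely illustrative cutoff value:

```python
# Illustrative sketch of a depth-limited stringifier. MAX_DEPTH is an
# arbitrary cutoff for this example, not MongoDB's actual limit.
MAX_DEPTH = 200

def obj_to_string(obj, depth=0):
    if depth > MAX_DEPTH:
        return "..."          # truncate instead of recursing forever
    parts = []
    for k, v in obj.items():
        if isinstance(v, dict):
            parts.append(f"{k}: {obj_to_string(v, depth + 1)}")
        else:
            parts.append(f"{k}: {v!r}")
    return "{ " + ", ".join(parts) + " }"

# A document nested far deeper than the cutoff is now logged
# (truncated) rather than crashing the process.
doc = {"leaf": 1}
for _ in range(5000):
    doc = {"child": doc}
print(obj_to_string(doc)[:60])
```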

Comment by Nico Kaiser [ 14/Aug/12 ]

Ok, I see. Is there a way to find the request/document that's causing the recursion?
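One client-side way to hunt for the culprit (a hypothetical helper, not a MongoDB feature) is to scan candidate documents as decoded by a driver and measure their nesting depth; anything deep enough to produce the ~2000-frame traces above would stand out immediately:

```python
# Hypothetical helper: maximum nesting depth of a document as a driver
# would decode it (dicts for embedded documents, lists for arrays).
# Implemented iteratively so the checker itself cannot overflow the
# stack on a pathological document.

def max_depth(value):
    stack = [(value, 1)]
    deepest = 1
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        if isinstance(node, dict):
            stack.extend((v, depth + 1) for v in node.values())
        elif isinstance(node, list):
            stack.extend((v, depth + 1) for v in node)
    return deepest

# Example: flag suspiciously deep documents (threshold is arbitrary).
docs = [
    {"user": "142370", "data": {"members": {"package": {"lastUsage": {"17": "..."}}}}},
]
for d in docs:
    if max_depth(d) > 100:
        print("suspiciously deep document:", d.get("user"))
```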

Comment by Nico Kaiser [ 14/Aug/12 ]

Ok, the primary server crashed again:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000509e1d in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
#1  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
#2  0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
#3  0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
...
#2028 0x000000000050a154 in mongo::BSONElement::toString(mongo::StringBuilder&, bool, bool) const ()
#2029 0x000000000050b882 in mongo::BSONObj::toString(mongo::StringBuilder&, bool, bool) const ()
#2030 0x00000000005cbf33 in mongo::LazyStringImpl<mongo::BSONObj>::val() const ()
#2031 0x00000000005077da in mongo::Logstream::operator<<(mongo::LazyString const&) ()
#2032 0x0000000000943a8d in mongo::runQuery(mongo::Message&, mongo::QueryMessage&, mongo::CurOp&, mongo::Message&) ()
#2033 0x0000000000888fd7 in ?? ()
#2034 0x000000000088dbb9 in mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&) ()
#2035 0x0000000000aa0b38 in mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*) ()
#2036 0x0000000000638767 in mongo::pms::threadRun(mongo::MessagingPort*) ()
#2037 0x00007f8e67d648ca in start_thread (arg=<value optimized out>) at pthread_create.c:300
#2038 0x00007f8e6731f92d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#2039 0x0000000000000000 in ?? ()

Latest log entries (logLevel 5):

Tue Aug 14 15:01:58 [conn389] command profile.$cmd command: { findandmodify: "profiles", query: { user: "142370" }, sort: {}, new: 1, remove: 0, upsert: 1, fields: { data.members.package.lastUsage.17: 1 }, update: { $set: { data.members.package.lastUsage.17: "2012-08-14T13:01:58.000Z" } } } ntoreturn:1 reslen:252 7ms
Tue Aug 14 15:01:58 [conn390] runQuery called admin.$cmd { replSetGetStatus: 1 }
Tue Aug 14 15:01:58 [conn390] run command admin.$cmd { replSetGetStatus: 1 }
Tue Aug 14 15:01:58 [conn390] command: { replSetGetStatus: 1 }
Tue Aug 14 15:01:58 [conn390] command admin.$cmd command: { replSetGetStatus: 1 } ntoreturn:1 reslen:63 0ms
Tue Aug 14 15:01:58 [conn542] runQuery called admin.$cmd { replSetHeartbeat: "mongocluster", v: 4, pv: 1, checkEmpty: false, from: "mongo02.xxxxxxxxxxxxx.de:27017" }
Tue Aug 14 15:01:58 [conn542] run command admin.$cmd { replSetHeartbeat: "mongocluster", v: 4, pv: 1, checkEmpty: false, from: "mongo02.xxxxxxxxxxxxx.de:27017" }
Tue Aug 14 15:01:58 [conn542] command: { replSetHeartbeat: "mongocluster", v: 4, pv: 1, checkEmpty: false, from: "mongo02.xxxxxxxxxxxxx.de:27017" }
Tue Aug 14 15:01:58 [conn542] command admin.$cmd command: { replSetHeartbeat: "mongocluster", v: 4, pv: 1, checkEmpty: false, from: "mongo02.xxxxxxxxxxxxx.de:27017" } ntoreturn:1 reslen:128 0ms
Tue Aug 14 15:01:58 [conn543] runQuery called admin.$cmd { replSetHeartbeat: "mongocluster", v: 4, pv: 1, checkEmpty: false, from: "mongo03.xxxxxxxxxxxxx.de:27017" }
Tue Aug 14 15:01:58 [conn543] run command admin.$cmd { replSetHeartbeat: "mongocluster", v: 4, pv: 1, checkEmpty: false, from: "mongo03.xxxxxxxxxxxxx.de:27017" }
Tue Aug 14 15:01:58 [conn543] command: { replSetHeartbeat: "mongocluster", v: 4, pv: 1, checkEmpty: false, from: "mongo03.xxxxxxxxxxxxx.de:27017" }
Tue Aug 14 15:01:58 [conn543] command admin.$cmd command: { replSetHeartbeat: "mongocluster", v: 4, pv: 1, checkEmpty: false, from: "mongo03.xxxxxxxxxxxxx.de:27017" } ntoreturn:1 reslen:128 0ms
Tue Aug 14 15:01:59 [slaveTracking] update  0ms
Tue Aug 14 15:01:59 [slaveTracking] update  0ms

Comment by Nico Kaiser [ 14/Aug/12 ]

I did not change anything. After the crashes began, I updated the server from 2.0.2 to 2.0.7, as well as the (Node.js) client.

I'll turn up the logLevel and see if there is anything...

Comment by Scott Hernandez (Inactive) [ 14/Aug/12 ]

Did you change anything in the config before this happened? Can you enable more verbose logging to catch more messages if it happens again? You can do this using the -v startup option or by using the setParameter command if the server is already running: http://docs.mongodb.org/manual/reference/commands/#setParameter

Generated at Thu Feb 08 03:12:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.