[SERVER-14185] Mongo Crashed after StepDown Created: 06/Jun/14  Updated: 10/Jun/14  Resolved: 09/Jun/14

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.6.1
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Himanshu Matta
Assignee: Bruce Lucas (Inactive)
Resolution: Duplicate
Votes: 1
Labels: replication
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

OS - Amazon Linux
Mongo Version- 2.6.1
Php Mongo Driver Version - 1.4.1
Running with 1 primary(1 vote), 1 secondary(1 vote) and 1 Arbiter (2 votes)


Issue Links:
Duplicate
duplicates SERVER-14186 rs.stepDown during mapReduce causes f... Closed
Related
Operating System: Linux
Participants:

 Description   

Hi,

When we run the "rs.stepDown()" command, our primary crashes instead of transitioning to secondary. This shouldn't happen. Please help us figure out why it is happening.

Let me know if you want anything else.



 Comments   
Comment by Bruce Lucas (Inactive) [ 09/Jun/14 ]

We'll track the fix for the issue with mapReduce and stepDown in SERVER-14186.

Comment by J Rassi [ 09/Jun/14 ]

xbhanu: you're encountering SERVER-13500 (an unrelated issue), which has been fixed for 2.6.1. Please upgrade to 2.6.1.

Comment by Bhanu [ 09/Jun/14 ]

Yup. It started happening to me today as well. Pasting the error snippet from the logs:

2014-06-09T07:11:40.856-0400 [conn20] replSet info voting yea for arsenic43.nyc:27017 (4)
2014-06-09T07:11:42.317-0400 [conn16] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.218.31.18:46374]
2014-06-09T07:11:42.327-0400 [rsHealthPoll] replSet member arsenic43.nyc:27017 is now in state PRIMARY
2014-06-09T07:11:42.624-0400 [rsBackgroundSync] replSet syncing to: arsenic43.nyc:27017
2014-06-09T07:11:42.625-0400 [rsBackgroundSync] replset setting syncSourceFeedback to arsenic43.nyc:27017
2014-06-09T07:11:42.627-0400 [rsBackgroundSync] SEVERE: Invalid access at address: 0xa8
2014-06-09T07:11:42.631-0400 [rsBackgroundSync] SEVERE: Got signal: 11 (Segmentation fault).
Backtrace:0x11bd301 0x11bc6de 0x11bc7cf 0x32c8a0f710 0xeacaf6 0xeae0cf 0xeae49a 0xdf23cc 0xdf38ca 0xdf4e2d 0x1201c99 0x32c8a079d1 0x32c82e8b6d
/u/choudhar/mongodb_extract/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11bd301]
/u/choudhar/mongodb_extract/bin/mongod() [0x11bc6de]
/u/choudhar/mongodb_extract/bin/mongod() [0x11bc7cf]
/lib64/libpthread.so.0() [0x32c8a0f710]
/u/choudhar/mongodb_extract/bin/mongod(_ZN5mongo18SyncSourceFeedback13replHandshakeEv+0xb86) [0xeacaf6]
/u/choudhar/mongodb_extract/bin/mongod(_ZN5mongo18SyncSourceFeedback8_connectERKSs+0x2df) [0xeae0cf]
/u/choudhar/mongodb_extract/bin/mongod(_ZN5mongo18SyncSourceFeedback7connectEPKNS_6MemberE+0x21a) [0xeae49a]
/u/choudhar/mongodb_extract/bin/mongod(_ZN5mongo7replset14BackgroundSync14getOplogReaderERNS_11OplogReaderE+0x68c) [0xdf23cc]
/u/choudhar/mongodb_extract/bin/mongod(_ZN5mongo7replset14BackgroundSync7produceEv+0x3a) [0xdf38ca]
/u/choudhar/mongodb_extract/bin/mongod(_ZN5mongo7replset14BackgroundSync14producerThreadEv+0x2d) [0xdf4e2d]
/u/choudhar/mongodb_extract/bin/mongod() [0x1201c99]
/lib64/libpthread.so.0() [0x32c8a079d1]
/lib64/libc.so.6(clone+0x6d) [0x32c82e8b6d]
2014-06-09T07:14:29.699-0400 ***** SERVER RESTARTED *****

Comment by Himanshu Matta [ 06/Jun/14 ]

Yes, please solve this ASAP.

Comment by Himanshu Matta [ 06/Jun/14 ]

Forget about the driver version. Let's focus only on the mongod process crash, which is not related to any driver version.

We found two types of log output that seem related to the crash:
1) mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11c0e91]
mongod(_ZN5mongo10logContextEPKc+0x159) [0x1163109]
mongod(_ZN5mongo13fassertFailedEi+0xcd) [0x114576d]
mongod() [0xe4082a]
mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_PbbPS3_+0xee) [0xe3beae]
mongod(_ZN5mongo2mr5State18prepTempCollectionEv+0x1836) [0x987236]
mongod(_ZN5mongo2mr16MapReduceCommand3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x2f7) [0x999be7]
mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0xa1e85a]
mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xd5e) [0xa1f8ce]
mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6c6) [0xa21086]
mongod(_ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x2307) [0xd4dae7]
mongod() [0xb97322]
mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x442) [0xb99902]
mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x9f) [0x76b6af]
mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x4fb) [0x117720b]
/lib64/libpthread.so.0(+0x7f18) [0x7fe6b076cf18]
/lib64/libc.so.6(clone+0x6d) [0x7fe6afa82e0d]
2014-06-03T12:33:13.837+0000 [conn1552]

***aborting after fassert() failure

2014-06-03T12:33:13.845+0000 [conn1552] SEVERE: Got signal: 6 (Aborted).
Backtrace:0x11c0e91 0x11c026e 0x7fe6af9d3fc0 0x7fe6af9d3f49 0x7fe6af9d5348 0x11457da 0xe4082a 0xe3beae 0x987236 0x999be7 0xa1e85a 0xa1f8ce 0xa21086 0xd4dae7 0xb97322 0xb99902 0x76b6af 0x117720b 0x7fe6b076cf18 0x7fe6afa82e0d
mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11c0e91]
mongod() [0x11c026e]
/lib64/libc.so.6(+0x33fc0) [0x7fe6af9d3fc0]
/lib64/libc.so.6(gsignal+0x39) [0x7fe6af9d3f49]
/lib64/libc.so.6(abort+0x148) [0x7fe6af9d5348]
mongod(_ZN5mongo13fassertFailedEi+0x13a) [0x11457da]
mongod() [0xe4082a]
mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_PbbPS3_+0xee) [0xe3beae]
mongod(_ZN5mongo2mr5State18prepTempCollectionEv+0x1836) [0x987236]
mongod(_ZN5mongo2mr16MapReduceCommand3runERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x2f7) [0x999be7]
mongod(_ZN5mongo12_execCommandEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x3a) [0xa1e85a]
mongod(_ZN5mongo7Command11execCommandEPS0_RNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xd5e) [0xa1f8ce]
mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x6c6) [0xa21086]
mongod(_ZN5mongo11newRunQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x2307) [0xd4dae7]
mongod() [0xb97322]
mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x442) [0xb99902]
mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x9f) [0x76b6af]
mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x4fb) [0x117720b]
/lib64/libpthread.so.0(+0x7f18) [0x7fe6b076cf18]
/lib64/libc.so.6(clone+0x6d) [0x7fe6afa82e0d]

2) 2014-06-03T12:02:54.929+0000 [rsHealthPoll] warning: Failed to connect to 10.0.0.67:10007, reason: errno:111 Connection refused
2014-06-03T12:02:54.929+0000 [rsHealthPoll] replset info 10.0.0.67:10007 heartbeat failed, retrying
2014-06-03T12:02:54.930+0000 [rsHealthPoll] warning: Failed to connect to 10.0.0.67:10007, reason: errno:111 Connection refused
2014-06-03T12:02:54.930+0000 [rsHealthPoll] couldn't connect to 10.0.0.67:10007: couldn't connect to server 10.0.0.67:10007 (10.0.0.67) failed, connection attempt failed
2014-06-03T12:02:54.930+0000 [rsHealthPoll] warning: Failed to connect to 10.0.0.67:10007, reason: errno:111 Connection refused
2014-06-03T12:02:54.930+0000 [rsHealthPoll] couldn't connect to 10.0.0.67:10007: couldn't connect to server 10.0.0.67:10007 (10.0.0.67) failed, connection attempt failed
2014-06-03T12:02:55.019+0000 [conn9785] SEVERE: Invalid access at address: 0x200a0066c2
2014-06-03T12:02:55.194+0000 [conn9785] SEVERE: Got signal: 11 (Segmentation fault).
Backtrace:0x11bd301 0x11bc6de 0x11bc7cf 0x7f57b3ad85b0 0x796892 0x796fbf 0x79773d 0x7979ff 0xc3d34a 0xc48ca6 0xb8f1c9 0xb993e8 0x76b76f 0x117367b 0x7f57b3ad0f18 0x7f57b2de6e0d
mongod(_ZN5mongo15printStackTraceERSo+0x21) [0x11bd301]
mongod() [0x11bc6de]
mongod() [0x11bc7cf]
/lib64/libpthread.so.0(+0xf5b0) [0x7f57b3ad85b0]
mongod(_ZNK5mongo11mutablebson8Document4Impl12writeElementINS_16BSONArrayBuilderEEEvjPT_PKNS_10StringDataE+0x52) [0x796892]
mongod(_ZNK5mongo11mutablebson8Document4Impl13writeChildrenINS_16BSONArrayBuilderEEEvjPT_+0x4f) [0x796fbf]
mongod(_ZNK5mongo11mutablebson8Document4Impl12writeElementINS_14BSONObjBuilderEEEvjPT_PKNS_10StringDataE+0x22d) [0x79773d]
mongod(_ZNK5mongo11mutablebson8Document4Impl13writeChildrenINS_14BSONObjBuilderEEEvjPT_+0x4f) [0x7979ff]
mongod(_ZN5mongo6updateERKNS_13UpdateRequestEPNS_7OpDebugEPNS_12UpdateDriverEPNS_14CanonicalQueryE+0x115a) [0xc3d34a]
mongod(_ZN5mongo14UpdateExecutor7executeEv+0x66) [0xc48ca6]
mongod(_ZN5mongo14receivedUpdateERNS_7MessageERNS_5CurOpE+0x729) [0xb8f1c9]
mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0xec8) [0xb993e8]
mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x9f) [0x76b76f]
mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x4fb) [0x117367b]
/lib64/libpthread.so.0(+0x7f18) [0x7f57b3ad0f18]
/lib64/libc.so.6(clone+0x6d) [0x7f57b2de6e0d]

Let me know if these logs are not enough, and I'll upload the whole log file.
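
The two traces above fail in different code paths (mapReduce/logOp versus update/mutablebson), which is easier to see once the frames are split into fields. A minimal sketch, assuming the `binary(symbol+offset) [address]` frame layout seen in the pasted logs; `parse_frame` and `symbols` are hypothetical helper names, not part of any MongoDB tooling:

```python
import re

# Matches frames such as:
#   mongod(_ZN5mongo13fassertFailedEi+0xcd) [0x114576d]
#   /lib64/libpthread.so.0(+0x7f18) [0x7fe6b076cf18]
#   mongod() [0xe4082a]
FRAME_RE = re.compile(
    r"^(?P<binary>\S+?)\((?P<symbol>[^+()]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\)"
    r"\s+\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict for one backtrace frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    return {
        "binary": m.group("binary"),
        "symbol": m.group("symbol") or None,  # unnamed frames carry no symbol
        "offset": m.group("offset"),
        "address": m.group("address"),
    }


def symbols(lines):
    """List the named (still mangled) symbols in a trace, in order."""
    frames = (parse_frame(line) for line in lines)
    return [f["symbol"] for f in frames if f and f["symbol"]]
```

The symbols stay mangled; piping them through a demangler such as `c++filt` is a separate step.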

Comment by Jeremy Mikola [ 06/Jun/14 ]

Please elaborate on the actual errors you are seeing. Saying the "primary gets crashed" implies that the mongod process itself is terminating, which wouldn't have much to do with the driver. If you mean to say that the driver is crashing, please specify if there is a segfault or simply an exception. And if so, either provide the core dump or full stack trace, respectively.

Server logs for the relevant mongod processes would be helpful in the event of a server crash. For diagnosing a driver crash or error, please attempt to collect driver logs via the MongoLog class. The example on MongoLog::setCallback() can be used to collect log info at all levels and from all modules within the driver. You can customize the callback function accordingly to either print the output directly or append it to an output file.

Lastly, I see that you mentioned driver version 1.4.1 in the "Environment" field of this ticket, but the "Affects Version" is 1.5.0. Please confirm which driver version you are using. If you are on 1.4.1, note that it was released approximately one year ago. I would suggest you update to a more recent version and attempt to reproduce your issue with 1.5.3 (the latest stable release). Versions may be found on http://pecl.php.net/mongo
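
When collecting server logs for a crash like this, it can help to pull out only the entries that mark the crash itself. A minimal sketch, assuming the `<timestamp> [<thread>] <message>` line layout seen in the log excerpts in this ticket; `crash_events` is a hypothetical helper name:

```python
import re

# Assumed layout of a mongod log line: "<ISO timestamp> [<thread>] <message>"
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\S+)\s+\[(?P<thread>[^\]]+)\]\s*(?P<msg>.*)$"
)


def crash_events(lines):
    """Yield (timestamp, thread, message) for each line reporting a SEVERE event."""
    for line in lines:
        m = LOG_RE.match(line.rstrip("\n"))
        if m and m.group("msg").startswith("SEVERE:"):
            yield m.group("ts"), m.group("thread"), m.group("msg")
```

Passing an open log file to `crash_events` yields the crashing thread and signal for each event, which can then be matched against the backtrace that follows it in the log.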
