[SERVER-16463] For unknown reason, server reported Got signal: 7 (Bus error). Created: 08/Dec/14  Updated: 22/Jan/15  Resolved: 22/Jan/15

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.4.11
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Jakub Černy Assignee: Ramon Fernandez Marina
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Mon Dec 8 18:00:08.241 [rsHealthPoll] replset info ec2-54-74-234-239.eu-west-1.compute.amazonaws.com:27017 heartbeat failed, retrying
Mon Dec 8 18:00:08.243 [rsHealthPoll] couldn't connect to ec2-54-74-234-239.eu-west-1.compute.amazonaws.com:27017: couldn't connect to server ec2-54-74-234-239.eu-west-1.compute.amazonaws.com:27017
Mon Dec 8 18:00:08.245 [rsHealthPoll] couldn't connect to ec2-54-74-234-239.eu-west-1.compute.amazonaws.com:27017: couldn't connect to server ec2-54-74-234-239.eu-west-1.compute.amazonaws.com:27017
Mon Dec 8 18:00:08.247 [rsHealthPoll] couldn't connect to ec2-54-74-234-239.eu-west-1.compute.amazonaws.com:27017: couldn't connect to server ec2-54-74-234-239.eu-west-1.compute.amazonaws.com:27017
Mon Dec 8 18:00:09.850 Invalid access at address: 0x7f939b8ca008 from thread: conn317766

Mon Dec 8 18:00:09.954 Got signal: 7 (Bus error).

Mon Dec 8 18:00:10.166 Backtrace:
0xde9a71 0x6d0d39 0x6d12c2 0x7fa9e4dda340 0x9cfb09 0xb63480 0xb63bfd 0x6f892f 0xb6c3ae 0x830b29 0x833a78 0x8341c0 0xa89c3a 0xa01644 0xa04196 0x6eba58 0xdd603e
0x7fa9e4dd2182 0x7fa9e40d6fbd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xde9a71]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d0d39]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x262) [0x6d12c2]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340) [0x7fa9e4dda340]
/usr/bin/mongod(_ZN5mongo13FieldRangeSetD1Ev+0x39) [0x9cfb09]
/usr/bin/mongod(_ZNSt8auto_ptrIN5mongo16MultiPlanScannerEED1Ev+0x120) [0xb63480]
/usr/bin/mongod(_ZN5mongo11MultiCursorD0Ev+0x4d) [0xb63bfd]
/usr/bin/mongod(_ZN5boost6detail12shared_countD1Ev+0x3f) [0x6f892f]
/usr/bin/mongod(_ZN5mongo24QueryOptimizerCursorImplD0Ev+0x8e) [0xb6c3ae]
/usr/bin/mongod(_ZN5mongo12ClientCursorD1Ev+0x2e9) [0x830b29]
/usr/bin/mongod(_ZN5mongo12ClientCursor13_erase_inlockEPS0_+0x38) [0x833a78]
/usr/bin/mongod(_ZN5mongo12ClientCursor5eraseEx+0x100) [0x8341c0]
/usr/bin/mongod(_ZN5mongo14processGetMoreEPKcixRNS_5CurOpEiRbPb+0x84a) [0xa89c3a]
/usr/bin/mongod(_ZN5mongo15receivedGetMoreERNS_10DbResponseERNS_7MessageERNS_5CurOpE+0x10a4) [0xa01644]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x6f6) [0xa04196]
/usr/bin/mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x98) [0x6eba58]
/usr/bin/mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x42e) [0xdd603e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182) [0x7fa9e4dd2182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fa9e40d6fbd]



 Comments   
Comment by Ramon Fernandez Marina [ 22/Jan/15 ]

kuba@persoo.cz, we haven't heard back from you for a while so we're closing this ticket. If this is still an issue for you please re-open it and provide the additional information requested above.

Thanks,
Ramón.

Comment by Ramon Fernandez Marina [ 09/Dec/14 ]

kuba@persoo.cz, can you upload the full logs of the failing server? The system logs from around that time should help confirm/reject my hypothesis: the system did try to re-start mongod, but since the previous instance crashed without removing the mongod.lock file the restart failed a few times, and then upstart gave up (we should see a "respawning too fast" or similar message in the system logs).

Comment by Jakub Černy [ 09/Dec/14 ]

Aha, you were right.

[ 20.104384] init: plymouth-upstart-bridge main process ended, respawning
[4695321.110865] end_request: I/O error, dev xvda3, sector 1039168
[4695321.110887] Read-error on swap-device (202:3:1039176)
[4695321.110898] Read-error on swap-device (202:3:1039184)
[4695321.110905] Read-error on swap-device (202:3:1039192)
[4695321.110912] Read-error on swap-device (202:3:1039200)
[4695321.110919] Read-error on swap-device (202:3:1039208)
[4695321.110931] Read-error on swap-device (202:3:1039216)
[4695321.110937] Read-error on swap-device (202:3:1039224)
[4695321.110944] Read-error on swap-device (202:3:1039232)
[4695324.599518] init: mongodb main process (1018) terminated with status 14

What about the upstart script in the *.deb package? Why doesn't it have respawn, i.e. why doesn't mongod start again after a fatal error?

Comment by Ramon Fernandez Marina [ 09/Dec/14 ]

kuba@persoo.cz, a "Got signal: 7 (Bus error)... Invalid access at address" message can appear when there are I/O problems or filesystem corruption. Could you please check your dmesg output for I/O errors on your drives? Have you run an fsck on the volume that the dbpath is on?
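
For anyone doing the dmesg check suggested above, here is a minimal sketch of a filter for kernel log lines. The function name `find_io_errors` and the two patterns are assumptions chosen for illustration; they are based on the kind of messages that appear in the kernel log excerpt in this ticket and are not an exhaustive list of I/O error signatures.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive list):
# they cover generic block-layer I/O errors and swap read errors.
IO_ERROR_PATTERNS = [
    re.compile(r"I/O error", re.IGNORECASE),
    re.compile(r"read-error on swap-device", re.IGNORECASE),
]


def find_io_errors(lines):
    """Return the kernel log lines that match any pattern above."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in IO_ERROR_PATTERNS)
    ]
```

The function accepts any iterable of strings, so it could be fed a saved copy of the dmesg output opened as a file, or a list of lines pasted from the system log. Any hit on the volume holding the dbpath or on the swap device would be worth following up before restarting mongod.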
