[SERVER-9313] Mongodb segfaults Created: 10/Apr/13  Updated: 10/Dec/14  Resolved: 28/Oct/13

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.2.2, 2.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Luciano Issoe Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Upgrading a replica set from 2.2.2 to 2.4.1 on Ubuntu 12.04 LTS.

Procedure: upgrade one secondary at a time, then stepDown the primary and upgrade it.
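The upgrade order described above can be sketched roughly as follows. This is only an illustration of the ordering (secondaries first, then step down and upgrade the primary); the `Member` class, the host names `db1` to `db3`, and the `rolling_upgrade_plan` helper are hypothetical and do not talk to a real server.

```python
from dataclasses import dataclass


@dataclass
class Member:
    host: str     # hypothetical host name, not taken from the report
    state: str    # "PRIMARY" or "SECONDARY"
    version: str  # currently installed mongod version


def rolling_upgrade_plan(members, target_version):
    """Return the ordered (action, host) steps for a rolling upgrade."""
    steps = []
    # Secondaries are upgraded first, one at a time.
    for m in members:
        if m.state == "SECONDARY" and m.version != target_version:
            steps.append(("upgrade", m.host))
    # The primary goes last: step it down, then upgrade it.
    for m in members:
        if m.state == "PRIMARY" and m.version != target_version:
            steps.append(("stepDown", m.host))
            steps.append(("upgrade", m.host))
    return steps


members = [
    Member("db1", "PRIMARY", "2.2.2"),
    Member("db2", "SECONDARY", "2.2.2"),
    Member("db3", "SECONDARY", "2.2.2"),
]
plan = rolling_upgrade_plan(members, "2.4.1")
for action, host in plan:
    print(action, host)
```

In practice each step would wait for the member to rejoin the set and catch up before moving on; that check is left out of the sketch.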

After upgrading one of the secondaries, the other secondary (2.2.2) became the primary. So far so good. At the same time, the member just upgraded to 2.4.1 crashed and required a forced restart.

The forced restart brought the server back.

Here's the trace log:

Wed Apr 10 19:13:42.892 [conn11018514] authenticate db: mobile-ads

{ authenticate: 1, user: "heroku_app", nonce: "c4f142c96e76b192", key: "e3da736a5915cc8fd221c9f0e1055f88" }

Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Invalid access at address: 0x7ed6ab13bff0 from thread: conn11018521
Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Invalid access at address: 0x7ed6b0386ff0 from thread: conn11018523

Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Invalid access at address: 0x7ed6b088bff0 from thread: conn11018532
Wed Apr 10 19:13:42.961
Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Invalid access at address: 0x7ed6ae06aff0 from thread: conn11018525

Invalid access at address: 0x7ed6aec76ff0 from thread: conn11018530
Invalid access at address: 0x7ed6ae16bff0 from thread: conn11018524

Wed Apr 10 19:13:42.961
Invalid access at address: 0x7ed6adf69ff0 from thread: conn11018526
Invalid access at address: 0x7ed6ade68ff0 from thread: conn11018527

Wed Apr 10 19:13:42.961 Invalid access at address: 0x7ed6af57fff0 from thread: conn11018531

Invalid access at address: 0x7ed6ae46eff0 from thread: conn11018520

Invalid access at address: 0x7ed6b0689ff0 from thread: conn11018529

Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Got signal: 11 (Segmentation fault).
Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961
Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Got signal: 11 (Segmentation fault).

Invalid access at address: 0x7ed6aea74ff0 from thread: conn11018522
Invalid access at address: 0x7ed6add67ff0 from thread: conn11018528

Wed Apr 10 19:13:42.961 Got signal: 11 (Segmentation fault).
Got signal: 11 (Segmentation fault).

Wed Apr 10 19:13:42.961 Got signal: 11 (Segmentation fault).
Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.962 Got signal: 11 (Segmentation fault).
Got signal: 11 (Segmentation fault).

Got signal: 11 (Segmentation fault).

Got signal: 11 (Segmentation fault).

Wed Apr 10 19:13:42.961
Got signal: 11 (Segmentation fault).
Invalid access at address: 0x7ed6ae56fff0 from thread: conn11018519
Got signal: 11 (Segmentation fault).
Wed Apr 10 19:13:42.962 Got signal: 11 (Segmentation fault).
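Because several threads wrote to the log at once, the lines above are interleaved. A small script along these lines could help pull out which connection threads faulted and at which addresses. It is a sketch that assumes the two message formats seen in this log ("Invalid access at address: ... from thread: ..." and "Got signal: 11 (Segmentation fault)."); the function name `summarize_crash_log` is made up for illustration.

```python
import re

# Patterns match the message text only, so interleaved timestamps are ignored.
INVALID_ACCESS = re.compile(
    r"Invalid access at address: (0x[0-9a-fA-F]+) from thread: (\S+)"
)
SEGFAULT = re.compile(r"Got signal: 11 \(Segmentation fault\)")


def summarize_crash_log(text):
    """Collect faulting threads, addresses, and the segfault message count."""
    accesses = INVALID_ACCESS.findall(text)
    return {
        "faulting_threads": sorted({thread for _, thread in accesses}),
        "addresses": [addr for addr, _ in accesses],
        "segfault_lines": len(SEGFAULT.findall(text)),
    }


# Three lines copied from the log above.
sample = """\
Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Invalid access at address: 0x7ed6ab13bff0 from thread: conn11018521
Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Invalid access at address: 0x7ed6b0386ff0 from thread: conn11018523
Wed Apr 10 19:13:42.961 Wed Apr 10 19:13:42.961 Got signal: 11 (Segmentation fault).
"""
summary = summarize_crash_log(sample)
print(summary)
```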


Operating System: ALL
Participants:

Comments
Comment by Stennie Steneker (Inactive) [ 28/Oct/13 ]

Hi Luciano,

I'm closing this issue due to inactivity. As Eric noted, this may be related to SERVER-9014, which was resolved in the MongoDB 2.4.2 production release.

If you've upgraded to 2.4.2 or newer and are still experiencing this issue, please feel free to reopen with additional details.

Thanks,
Stephen

Comment by Eric Milkie [ 11/Apr/13 ]

We fixed a segfault in 2.4.2 that might be your issue: SERVER-9014.
We released the 2.4.2 release candidate today, so if you want to try the new version, it may solve your issue.
The release candidate is expected to become the final 2.4.2 release in a week or two.
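To check whether a given server version is at or past the release mentioned here, something like the following could be used. It is a minimal sketch: the helper names are hypothetical, the version string is assumed to look like "2.4.2" or "2.4.2-rc0" (for example as reported by `db.version()` in the mongo shell), and any suffix such as "-rc0" is ignored, so a 2.4.2 release candidate is treated as possibly containing the fix.

```python
import re

# Per this comment, the SERVER-9014 fix is in 2.4.2.
FIXED_IN = (2, 4, 2)


def parse_version(version):
    """Parse the leading 'major.minor.patch' part of a version string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def may_include_fix(version):
    """True if the version is 2.4.2 or newer (suffixes like -rc0 are ignored)."""
    return parse_version(version) >= FIXED_IN


checks = {v: may_include_fix(v) for v in ("2.2.2", "2.4.1", "2.4.2-rc0", "2.4.2")}
print(checks)
```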

Comment by Luciano Issoe [ 11/Apr/13 ]

It happened again. Third time in two days! Three different instances, same error.

Rolling back to 2.2...

Comment by Luciano Issoe [ 11/Apr/13 ]

It happened again. The primary crashed just now.

Thu Apr 11 15:10:12.016 Invalid access at address: 0x7dd936ac1ff0 from thread: conn24344468
Thu Apr 11 15:10:12.031 [conn24344470] authenticate db: web-ads

{ authenticate: 1, user: "heroku_app", nonce: "83c7ab9ad2b844e6", key: "d25db168be848d3ab366378af3dd40b6" }

Thu Apr 11 15:10:12.036 [conn24344471] authenticate db: mobile-ads

{ authenticate: 1, user: "heroku_app", nonce: "cdf74ad518fe62a6", key: "4ad17b691330d66715cf7fc22baacb70" }

Thu Apr 11 15:10:12.047 [conn24344472] authenticate db: web-ads

{ authenticate: 1, user: "heroku_app", nonce: "d813c9227661b1ad", key: "f53f4b2582892832459e0017c320ec19" }

Thu Apr 11 15:10:12.068 [conn24344473] authenticate db: web-ads

{ authenticate: 1, user: "heroku_app", nonce: "321849d76d43b3a5", key: "df43e4139fb8af8cef140d430f0e72b1" }

Thu Apr 11 15:10:12.069 [conn24344474] authenticate db: web-ads

{ authenticate: 1, user: "heroku_app", nonce: "eb14c1794b943d2b", key: "ba2ed3fe327d2c80dcee0d8e399c71d2" }

Thu Apr 11 15:10:12.078 [conn24344476] authenticate db: mobile-ads

{ authenticate: 1, user: "heroku_app", nonce: "ade8bfe21ac6eac9", key: "e3b4215ce260a54ee45bf224db9020b0" }

Thu Apr 11 15:10:12.078 [conn24344475] authenticate db: mobile-ads

{ authenticate: 1, user: "heroku_app", nonce: "e239c13063ea6bbb", key: "ac6510baf61def9fc17c9c5fa5b57783" }

Thu Apr 11 15:10:12.086 [conn24344477] authenticate db: mobile-ads

{ authenticate: 1, user: "heroku_app", nonce: "d4471c9b812dbb57", key: "dbf832eda0911b5f7634c9adbcf29d6f" }

Thu Apr 11 15:10:12.087 [conn24344478] authenticate db: mobile-ads

{ authenticate: 1, user: "heroku_app", nonce: "7ac86afe69ed14c1", key: "c55bb0c8f30b3dcfa280abdbb92c7d0a" }

Thu Apr 11 15:10:12.094 Got signal: 11 (Segmentation fault).

Here's what's logged in dmesg:

[13252333.794369] mongod D ffff880f22dd3700 0 914 1 0x00000000
[13252333.794373] ffff880ec0107cb8 0000000000000282 0000000000000000 ffffffffffffffe0
[13252333.794377] ffff880ec0107fd8 ffff880ec0107fd8 ffff880ec0107fd8 0000000000013700
[13252333.794381] ffff880ebd5b0000 ffff880ec0ad5b80 00007f43935c09e0 ffff880ebe5ea680
[13252333.794385] Call Trace:
[13252333.794388] [<ffffffff8165491f>] schedule+0x3f/0x60
[13252333.794391] [<ffffffff8106af95>] exit_mm+0x85/0x130
[13252333.794394] [<ffffffff8106b1ae>] do_exit+0x16e/0x450
[13252333.794397] [<ffffffff8100a56d>] ? xen_force_evtchn_callback+0xd/0x10
[13252333.794400] [<ffffffff8106b634>] do_group_exit+0x44/0xa0
[13252333.794403] [<ffffffff8107c45c>] get_signal_to_deliver+0x21c/0x420
[13252333.794406] [<ffffffff81014825>] do_signal+0x45/0x130
[13252333.794409] [<ffffffff8108ecf8>] ? hrtimer_nanosleep+0xb8/0x180
[13252333.794412] [<ffffffff8108d8c0>] ? update_rmtp+0x70/0x70
[13252333.794415] [<ffffffff81014ad5>] do_notify_resume+0x65/0x80
[13252333.794418] [<ffffffff8165f050>] int_signal+0x12/0x17
[13252333.794421] INFO: task mongod:915 blocked for more than 120 seconds.
[13252333.794426] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[13252333.794432] mongod D ffff880f22df3700 0 915 1 0x00000000
[13252333.794435] ffff880ec0109cb8 0000000000000282 0000000000000000 ffffffffffffffe0
[13252333.794439] ffff880ec0109fd8 ffff880ec0109fd8 ffff880ec0109fd8 0000000000013700
[13252333.794443] ffff880ebd5b16e0 ffff880ec0ad16e0 00007f4392dbf9e0 ffff880ebe5ea680
[13252333.794447] Call Trace:
[13252333.794450] [<ffffffff8165491f>] schedule+0x3f/0x60
[13252333.794453] [<ffffffff8106af95>] exit_mm+0x85/0x130
[13252333.794456] [<ffffffff8106b1ae>] do_exit+0x16e/0x450
[13252333.794460] [<ffffffff810d73a7>] ? irq_to_desc+0x17/0x20
[13252333.794463] [<ffffffff8100a56d>] ? xen_force_evtchn_callback+0xd/0x10
[13252333.794466] [<ffffffff8106b634>] do_group_exit+0x44/0xa0
[13252333.794469] [<ffffffff8107c45c>] get_signal_to_deliver+0x21c/0x420
[13252333.794472] [<ffffffff81014825>] do_signal+0x45/0x130
[13252333.794476] [<ffffffff8108ecf8>] ? hrtimer_nanosleep+0xb8/0x180
[13252333.794479] [<ffffffff8108d8c0>] ? update_rmtp+0x70/0x70
[13252333.794482] [<ffffffff81014ad5>] do_notify_resume+0x65/0x80
[13252333.794484] [<ffffffff8165f050>] int_signal+0x12/0x17
