[SERVER-3212] Got signal: 11 (Segmentation fault) Created: 07/Jun/11  Updated: 12/Jul/16  Resolved: 27/Jul/11

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 1.8.1
Fix Version/s: 1.8.3

Type: Bug Priority: Blocker - P1
Reporter: Paul Gao Assignee: Mathias Stearn
Resolution: Done Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Centos 5 64bit


Attachments: File numa_maps     File numa_maps_mongodb     File numa_maps_mongodb     File numa_maps_mongodb    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-3196 Segmentation Fault at startup on NUMA... Closed
Related
related to SERVER-3496 file/io warning on some numa system W... Closed
Operating System: Linux
Participants:

 Description   

This happens only with 1.8.2; 1.8.1 is fine.

Tue Jun 7 16:03:32 [initandlisten] MongoDB starting : pid=4609 port=9601 dbpath=/home/mongodb 64-bit
Tue Jun 7 16:03:32 Invalid access at address: 0x1

Tue Jun 7 16:03:32 Got signal: 11 (Segmentation fault).

Tue Jun 7 16:03:32 Backtrace:
0x8a7a29 0x8a8000 0x37a480eb10 0x37a3c79b60 0x56d6f3 0x8ac592 0x8ad478 0x8b30af 0x37a3c1d994 0x4e10c9
/usr/local/mongodb/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a7a29]
/usr/local/mongodb/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0x8a8000]
/lib64/libpthread.so.0 [0x37a480eb10]
/lib64/libc.so.6(strlen+0x10) [0x37a3c79b60]
/usr/local/mongodb/bin/mongod(_ZN5mongo13show_warningsEv+0x363) [0x56d6f3]
/usr/local/mongodb/bin/mongod(_ZN5mongo14_initAndListenEiPKc+0xf2) [0x8ac592]
/usr/local/mongodb/bin/mongod(_ZN5mongo13initAndListenEiPKc+0x18) [0x8ad478]
/usr/local/mongodb/bin/mongod(main+0x5acf) [0x8b30af]
/lib64/libc.so.6(__libc_start_main+0xf4) [0x37a3c1d994]
/usr/local/mongodb/bin/mongod(__gxx_personality_v0+0x3a1) [0x4e10c9]

Tue Jun 7 16:03:32 ERROR: Client::shutdown not called: initandlisten
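
The frame names in the backtrace above are mangled C++ symbols. As a reading aid, a minimal sketch of demangling them with `abi::__cxa_demangle` from `<cxxabi.h>` (available with GCC and Clang) follows. The helper name `demangle` is an assumption made for this illustration and is not part of the MongoDB source.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>

// Hypothetical helper: demangle an Itanium C++ ABI symbol name.
// Returns the input unchanged if the name cannot be demangled.
std::string demangle(const std::string& mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0 || out == nullptr) {
        return mangled;
    }
    std::string result(out);
    std::free(out);  // the buffer returned by __cxa_demangle is malloc-allocated
    return result;
}

// demangle("_ZN5mongo13show_warningsEv") yields "mongo::show_warnings()"
// demangle("_ZN5mongo10abruptQuitEi")    yields "mongo::abruptQuit(int)"
```

Read this way, the trace shows `strlen` faulting while called from `mongo::show_warnings()` during `mongo::_initAndListen`. The `c++filt` tool from binutils performs the same translation on the command line.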



 Comments   
Comment by Eliot Horowitz (Inactive) [ 27/Jul/11 ]

added a new case for the file io issue

Comment by Mathias Stearn [ 06/Jul/11 ]

@Jalmari: I think that is an unrelated issue. Could you open up another ticket?

Comment by Jalmari Raippalinna [ 27/Jun/11 ]

After upgrading to 1.8.2 and re-enabling the chunk balancer, mongod started fine at first and then segfaulted with:

Mon Jun 27 10:32:54 Invalid access at address: 0

Mon Jun 27 10:32:54 Got signal: 11 (Segmentation fault).
Mon Jun 27 10:32:54 Backtrace:
0x8a8039 0x8a8610 0x7f5fc8d398f0 0x7f5fc8abe787 0x7f5fc8abf04c 0x6d7863 0x6d10f2 0x7dd347 0x7de8f1 0x647e45 0x64b3de 0x7547a5 0x759ec8 0x8a8fce 0x8bb630 0x7f5fc8d309ca 0x7f5fc82df70d
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a8039]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0x8a8610]
/lib/libpthread.so.0(+0xf8f0) [0x7f5fc8d398f0]
/usr/lib/libstdc++.so.6(_ZNSs4_Rep8_M_cloneERKSaIcEm+0x47) [0x7f5fc8abe787]
/usr/lib/libstdc++.so.6(_ZNSsC1ERKSs+0x3c) [0x7f5fc8abf04c]
/usr/bin/mongod(_ZNK5mongo11ReplSetImpl16_summarizeStatusERNS_14BSONObjBuilderE+0x753) [0x6d7863]
/usr/bin/mongod(_ZN5mongo19CmdReplSetGetStatus3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x122) [0x6d10f2]
/usr/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x227) [0x7dd347]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x831) [0x7de8f1]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x35) [0x647e45]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x324e) [0x64b3de]
/usr/bin/mongod() [0x7547a5]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x5b8) [0x759ec8]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a8fce]
/usr/bin/mongod(thread_proxy+0x80) [0x8bb630]
/lib/libpthread.so.0(+0x69ca) [0x7f5fc8d309ca]
/lib/libc.so.6(clone+0x6d) [0x7f5fc82df70d]
Mon Jun 27 10:32:54 Invalid access at address: 0x4

Mon Jun 27 10:32:54 Got signal: 11 (Segmentation fault).
Mon Jun 27 10:32:54 Backtrace:
0x8a8039 0x8a8610 0x7f5fc8d398f0 0x7128f5 0x711052 0x74cdb1 0x876ed7 0x7dd347 0x7de8f1 0x647e45 0x64b3de 0x7547a5 0x759ec8 0x8a8fce 0x8bb630 0x7f5fc8d309ca 0x7f5fc82df70d
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a8039]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0x8a8610]
/lib/libpthread.so.0(+0xf8f0) [0x7f5fc8d398f0]
/usr/bin/mongod() [0x7128f5]
/usr/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb+0x42) [0x711052]
/usr/bin/mongod(_ZN5mongo7Helpers11removeRangeERKSsRKNS_7BSONObjES5_bbPNS0_14RemoveCallbackE+0x791) [0x74cdb1]
/usr/bin/mongod(_ZN5mongo16MoveChunkCommand3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x8047) [0x876ed7]
/usr/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x227) [0x7dd347]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x831) [0x7de8f1]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x35) [0x647e45]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x324e) [0x64b3de]
/usr/bin/mongod() [0x7547a5]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x5b8) [0x759ec8]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a8fce]
/usr/bin/mongod(thread_proxy+0x80) [0x8bb630]
/lib/libpthread.so.0(+0x69ca) [0x7f5fc8d309ca]
/lib/libc.so.6(clone+0x6d) [0x7f5fc82df70d]

EDIT:
This happened right after we started another slave while moveChunk was performing the move. Our second attempt did not cause a segfault.

Comment by Grégoire Seux [ 21/Jun/11 ]

The File I/O error appears right AFTER the numa_maps warning. It still appears in 1.8.2-final.

Comment by Mathias Stearn [ 20/Jun/11 ]

@Paul Gao: Just want to verify that the File I/O error comes after the numa_maps warning. If so I think it indicates a different problem...

@everyone else: Is there a File I/O error before the numa_maps warning and are you still getting that warning on 1.8.2-final?

Comment by Grégoire Seux [ 20/Jun/11 ]

my numa maps is attached.

uname -a
Linux SYS-L121-9-1 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Comment by Swen Thümmler [ 18/Jun/11 ]

output of cat /proc/[PID_OF_MONGOD]/numa_maps

uname -a:
Linux lnxp-3741 2.6.18-194.26.1.el5 #1 SMP Fri Oct 29 14:21:16 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Comment by Paul Gao [ 18/Jun/11 ]

The 1.8.2 final version shows:
"

** WARNING: cannot parse numa_maps

Sat Jun 18 11:32:40 [initandlisten] File I/O errno:29 Illegal seek
"

Is this normal?

Comment by Mathias Stearn [ 16/Jun/11 ]

Hmm, neither of those should be failing. Could you fire up mongod from master and if you see the warning about numa_maps attach /proc/[PID_OF_MONGOD]/numa_maps? There could be something odd there that isn't showing up with cat. Also please include the output of "uname -a" so I can see what kernel you are running.
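
To look for oddities that do not show up with cat, one option is to read the first line as raw bytes and print it with non-printable characters escaped. The following is a minimal sketch under that assumption. The helper names `escapeNonPrintable` and `firstLine` are hypothetical and are not taken from mongod.

```cpp
#include <cctype>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: replace every non-printable byte with a \xNN escape
// so that stray control characters become visible.
std::string escapeNonPrintable(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (std::isprint(c)) {
            out += static_cast<char>(c);
        } else {
            char buf[5];  // room for "\xNN" plus the terminating NUL
            std::snprintf(buf, sizeof(buf), "\\x%02x", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Hypothetical helper: read the first line of `path` in binary mode.
// `ok` is set to false if the file could not be opened or nothing was read.
std::string firstLine(const std::string& path, bool& ok) {
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string line;
    ok = static_cast<bool>(std::getline(in, line));
    return line;
}
```

For example, calling `firstLine("/proc/self/numa_maps", ok)` and passing the result through `escapeNonPrintable` would show both whether the read succeeded and whether the line contains unexpected bytes.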

Comment by Swen Thümmler [ 16/Jun/11 ]

output of numactl --interleave=all cat /proc/self/numa_maps:

00400000 interleave=0-1 file=/bin/cat mapped=3 N0=3
00604000 interleave=0-1 file=/bin/cat anon=1 dirty=1 mapped=2 N0=2
0d455000 interleave=0-1 heap anon=3 dirty=3 active=0 N0=2 N1=1
33eda00000 interleave=0-1 file=/lib64/ld-2.5.so mapped=23 mapmax=34 N1=23
33edc1b000 interleave=0-1 file=/lib64/ld-2.5.so anon=1 dirty=1 N1=1
33edc1c000 interleave=0-1 file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
33ede00000 interleave=0-1 file=/lib64/libc-2.5.so mapped=65 mapmax=40 N1=65
33edf4e000 interleave=0-1 file=/lib64/libc-2.5.so
33ee14e000 interleave=0-1 file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=3 mapmax=38 N1=3
33ee152000 interleave=0-1 file=/lib64/libc-2.5.so anon=1 dirty=1 N0=1
33ee153000 interleave=0-1 anon=4 dirty=4 active=0 N0=2 N1=2
2ac4ca9c4000 interleave=0-1 anon=1 dirty=1 N0=1
2ac4ca9d4000 interleave=0-1 anon=2 dirty=2 N0=1 N1=1
2ac4ca9d6000 interleave=0-1 file=/usr/lib/locale/locale-archive mapped=10 mapmax=33 N0=10
7fff11704000 interleave=0-1 stack anon=3 dirty=3 N0=2 N1=1

Comment by Eliot Horowitz (Inactive) [ 15/Jun/11 ]

the segfault is fixed in 1.8.2 - but the warning will be put back for 1.8.3

Comment by Grégoire Seux [ 15/Jun/11 ]

Here it is :

00400000 default file=/bin/cat mapped=3 N0=3
00604000 default file=/bin/cat anon=1 dirty=1 mapped=2 N0=2
0c11e000 default heap anon=3 dirty=3 active=0 N0=3
3d5a200000 default file=/lib64/ld-2.5.so mapped=24 mapmax=45 N0=24
3d5a41b000 default file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
3d5a41c000 default file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
3d5a600000 default file=/lib64/libc-2.5.so mapped=60 mapmax=49 N0=60
3d5a74e000 default file=/lib64/libc-2.5.so
3d5a94d000 default file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=3 mapmax=22 N0=3
3d5a951000 default file=/lib64/libc-2.5.so anon=1 dirty=1 N0=1
3d5a952000 default anon=4 dirty=4 active=0 N0=4
2b5ea4273000 default anon=1 dirty=1 N0=1
2b5ea427e000 default anon=2 dirty=2 N0=2
2b5ea4280000 default file=/usr/lib/locale/locale-archive mapped=11 mapmax=6 N0=11
7fffd8179000 default stack anon=3 dirty=3 N0=3

Comment by Mathias Stearn [ 14/Jun/11 ]

Yes, the fix now displays that message rather than crashing. Could you attach /proc/self/numa_maps from that box? I'd like to know what format it is using that we can't parse correctly.

Comment by Grégoire Seux [ 10/Jun/11 ]

Thanks, this works (mongod starts)!

However, an error is still displayed in addition to the warning.

** WARNING: cannot parse numa_maps

Fri Jun 10 11:19:59 [initandlisten] File I/O errno:29 Illegal seek
db version v1.8.3-pre-, pdfile version 4.5
Fri Jun 10 11:19:59 [initandlisten] git version: 9990775e39701870c5238388aecfabda992de8e3
Fri Jun 10 11:19:59 [initandlisten] build sys info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41

Is this related?

Comment by Erez Zarum [ 09/Jun/11 ]

Hey Eliot, a little bit up in the comments you can find my numa_maps.

Comment by Grégoire Seux [ 09/Jun/11 ]

Here is mine.

Comment by Eliot Horowitz (Inactive) [ 09/Jun/11 ]

Can you attach numa_maps?

Comment by Erez Zarum [ 09/Jun/11 ]

Just compiled; it seems to work.
It throws the "cannot parse numa_maps" warning and simply continues to load.

Comment by Grégoire Seux [ 09/Jun/11 ]

Thanks, I'll try tomorrow.

Comment by auto [ 09/Jun/11 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: fix numa issue SERVER-3212
Branch: v1.8
https://github.com/mongodb/mongo/commit/9990775e39701870c5238388aecfabda992de8e3

Comment by auto [ 09/Jun/11 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: fix numa issue SERVER-3212
Branch: master
https://github.com/mongodb/mongo/commit/50b5572b0859da1a6d6f266397dc20f4e5af55f9

Comment by Erez Zarum [ 09/Jun/11 ]

I commented out the code that checks for a NUMA-enabled kernel (along with the warning message) and recompiled; mongod now runs well.
The stack trace of the crash points to startsWith in version.cpp (which calls into util/goodies.cpp). It seems the sanity check does not handle this case well and prevents mongod from initializing.
How important is that sanity check for NUMA-enabled kernels?

Comment by Grégoire Seux [ 09/Jun/11 ]

I confirm that the latest build does not start properly. As Erez reported, numactl shows that the lines read interleave=0-1 on our machine.

Comment by Erez Zarum [ 09/Jun/11 ]

I can reproduce it on several systems.
I think it comes from not parsing /proc/self/numa_maps.
$ numactl --interleave=all cat /proc/self/numa_maps
00400000 interleave=0-1 file=/bin/cat mapped=3 N0=3
00604000 interleave=0-1 file=/bin/cat anon=1 dirty=1 mapped=2 N0=2
0c3be000 interleave=0-1 heap anon=3 dirty=3 active=0 N0=1 N1=2
3d41a00000 interleave=0-1 file=/lib64/ld-2.5.so mapped=24 mapmax=43 N0=24
3d41c1b000 interleave=0-1 file=/lib64/ld-2.5.so anon=1 dirty=1 N1=1
3d41c1c000 interleave=0-1 file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
3d41e00000 interleave=0-1 file=/lib64/libc-2.5.so mapped=60 mapmax=48 N0=60
3d41f4e000 interleave=0-1 file=/lib64/libc-2.5.so
3d4214d000 interleave=0-1 file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=3 mapmax=30 N0=3
3d42151000 interleave=0-1 file=/lib64/libc-2.5.so anon=1 dirty=1 N1=1
3d42152000 interleave=0-1 anon=4 dirty=4 active=0 N0=2 N1=2
2ac82fa3f000 interleave=0-1 anon=1 dirty=1 N1=1
2ac82fa4b000 interleave=0-1 anon=2 dirty=2 N0=1 N1=1
2ac82fa4d000 interleave=0-1 file=/usr/lib/locale/locale-archive mapped=12 mapmax=15 N0=12
7fff54fd8000 interleave=0-1 stack anon=2 dirty=2 N0=1 N1=1
This output is from a server where mongod won't load and crashes immediately.

$ numactl --interleave=all cat /proc/self/numa_maps
00400000 interleave=0 file=/bin/cat mapped=3 active=0 N0=3
00604000 interleave=0 file=/bin/cat anon=1 dirty=1 mapped=2 active=1 N0=2
09491000 interleave=0 heap anon=3 dirty=3 active=0 N0=3
35dba00000 interleave=0 file=/lib64/ld-2.5.so mapped=24 mapmax=19 N0=24
35dbc1b000 interleave=0 file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
35dbc1c000 interleave=0 file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
35dbe00000 interleave=0 file=/lib64/libc-2.5.so mapped=60 mapmax=21 N0=60
35dbf4e000 interleave=0 file=/lib64/libc-2.5.so
35dc14d000 interleave=0 file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=3 mapmax=15 N0=3
35dc151000 interleave=0 file=/lib64/libc-2.5.so anon=1 dirty=1 N0=1
35dc152000 interleave=0 anon=4 dirty=4 active=0 N0=4
2b3021fa1000 interleave=0 anon=1 dirty=1 N0=1
2b3021faa000 interleave=0 anon=2 dirty=2 N0=2
2b3021fac000 interleave=0 file=/usr/lib/locale/locale-archive mapped=11 mapmax=10 N0=11
7fff2b126000 interleave=0 stack anon=3 dirty=3 N0=3
This is a working one.

interleave=0 works
interleave=0-1 does not.
parsing error?

I am using the latest code.

Comment by Eliot Horowitz (Inactive) [ 09/Jun/11 ]

It's in the 1.8 nightly:

http://downloads.mongodb.org/linux/mongodb-linux-x86_64-v1.8-latest.tgz

Comment by Grégoire Seux [ 09/Jun/11 ]

Thanks for your reply. However it seems the build did not succeed (http://buildbot.mongodb.org/builders/Nightly%20Linux%2064-bit%20v8/builds/565). I'll try again tomorrow and edit this message.

The stack from the current (yesterday's) build is:

Thu Jun 9 11:51:02 [initandlisten] MongoDB starting : pid=4615 port=27021 dbpath=/var/db/shd1 64-bit
Thu Jun 9 11:51:02 Invalid access at address: 0x1

Thu Jun 9 11:51:02 Got signal: 11 (Segmentation fault).

Thu Jun 9 11:51:02 Backtrace:
0x8a7a29 0x8a8000 0x3d5ae0eb10 0x3d5a679a10 0x56d6f3 0x8ac592 0x8ad478 0x8b30af 0x3d5a61d994 0x4e10c9
./mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a7a29]
./mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0x8a8000]
/lib64/libpthread.so.0 [0x3d5ae0eb10]
/lib64/libc.so.6(strlen+0x10) [0x3d5a679a10]
./mongod(_ZN5mongo13show_warningsEv+0x363) [0x56d6f3]
./mongod(_ZN5mongo14_initAndListenEiPKc+0xf2) [0x8ac592]
./mongod(_ZN5mongo13initAndListenEiPKc+0x18) [0x8ad478]
./mongod(main+0x5acf) [0x8b30af]
/lib64/libc.so.6(__libc_start_main+0xf4) [0x3d5a61d994]
./mongod(__gxx_personality_v0+0x3a1) [0x4e10c9]

Thu Jun 9 11:51:02 dbexit:
Thu Jun 9 11:51:02 [initandlisten] File I/O errno:29 Illegal seek
shutdown: going to close listening sockets...
Thu Jun 9 11:51:02 [initandlisten] shutdown: going to flush diaglog...
Thu Jun 9 11:51:02 [initandlisten] shutdown: going to close sockets...
Thu Jun 9 11:51:02 [initandlisten] shutdown: waiting for fs preallocator...
Thu Jun 9 11:51:02 [initandlisten] shutdown: closing all files...
Thu Jun 9 11:51:02 closeAllFiles() finished
Thu Jun 9 11:51:02 dbexit: really exiting now

Comment by Eliot Horowitz (Inactive) [ 09/Jun/11 ]

Can you try the 1.8 nightly tomorrow?

Comment by auto [ 09/Jun/11 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: try to fix numa segfault SERVER-3212
Branch: v1.8
https://github.com/mongodb/mongo/commit/959500ba9977891b6a6d4736615d1ae5bf22d2c2

Comment by auto [ 09/Jun/11 ]

Author: Eliot Horowitz (erh) <eliot@10gen.com>

Message: try to fix numa segfault SERVER-3212
Branch: master
https://github.com/mongodb/mongo/commit/c5b9318b6d8100fd264fff49aae4a0e333507b17

Comment by Eliot Horowitz (Inactive) [ 09/Jun/11 ]

Can you send a stack from 1.8.3-rc3

Comment by Grégoire Seux [ 08/Jun/11 ]

Same issue with 1.8.2-rc3 (CentOS 5, 64-bit).

edit : confirmed that it is specific to 1.8.2-rc3 (1.8.1 works).

Comment by Paul Gao [ 08/Jun/11 ]

I looked at the GitHub history and guessed the URL: http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-1.8.2.tgz

Comment by Eliot Horowitz (Inactive) [ 08/Jun/11 ]

What version is this with?
1.8.2 isn't out yet - so can't be 1.8.2

Generated at Thu Feb 08 03:02:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.