|
Added a new case for the File I/O issue.
|
|
@Jalmari: I think that is an unrelated issue. Could you open up another ticket?
|
|
After upgrading to 1.8.2 and enabling the chunk balancer again, mongod first started just fine and then segfaulted with:
Mon Jun 27 10:32:54 Invalid access at address: 0
Mon Jun 27 10:32:54 Got signal: 11 (Segmentation fault).
Mon Jun 27 10:32:54 Backtrace:
0x8a8039 0x8a8610 0x7f5fc8d398f0 0x7f5fc8abe787 0x7f5fc8abf04c 0x6d7863 0x6d10f2 0x7dd347 0x7de8f1 0x647e45 0x64b3de 0x7547a5 0x759ec8 0x8a8fce 0x8bb630 0x7f5fc8d309ca 0x7f5fc82df70d
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a8039]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0x8a8610]
/lib/libpthread.so.0(+0xf8f0) [0x7f5fc8d398f0]
/usr/lib/libstdc++.so.6(_ZNSs4_Rep8_M_cloneERKSaIcEm+0x47) [0x7f5fc8abe787]
/usr/lib/libstdc++.so.6(_ZNSsC1ERKSs+0x3c) [0x7f5fc8abf04c]
/usr/bin/mongod(_ZNK5mongo11ReplSetImpl16_summarizeStatusERNS_14BSONObjBuilderE+0x753) [0x6d7863]
/usr/bin/mongod(_ZN5mongo19CmdReplSetGetStatus3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x122) [0x6d10f2]
/usr/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x227) [0x7dd347]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x831) [0x7de8f1]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x35) [0x647e45]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x324e) [0x64b3de]
/usr/bin/mongod() [0x7547a5]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x5b8) [0x759ec8]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a8fce]
/usr/bin/mongod(thread_proxy+0x80) [0x8bb630]
/lib/libpthread.so.0(+0x69ca) [0x7f5fc8d309ca]
/lib/libc.so.6(clone+0x6d) [0x7f5fc82df70d]
Mon Jun 27 10:32:54 Invalid access at address: 0x4
Mon Jun 27 10:32:54 Got signal: 11 (Segmentation fault).
Mon Jun 27 10:32:54 Backtrace:
0x8a8039 0x8a8610 0x7f5fc8d398f0 0x7128f5 0x711052 0x74cdb1 0x876ed7 0x7dd347 0x7de8f1 0x647e45 0x64b3de 0x7547a5 0x759ec8 0x8a8fce 0x8bb630 0x7f5fc8d309ca 0x7f5fc82df70d
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a8039]
/usr/bin/mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0x8a8610]
/lib/libpthread.so.0(+0xf8f0) [0x7f5fc8d398f0]
/usr/bin/mongod() [0x7128f5]
/usr/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pb+0x42) [0x711052]
/usr/bin/mongod(_ZN5mongo7Helpers11removeRangeERKSsRKNS_7BSONObjES5_bbPNS0_14RemoveCallbackE+0x791) [0x74cdb1]
/usr/bin/mongod(_ZN5mongo16MoveChunkCommand3runERKSsRNS_7BSONObjERSsRNS_14BSONObjBuilderEb+0x8047) [0x876ed7]
/usr/bin/mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x227) [0x7dd347]
/usr/bin/mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x831) [0x7de8f1]
/usr/bin/mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_10BufBuilderERNS_14BSONObjBuilderEbi+0x35) [0x647e45]
/usr/bin/mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x324e) [0x64b3de]
/usr/bin/mongod() [0x7547a5]
/usr/bin/mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_8SockAddrE+0x5b8) [0x759ec8]
/usr/bin/mongod(_ZN5mongo10connThreadEPNS_13MessagingPortE+0x21e) [0x8a8fce]
/usr/bin/mongod(thread_proxy+0x80) [0x8bb630]
/lib/libpthread.so.0(+0x69ca) [0x7f5fc8d309ca]
/lib/libc.so.6(clone+0x6d) [0x7f5fc82df70d]
EDIT:
This happened right after we started another slave while moveChunk was doing the move. Our second attempt didn't cause a segfault.
|
|
The File I/O error appears right AFTER the numa_maps warning. It still appears in 1.8.2-final.
|
|
@Paul Gao: Just want to verify that the File I/O error comes after the numa_maps warning. If so I think it indicates a different problem...
@everyone else: Is there a File I/O error before the numa_maps warning and are you still getting that warning on 1.8.2-final?
|
|
My numa_maps is attached.
uname -a
Linux SYS-L121-9-1 2.6.18-164.15.1.el5 #1 SMP Wed Mar 17 11:30:06 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
|
|
Output of cat /proc/[PID_OF_MONGOD]/numa_maps is attached.
uname -a:
Linux lnxp-3741 2.6.18-194.26.1.el5 #1 SMP Fri Oct 29 14:21:16 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
|
|
The 1.8.2 final version shows:
"
-
- WARNING: cannot parse numa_maps
Sat Jun 18 11:32:40 [initandlisten] File I/O errno:29 Illegal seek
"
Is this normal?
|
|
Hmm, neither of those should be failing. Could you fire up mongod from master and if you see the warning about numa_maps attach /proc/[PID_OF_MONGOD]/numa_maps? There could be something odd there that isn't showing up with cat. Also please include the output of "uname -a" so I can see what kernel you are running.
|
|
output of numactl --interleave=all cat /proc/self/numa_maps:
00400000 interleave=0-1 file=/bin/cat mapped=3 N0=3
00604000 interleave=0-1 file=/bin/cat anon=1 dirty=1 mapped=2 N0=2
0d455000 interleave=0-1 heap anon=3 dirty=3 active=0 N0=2 N1=1
33eda00000 interleave=0-1 file=/lib64/ld-2.5.so mapped=23 mapmax=34 N1=23
33edc1b000 interleave=0-1 file=/lib64/ld-2.5.so anon=1 dirty=1 N1=1
33edc1c000 interleave=0-1 file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
33ede00000 interleave=0-1 file=/lib64/libc-2.5.so mapped=65 mapmax=40 N1=65
33edf4e000 interleave=0-1 file=/lib64/libc-2.5.so
33ee14e000 interleave=0-1 file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=3 mapmax=38 N1=3
33ee152000 interleave=0-1 file=/lib64/libc-2.5.so anon=1 dirty=1 N0=1
33ee153000 interleave=0-1 anon=4 dirty=4 active=0 N0=2 N1=2
2ac4ca9c4000 interleave=0-1 anon=1 dirty=1 N0=1
2ac4ca9d4000 interleave=0-1 anon=2 dirty=2 N0=1 N1=1
2ac4ca9d6000 interleave=0-1 file=/usr/lib/locale/locale-archive mapped=10 mapmax=33 N0=10
7fff11704000 interleave=0-1 stack anon=3 dirty=3 N0=2 N1=1
|
|
the segfault is fixed in 1.8.2 - but the warning will be put back for 1.8.3
|
|
Here it is :
00400000 default file=/bin/cat mapped=3 N0=3
00604000 default file=/bin/cat anon=1 dirty=1 mapped=2 N0=2
0c11e000 default heap anon=3 dirty=3 active=0 N0=3
3d5a200000 default file=/lib64/ld-2.5.so mapped=24 mapmax=45 N0=24
3d5a41b000 default file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
3d5a41c000 default file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
3d5a600000 default file=/lib64/libc-2.5.so mapped=60 mapmax=49 N0=60
3d5a74e000 default file=/lib64/libc-2.5.so
3d5a94d000 default file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=3 mapmax=22 N0=3
3d5a951000 default file=/lib64/libc-2.5.so anon=1 dirty=1 N0=1
3d5a952000 default anon=4 dirty=4 active=0 N0=4
2b5ea4273000 default anon=1 dirty=1 N0=1
2b5ea427e000 default anon=2 dirty=2 N0=2
2b5ea4280000 default file=/usr/lib/locale/locale-archive mapped=11 mapmax=6 N0=11
7fffd8179000 default stack anon=3 dirty=3 N0=3
|
|
Yes, the fix now displays that message rather than crashing. Could you attach /proc/self/numa_maps from that box? I'd like to know what format it is using that we can't parse correctly.
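
For readers following along, here is a minimal Python sketch of the "warn instead of crash" behavior described above. It is not the mongod code (that lives in C++ in version.cpp); the function names and the second warning string are hypothetical, and only the "cannot parse numa_maps" text comes from this ticket. It assumes the check only needs the policy field of the first line.

```python
import os


def warning_for_first_line(first_line):
    """Return a warning string for one numa_maps line, or None if it looks fine.

    Hypothetical helper: never raises, even on empty or malformed input.
    """
    parts = first_line.split()
    # A usable line has at least "<address> <policy>".
    if len(parts) < 2:
        return "WARNING: cannot parse numa_maps"
    # Accept "interleave", "interleave=0", "interleave=0-1", and so on.
    if not parts[1].startswith("interleave"):
        # Hypothetical wording, not taken from mongod.
        return "WARNING: NUMA policy is %r, not interleave" % parts[1]
    return None


def numa_startup_warning(path="/proc/self/numa_maps"):
    """Check the first line of numa_maps; report problems without failing startup."""
    if not os.path.exists(path):
        return None  # no numa_maps file, nothing to check
    try:
        with open(path) as f:
            first_line = f.readline()
    except OSError:
        return "WARNING: cannot parse numa_maps"
    return warning_for_first_line(first_line)
```

The point of the design is that every failure path returns a message and lets startup continue, so an unexpected file format can be reported and attached to the ticket rather than taking the process down.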
|
|
Thanks, this works (mongod starts)!
However, there is still an error displayed in addition to the warning.
-
- WARNING: cannot parse numa_maps
Fri Jun 10 11:19:59 [initandlisten] File I/O errno:29 Illegal seek
db version v1.8.3-pre-, pdfile version 4.5
Fri Jun 10 11:19:59 [initandlisten] git version: 9990775e39701870c5238388aecfabda992de8e3
Fri Jun 10 11:19:59 [initandlisten] build sys info: Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41
Is this related?
|
|
Hey Eliot, a little bit up in the comments you can find my numa_maps.
|
|
Here is mine.
|
|
Can you attach numa_maps?
|
|
Just compiled, and it seems to work.
It throws the "cannot parse numa_maps" warning and simply continues to load.
|
|
Thanks, I'll try tomorrow.
|
|
Author: Eliot Horowitz (erh) <eliot@10gen.com>
Message: fix numa issue SERVER-3212
Branch: v1.8
https://github.com/mongodb/mongo/commit/9990775e39701870c5238388aecfabda992de8e3
|
|
Author: Eliot Horowitz (erh) <eliot@10gen.com>
Message: fix numa issue SERVER-3212
Branch: master
https://github.com/mongodb/mongo/commit/50b5572b0859da1a6d6f266397dc20f4e5af55f9
|
|
I commented out the check for a NUMA-enabled kernel (along with the warning message) and recompiled; mongod now runs well.
The stack trace of the crash points to startsWith in version.cpp (which goes to util/goodies.cpp). It seems that it does not handle the sanity check well and prevents mongod from initializing.
How important is that sanity check for a NUMA-enabled kernel?
|
|
I confirm that the latest build does not start properly. As with Erez, numactl shows that the interleave lines start with 0-1 on our machine.
|
|
I can reproduce it on several systems.
I think it comes from not parsing /proc/self/numa_maps.
$ numactl --interleave=all cat /proc/self/numa_maps
00400000 interleave=0-1 file=/bin/cat mapped=3 N0=3
00604000 interleave=0-1 file=/bin/cat anon=1 dirty=1 mapped=2 N0=2
0c3be000 interleave=0-1 heap anon=3 dirty=3 active=0 N0=1 N1=2
3d41a00000 interleave=0-1 file=/lib64/ld-2.5.so mapped=24 mapmax=43 N0=24
3d41c1b000 interleave=0-1 file=/lib64/ld-2.5.so anon=1 dirty=1 N1=1
3d41c1c000 interleave=0-1 file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
3d41e00000 interleave=0-1 file=/lib64/libc-2.5.so mapped=60 mapmax=48 N0=60
3d41f4e000 interleave=0-1 file=/lib64/libc-2.5.so
3d4214d000 interleave=0-1 file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=3 mapmax=30 N0=3
3d42151000 interleave=0-1 file=/lib64/libc-2.5.so anon=1 dirty=1 N1=1
3d42152000 interleave=0-1 anon=4 dirty=4 active=0 N0=2 N1=2
2ac82fa3f000 interleave=0-1 anon=1 dirty=1 N1=1
2ac82fa4b000 interleave=0-1 anon=2 dirty=2 N0=1 N1=1
2ac82fa4d000 interleave=0-1 file=/usr/lib/locale/locale-archive mapped=12 mapmax=15 N0=12
7fff54fd8000 interleave=0-1 stack anon=2 dirty=2 N0=1 N1=1
This output is from a server that won't load and crashes immediately.
$ numactl --interleave=all cat /proc/self/numa_maps
00400000 interleave=0 file=/bin/cat mapped=3 active=0 N0=3
00604000 interleave=0 file=/bin/cat anon=1 dirty=1 mapped=2 active=1 N0=2
09491000 interleave=0 heap anon=3 dirty=3 active=0 N0=3
35dba00000 interleave=0 file=/lib64/ld-2.5.so mapped=24 mapmax=19 N0=24
35dbc1b000 interleave=0 file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
35dbc1c000 interleave=0 file=/lib64/ld-2.5.so anon=1 dirty=1 N0=1
35dbe00000 interleave=0 file=/lib64/libc-2.5.so mapped=60 mapmax=21 N0=60
35dbf4e000 interleave=0 file=/lib64/libc-2.5.so
35dc14d000 interleave=0 file=/lib64/libc-2.5.so anon=1 dirty=1 mapped=3 mapmax=15 N0=3
35dc151000 interleave=0 file=/lib64/libc-2.5.so anon=1 dirty=1 N0=1
35dc152000 interleave=0 anon=4 dirty=4 active=0 N0=4
2b3021fa1000 interleave=0 anon=1 dirty=1 N0=1
2b3021faa000 interleave=0 anon=2 dirty=2 N0=2
2b3021fac000 interleave=0 file=/usr/lib/locale/locale-archive mapped=11 mapmax=10 N0=11
7fff2b126000 interleave=0 stack anon=3 dirty=3 N0=3
This output is from a working server.
interleave=0 works; interleave=0-1 does not.
Could this be a parsing error?
I am using the latest code.
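
To make the difference between the two formats concrete, here is a minimal Python sketch that splits a numa_maps line into its address, policy, and node list. It is only an illustration of the file format as quoted in this comment, not mongod's parser; `parse_numa_maps_line` and `is_interleaved` are hypothetical names.

```python
def parse_numa_maps_line(line):
    """Split one numa_maps line into (address, policy, nodes).

    Hypothetical helper. `nodes` is the text after '=' in the policy field,
    e.g. "0" or "0-1", and is empty for policies such as "default".
    """
    parts = line.split()
    if len(parts) < 2:
        raise ValueError("expected '<address> <policy> ...': %r" % line)
    address, policy_field = parts[0], parts[1]
    # partition() copes with both "interleave=0-1" and a bare "default".
    policy, _, nodes = policy_field.partition("=")
    return address, policy, nodes


def is_interleaved(line):
    """True when the line's memory policy is interleave, whatever the node list."""
    return parse_numa_maps_line(line)[1] == "interleave"


if __name__ == "__main__":
    samples = [
        "00400000 interleave=0-1 file=/bin/cat mapped=3 N0=3",
        "00400000 interleave=0 file=/bin/cat mapped=3 active=0 N0=3",
        "00400000 default file=/bin/cat mapped=3 N0=3",
    ]
    for sample in samples:
        print(parse_numa_maps_line(sample), is_interleaved(sample))
```

Splitting on whitespace and then on the first `=` treats a single node (`0`) and a node range (`0-1`) identically, which is the behavior one would want from a check that only cares whether interleaving is on.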
|
|
It's in the 1.8 nightly:
http://downloads.mongodb.org/linux/mongodb-linux-x86_64-v1.8-latest.tgz
|
|
Thanks for your reply. However it seems the build did not succeed (http://buildbot.mongodb.org/builders/Nightly%20Linux%2064-bit%20v8/builds/565). I'll try again tomorrow and edit this message.
The stack from the current (yesterday's) build is:
Thu Jun 9 11:51:02 [initandlisten] MongoDB starting : pid=4615 port=27021 dbpath=/var/db/shd1 64-bit
Thu Jun 9 11:51:02 Invalid access at address: 0x1
Thu Jun 9 11:51:02 Got signal: 11 (Segmentation fault).
Thu Jun 9 11:51:02 Backtrace:
0x8a7a29 0x8a8000 0x3d5ae0eb10 0x3d5a679a10 0x56d6f3 0x8ac592 0x8ad478 0x8b30af 0x3d5a61d994 0x4e10c9
./mongod(_ZN5mongo10abruptQuitEi+0x399) [0x8a7a29]
./mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x220) [0x8a8000]
/lib64/libpthread.so.0 [0x3d5ae0eb10]
/lib64/libc.so.6(strlen+0x10) [0x3d5a679a10]
./mongod(_ZN5mongo13show_warningsEv+0x363) [0x56d6f3]
./mongod(_ZN5mongo14_initAndListenEiPKc+0xf2) [0x8ac592]
./mongod(_ZN5mongo13initAndListenEiPKc+0x18) [0x8ad478]
./mongod(main+0x5acf) [0x8b30af]
/lib64/libc.so.6(__libc_start_main+0xf4) [0x3d5a61d994]
./mongod(__gxx_personality_v0+0x3a1) [0x4e10c9]
Thu Jun 9 11:51:02 dbexit:
Thu Jun 9 11:51:02 [initandlisten] File I/O errno:29 Illegal seek
shutdown: going to close listening sockets...
Thu Jun 9 11:51:02 [initandlisten] shutdown: going to flush diaglog...
Thu Jun 9 11:51:02 [initandlisten] shutdown: going to close sockets...
Thu Jun 9 11:51:02 [initandlisten] shutdown: waiting for fs preallocator...
Thu Jun 9 11:51:02 [initandlisten] shutdown: closing all files...
Thu Jun 9 11:51:02 closeAllFiles() finished
Thu Jun 9 11:51:02 dbexit: really exiting now
|
|
Can you try the 1.8 nightly tomorrow?
|
|
Author: Eliot Horowitz (erh) <eliot@10gen.com>
Message: try to fix numa segfault SERVER-3212
Branch: v1.8
https://github.com/mongodb/mongo/commit/959500ba9977891b6a6d4736615d1ae5bf22d2c2
|
|
Author: Eliot Horowitz (erh) <eliot@10gen.com>
Message: try to fix numa segfault SERVER-3212
Branch: master
https://github.com/mongodb/mongo/commit/c5b9318b6d8100fd264fff49aae4a0e333507b17
|
|
Can you send a stack from 1.8.3-rc3
|
|
Same issue with 1.8.2-rc3 (CentOS 5, 64-bit).
Edit: confirmed that it is specific to 1.8.2-rc3 (1.8.1 works).
|
|
Looking at the GitHub history, I guessed the URL: http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-1.8.2.tgz
|
|
What version is this with?
1.8.2 isn't out yet - so can't be 1.8.2
|