[SERVER-9964] mongod crashes when client disconnects (OpenIndiana 157a8) Created: 19/Jun/13  Updated: 26/Nov/13  Resolved: 20/Jun/13

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: 2.4.4
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Udo Grabowski Assignee: Tad Marshall
Resolution: Done Votes: 0
Labels: crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Opensolaris x86-64 (openindiana 157a8) on both own build and smartos build (pthreads,
gcc/g++ 4.6), boost 1.49 from 2.4.4 source-bundle.


Attachments: File gcclibs.tgz    
Issue Links:
Related
related to SERVER-11859 Solaris error: symbol _ZNSt12out_of_r... Closed
Operating System: Solaris
Steps To Reproduce:

start mongod (LANG=C, LC_ALL=C required, otherwise it crashes in localeImpl),
then start mongo (local connection), type exit --> mongod crashes with
segfault.

Participants:

 Description   

When the client disconnects, mongod crashes with a segfault in
mongo::ipToAddrs . The thread sees that the connection has closed
[conn1] Socket recv() conn closed? 127.0.0.1:37593 (ret is zero in recv)
but then tries to do something on the closed connection:

env LC_ALL=C LANG=C dbx ./mongod
Reading mongod
Reading ld.so.1
Reading libsocket.so.1
Reading libresolv.so.2
Reading libnsl.so.1
Reading libdl.so.1
Reading librt.so.1
Reading libstdc++.so.6.0.16
Reading libm.so.2
Reading libgcc_s.so.1
Reading libpthread.so.1
Reading libc.so.1
(dbx) run -vvvvv
Running: mongod -vvvvv 
(process id 6037)
Wed Jun 19 11:09:01.301 BackgroundJob starting: DataFileSync
Wed Jun 19 11:09:01.301 versionArrayTest passed
Wed Jun 19 11:09:01.301 shardKeyTest passed
Wed Jun 19 11:09:01.302 isInRangeTest passed
Wed Jun 19 11:09:01.302 shardObjTest passed
Wed Jun 19 11:09:01.302 Matcher::matches() { abcd: 3.1, abcdef: "123456789" }
Wed Jun 19 11:09:01.302 Matcher::matches() { abcd: 3.1, abcdef: "123456789" }
Wed Jun 19 11:09:01.302 Matcher::matches() { abcd: 3.1, abcdef: "123456789" }
Wed Jun 19 11:09:01.302 Matcher::matches() { abcdef: "z23456789" }
Wed Jun 19 11:09:01.302 Matcher::matches() { abcd: 3.1, abcdef: "123456789" }
Wed Jun 19 11:09:01.302 Matcher::matches() { abcdef: "z23456789" }
Wed Jun 19 11:09:01.305 [initandlisten] MongoDB starting : pid=6037 port=27017 dbpath=/data/db/ 64-bit host=imksunug1
Wed Jun 19 11:09:01.305 [initandlisten] 
Wed Jun 19 11:09:01.305 [initandlisten] ** NOTE: your operating system version does not support the method that MongoDB
Wed Jun 19 11:09:01.305 [initandlisten] **       uses to detect impending page faults.
Wed Jun 19 11:09:01.305 [initandlisten] **       This may result in slower performance for certain use cases
Wed Jun 19 11:09:01.305 [initandlisten] 
Wed Jun 19 11:09:01.305 [initandlisten] db version v2.4.4
Wed Jun 19 11:09:01.305 [initandlisten] git version: nogitversion
Wed Jun 19 11:09:01.305 [initandlisten] build info: SunOS imksunug1 5.11 oi_151a8 i86pc BOOST_LIB_VERSION=1_49
Wed Jun 19 11:09:01.305 [initandlisten] allocator: system
Wed Jun 19 11:09:01.305 [initandlisten] options: { vvvvv: true }
Wed Jun 19 11:09:01.307 [initandlisten] journal dir=/data/db/journal
Wed Jun 19 11:09:01.307 [initandlisten] recover begin
Wed Jun 19 11:09:01.307 [initandlisten] recover lsn: 0
Wed Jun 19 11:09:01.307 [initandlisten] recover /data/db/journal/j._0
Wed Jun 19 11:09:01.307 [initandlisten] mmf open /data/db/local.0
Wed Jun 19 11:09:01.307 [initandlisten] mmf finishOpening 0xffffdd7ffb000000 /data/db/local.0 len:67108864
Wed Jun 19 11:09:01.308 [initandlisten] mmf open /data/db/local.ns
Wed Jun 19 11:09:01.308 [initandlisten] mmf finishOpening 0xffffdd7ff5c00000 /data/db/local.ns len:16777216
Wed Jun 19 11:09:01.333 [initandlisten] mmf close /data/db/local.0
Wed Jun 19 11:09:01.334 [initandlisten] _groupCommit 
Wed Jun 19 11:09:01.334 [initandlisten] journal REMAPPRIVATEVIEW
Wed Jun 19 11:09:01.334 [initandlisten] journal REMAPPRIVATEVIEW done startedAt: 0 n:2 0ms
Wed Jun 19 11:09:01.334 [initandlisten] groupCommit end
Wed Jun 19 11:09:01.335 [initandlisten] mmf close /data/db/local.ns
Wed Jun 19 11:09:01.335 [initandlisten] _groupCommit 
Wed Jun 19 11:09:01.335 [initandlisten] journal REMAPPRIVATEVIEW
Wed Jun 19 11:09:01.335 [initandlisten] journal REMAPPRIVATEVIEW done startedAt: 0 n:1 0ms
Wed Jun 19 11:09:01.335 [initandlisten] groupCommit end
Wed Jun 19 11:09:01.335 [initandlisten] recover cleaning up
Wed Jun 19 11:09:01.335 [initandlisten] removeJournalFiles
Wed Jun 19 11:09:01.335 [initandlisten] removeJournalFiles end
Wed Jun 19 11:09:01.335 [initandlisten] recover done
Wed Jun 19 11:09:02.608 [initandlisten] opening db:  local
Wed Jun 19 11:09:02.609 [initandlisten] mmf open /data/db/local.ns
Wed Jun 19 11:09:02.609 [initandlisten] mmf finishOpening 0xffffdd7ffde00000 /data/db/local.ns len:16777216
Wed Jun 19 11:09:02.609 [initandlisten] mmf open /data/db/local.0
Wed Jun 19 11:09:02.609 [initandlisten] mmf finishOpening 0xffffdd7ff8a00000 /data/db/local.0 len:67108864
Wed Jun 19 11:09:02.609 [initandlisten] mmf close 
Wed Jun 19 11:09:02.612 [initandlisten] enter repairDatabases (to check pdfile version #)
Wed Jun 19 11:09:02.612 [initandlisten]         local
Wed Jun 19 11:09:02.613 [initandlisten] mmf close /data/db/local.0
Wed Jun 19 11:09:02.613 [initandlisten] _groupCommit 
Wed Jun 19 11:09:02.613 [initandlisten] journal REMAPPRIVATEVIEW
Wed Jun 19 11:09:02.613 [initandlisten] journal REMAPPRIVATEVIEW done startedAt: 0 n:1 0ms
Wed Jun 19 11:09:02.613 [initandlisten] groupCommit end
Wed Jun 19 11:09:02.614 [initandlisten] mmf close /data/db/local.ns
Wed Jun 19 11:09:02.614 [initandlisten] _groupCommit 
Wed Jun 19 11:09:02.614 [initandlisten] journal REMAPPRIVATEVIEW
Wed Jun 19 11:09:02.615 [initandlisten] journal REMAPPRIVATEVIEW done startedAt: 1 n:1 0ms
Wed Jun 19 11:09:02.615 [initandlisten] groupCommit end
Wed Jun 19 11:09:02.616 [initandlisten] done repairDatabases
Wed Jun 19 11:09:02.617 BackgroundJob starting: snapshot
Wed Jun 19 11:09:02.619 BackgroundJob starting: PeriodicTask::Runner
Wed Jun 19 11:09:02.619 BackgroundJob starting: ClientCursorMonitor
Wed Jun 19 11:09:02.620 BackgroundJob starting: TTLMonitor
Wed Jun 19 11:09:02.620 [initandlisten] runQuery called local.$cmd { create: "startup_log", size: 10485760, capped: true }
Wed Jun 19 11:09:02.621 [initandlisten] run command local.$cmd { create: "startup_log", size: 10485760, capped: true }
Wed Jun 19 11:09:02.621 [initandlisten] opening db:  local
Wed Jun 19 11:09:02.621 [initandlisten] mmf open /data/db/local.ns
Wed Jun 19 11:09:02.621 [initandlisten] mmf finishOpening 0xffffdd7ffd600000 /data/db/local.ns len:16777216
Wed Jun 19 11:09:02.621 [initandlisten] mmf open /data/db/local.0
Wed Jun 19 11:09:02.621 [initandlisten] mmf finishOpening 0xffffdd7ff8200000 /data/db/local.0 len:67108864
Wed Jun 19 11:09:02.621 [initandlisten] mmf close 
Wed Jun 19 11:09:02.622 [initandlisten] create collection local.startup_log { create: "startup_log", size: 10485760, capped: true }
Wed Jun 19 11:09:02.622 [initandlisten] command local.$cmd command: { create: "startup_log", size: 10485760, capped: true } ntoreturn:1 keyUpdates:0  reslen:75 0ms
Wed Jun 19 11:09:02.622 [initandlisten] info PageFaultRetryableSection will not yield, already locked upon reaching
Wed Jun 19 11:09:02.622 [initandlisten] insert local.startup_log ninserted:1 keyUpdates:0  0ms
Wed Jun 19 11:09:02.623 [initandlisten] fd limit hard:65536 soft:1024 max conn: 819
Wed Jun 19 11:09:02.623 [websvr] fd limit hard:65536 soft:1024 max conn: 819
Wed Jun 19 11:09:02.623 [initandlisten] waiting for connections on port 27017
Wed Jun 19 11:09:02.623 [websvr] admin web console waiting for connections on port 28017
Wed Jun 19 11:09:02.716 [journal] lsn set 0
Wed Jun 19 11:09:02.716 [journal] journal WRITETODATAFILES 1
Wed Jun 19 11:09:02.716 [journal] journal WRITETODATAFILES 2
Wed Jun 19 11:09:02.716 [journal] journal WRITETODATAFILES 0.37ms
Wed Jun 19 11:09:03.637 [journal] _groupCommit 
Wed Jun 19 11:09:03.637 [journal] _groupCommit upgrade
Wed Jun 19 11:09:03.637 [journal] journal REMAPPRIVATEVIEW
Wed Jun 19 11:09:03.637 [journal] journal REMAPPRIVATEVIEW done startedAt: 0 n:1 0ms
Wed Jun 19 11:09:03.637 [journal] groupCommit end
...
Wed Jun 19 11:09:13.868 [journal] _groupCommit 
Wed Jun 19 11:09:13.868 [journal] _groupCommit upgrade
Wed Jun 19 11:09:13.868 [journal] journal REMAPPRIVATEVIEW
Wed Jun 19 11:09:13.868 [journal] journal REMAPPRIVATEVIEW done startedAt: 0 n:1 0ms
Wed Jun 19 11:09:13.868 [journal] groupCommit end
Wed Jun 19 11:09:14.338 [initandlisten] connection accepted from 127.0.0.1:37593 #1 (1 connection now open)
Wed Jun 19 11:09:14.339 [conn1] runQuery called admin.$cmd { whatsmyuri: 1 }
Wed Jun 19 11:09:14.339 [conn1] run command admin.$cmd { whatsmyuri: 1 }
Wed Jun 19 11:09:14.340 [conn1] command admin.$cmd command: { whatsmyuri: 1 } ntoreturn:1 keyUpdates:0  reslen:62 0ms
Wed Jun 19 11:09:14.345 [conn1] runQuery called admin.$cmd { getLog: "startupWarnings" }
Wed Jun 19 11:09:14.345 [conn1] run command admin.$cmd { getLog: "startupWarnings" }
Wed Jun 19 11:09:14.345 [conn1] command: { getLog: "startupWarnings" }
Wed Jun 19 11:09:14.345 [conn1] command admin.$cmd command: { getLog: "startupWarnings" } ntoreturn:1 keyUpdates:0  reslen:503 0ms
Wed Jun 19 11:09:14.349 [conn1] runQuery called admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
Wed Jun 19 11:09:14.349 [conn1] run command admin.$cmd { replSetGetStatus: 1.0, forShell: 1.0 }
Wed Jun 19 11:09:14.349 [conn1] command: { replSetGetStatus: 1.0, forShell: 1.0 }
Wed Jun 19 11:09:14.349 [conn1] command admin.$cmd command: { replSetGetStatus: 1.0, forShell: 1.0 } ntoreturn:1 keyUpdates:0  reslen:76 0ms
Wed Jun 19 11:09:14.895 [journal] _groupCommit 
Wed Jun 19 11:09:14.895 [journal] _groupCommit upgrade
Wed Jun 19 11:09:14.895 [journal] journal REMAPPRIVATEVIEW
Wed Jun 19 11:09:14.895 [journal] journal REMAPPRIVATEVIEW done startedAt: 1 n:1 0ms
Wed Jun 19 11:09:14.895 [journal] groupCommit end
...
Wed Jun 19 11:09:20.011 [journal] _groupCommit 
Wed Jun 19 11:09:20.011 [journal] _groupCommit upgrade
Wed Jun 19 11:09:20.011 [journal] journal REMAPPRIVATEVIEW
Wed Jun 19 11:09:20.011 [journal] journal REMAPPRIVATEVIEW done startedAt: 0 n:1 0ms
Wed Jun 19 11:09:20.011 [journal] groupCommit end
Wed Jun 19 11:09:20.085 [conn1] Socket recv() conn closed? 127.0.0.1:37593
t@12 (l@12) signal SEGV (no mapping at the fault address) in mongo::ipToAddrs at 0xdd897d
0x0000000000dd897d: ipToAddrs+0x00ed:   addb     %al,(%rax)
(dbx) where
current thread: t@12
=>[1] mongo::ipToAddrs(0x1, 0x1, 0x474e5543432b2b00, 0x332cc30, 0xffffdd7ff81fe6c8, 0x332cc30), at 0xdd897d 
  ---- hidden frames, use 'where -h' to see them all ----
  [3] __cxa_throw(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xffffdd7fffb43ad9 
  [4] mongo::Socket::_handleRecvError(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xde3069 
  [5] mongo::Socket::recv(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xde3531 
  [6] mongo::MessagingPort::recv(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xddab9e 
  [7] mongo::PortMessageServer::handleIncomingMsg(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xddd6e4 
  [8] thread_proxy(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xe3e0be 
  [9] _thrp_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xffffdd7fff8b7184 
  [10] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xffffdd7fff8b7450 
(dbx) where -h
current thread: t@12
=>[1] mongo::ipToAddrs(0x1, 0x1, 0x474e5543432b2b00, 0x332cc30, 0xffffdd7ff81fe6c8, 0x332cc30), at 0xdd897d 
  [2] _Unwind_RaiseException(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xffffdd7fff8bef05 
  [3] __cxa_throw(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xffffdd7fffb43ad9 
  [4] mongo::Socket::_handleRecvError(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xde3069 
  [5] mongo::Socket::recv(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xde3531 
  [6] mongo::MessagingPort::recv(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xddab9e 
  [7] mongo::PortMessageServer::handleIncomingMsg(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xddd6e4 
  [8] thread_proxy(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xe3e0be 
  [9] _thrp_setup(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xffffdd7fff8b7184 
  [10] _lwp_start(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xffffdd7fff8b7450 
(dbx) 



 Comments   
Comment by Tad Marshall [ 26/Jun/13 ]

Thanks for the update! Good luck on your OI projects!

Comment by Udo Grabowski [ 26/Jun/13 ]

Problem has gone away with newest gcc 4.7.3 and libstdc++.so.6.0.17 from OI 151a8
hipster repository. Looks like this was a specific quirk of the gcc 4.6.3 setup from
the OI sfe repository.

Comment by Tad Marshall [ 20/Jun/13 ]

Hi Udo,

That's great news! Thanks for testing it!

I'm going to close this ticket as "Works as designed", but we'd love to hear what you learn from the OpenIndiana GCC folks. You can post updates to this ticket even after it is closed.

Tad

Comment by Udo Grabowski [ 20/Jun/13 ]

Correction: Workaround with Opencsw libraries also works on a7, pfexec did not forward the LD_LIBRARY_PATH in my test.
And this works for both the home compiled and the SmartOS library, so MongoDB is now runnable on OI 151a7 and a8 with
this workaround !

Comment by Udo Grabowski [ 20/Jun/13 ]

> uploaded and see if the symptoms are the same?

Thanks a lot, this sheds some (irritating) light on the problem.
Tried that on a7 (a8 compiled mongod bin), it crashed again with
the same symptom (double checked, it really links to the libs).

BUT: Did the same on a8, and, Jippeeee, it works !!
Both a7 and a8 have libstdc++.so.6.0.16, too....

So I think from this point on it's a problem for the gcc
integrators in OI, there's something really fucked up with
the specs on both a7 and a8. Will switch to the OI developer
board and come back again if the problem has been found
there, or another aspect tunes in that is of importance here.

In the meanwhile, I'll just attach the OCSW libraries to our
inhouse build, so that we can work with it.

Thanks a lot for your assistance, I really appreciate the
intensity and speed you're working (had a hard time to follow
that fast... !

ro sunug1 /install/mongodb-src-r2.4.4 # setenv LD_LIBRARY_PATH /tmp/gcclibs/
ro sunug1 /install/mongodb-src-r2.4.4 # ldd ./mongod
libsocket.so.1 => /lib/64/libsocket.so.1
libresolv.so.2 => /lib/64/libresolv.so.2
librt.so.1 => /lib/64/librt.so.1
libnsl.so.1 => /lib/64/libnsl.so.1
libstdc+.so.6 => /tmp/gcclibs//libstdc+.so.6
libm.so.2 => /lib/64/libm.so.2
libgcc_s.so.1 => /tmp/gcclibs//libgcc_s.so.1
libpthread.so.1 => /lib/64/libpthread.so.1
libc.so.1 => /lib/64/libc.so.1
libmd.so.1 => /lib/64/libmd.so.1
libmp.so.2 => /lib/64/libmp.so.2
ro sunug1 /install/mongodb-src-r2.4.4 # setenv LC_ALL C
ro sunug1 /install/mongodb-src-r2.4.4 # setenv LANG C
ro sunug1 /install/mongodb-src-r2.4.4 # ./mongod
./mongod --help for help and startup options
Thu Jun 20 13:21:40.953 [initandlisten] MongoDB starting : pid=23706
port=27017 dbpath=/data/db/ 64-bit host=imksunug1
Thu Jun 20 13:21:40.953 [initandlisten]
Thu Jun 20 13:21:40.953 [initandlisten] ** NOTE: your operating system
version does not support the method that MongoDB
Thu Jun 20 13:21:40.953 [initandlisten] ** uses to detect
impending page faults.
Thu Jun 20 13:21:40.953 [initandlisten] ** This may result in
slower performance for certain use cases
Thu Jun 20 13:21:40.953 [initandlisten]
Thu Jun 20 13:21:40.953 [initandlisten] db version v2.4.4
Thu Jun 20 13:21:40.953 [initandlisten] git version: nogitversion
Thu Jun 20 13:21:40.953 [initandlisten] build info: SunOS imksunug1 5.11
oi_151a8 i86pc BOOST_LIB_VERSION=1_49
Thu Jun 20 13:21:40.953 [initandlisten] allocator: system
Thu Jun 20 13:21:40.953 [initandlisten] options: {}
Thu Jun 20 13:21:40.988 [initandlisten] journal dir=/data/db/journal
Thu Jun 20 13:21:40.995 [initandlisten] recover begin
Thu Jun 20 13:21:40.996 [initandlisten] recover lsn: 0
Thu Jun 20 13:21:40.996 [initandlisten] recover /data/db/journal/j._0
Thu Jun 20 13:21:41.017 [initandlisten] recover cleaning up
Thu Jun 20 13:21:41.017 [initandlisten] removeJournalFiles
Thu Jun 20 13:21:41.028 [initandlisten] recover done
Thu Jun 20 13:21:41.889 [initandlisten] waiting for connections on port
27017
Thu Jun 20 13:21:41.889 [websvr] admin web console waiting for
connections on port 28017
Thu Jun 20 13:22:18.318 [initandlisten] connection accepted from
127.0.0.1:50251 #1 (1 connection now open)
Thu Jun 20 13:22:20.757 [conn1] end connection 127.0.0.1:50251 (0
connections now open)
^CThu Jun 20 13:23:07.591 [signalProcessingThread] got signal 2
(Interrupt), will terminate after current cmd ends
Thu Jun 20 13:23:07.591 [signalProcessingThread] now exiting
Thu Jun 20 13:23:07.591 dbexit:
Thu Jun 20 13:23:07.591 [signalProcessingThread] shutdown: going to
close listening sockets...
Thu Jun 20 13:23:07.591 [signalProcessingThread] closing listening socket: 8
Thu Jun 20 13:23:07.591 [signalProcessingThread] closing listening socket: 9
Thu Jun 20 13:23:07.591 [signalProcessingThread] closing listening
socket: 10
Thu Jun 20 13:23:07.592 [signalProcessingThread] removing socket file:
/tmp/mongodb-27017.sock
Thu Jun 20 13:23:07.592 [signalProcessingThread] shutdown: going to
flush diaglog...
Thu Jun 20 13:23:07.592 [signalProcessingThread] shutdown: going to
close sockets...
Thu Jun 20 13:23:07.592 [signalProcessingThread] shutdown: waiting for
fs preallocator...
Thu Jun 20 13:23:07.592 [signalProcessingThread] shutdown: lock for
final commit...
Thu Jun 20 13:23:07.592 [signalProcessingThread] shutdown: final commit...
Thu Jun 20 13:23:07.594 [signalProcessingThread] shutdown: closing all
files...
Thu Jun 20 13:23:07.595 [signalProcessingThread] closeAllFiles() finished
Thu Jun 20 13:23:07.595 [signalProcessingThread] journalCleanup...
Thu Jun 20 13:23:07.595 [signalProcessingThread] removeJournalFiles
Thu Jun 20 13:23:07.595 dbexit: really exiting now

Comment by Tad Marshall [ 20/Jun/13 ]

Can you try the libraries from the gcclibs.tgz file I uploaded and see if the symptoms are the same?

Looking at the code, the message "Socket recv() conn closed? 127.0.0.1:37593" is normal at log level 3 or higher; I see the same thing when run with -vvvv. This is also a normal code path for a connection that has been terminated. The code then constructs an exception object and throws a SocketException, which is a normal action in MongoDB for this event. For some reason, you get a segfault during the construction of the exception object, apparently while allocating memory for a string, but this does not happen on my system. This is making me suspicious of the C++ library you are running.

It's just an experiment, but the files I attached are the ones that are working for me in OpenIndiana, so if you get the same problem with these files then at least we will know that it is not due to a difference in libraries.

The libstdc++.so.6.0.16 needs to be symlinked as libstdc++.so.6.

Comment by Tad Marshall [ 20/Jun/13 ]

Two gcc libraries for SunOS 5 derived OSes, extracted from gcc 4.6.3 from OpenCSW:
libstdc++.so.6.0.16
libgcc_s.so.1

Comment by Udo Grabowski [ 20/Jun/13 ]

> as the current version (labeled development in bold letters).
Just for your understanding what is happening here:

OI is currently respun with a larger staff of developers
to get out a new stable release based on gcc instead of
Sunstudio, and a larger number of packages are recompiled,
and some previously uncompilable packages (old g++, non-
standard old Studio CC) shall now be available too,on
this platform. Of course, today, a OS without MongoDB
isn't something that is valuable to run, so I'm trying
to build this (since we are also in need here in our
institute) and place it into the repository as a package.
Since typically, the Solaris stuff is more strict in
Standards (Posix etc., there is currently even a thread
to downgrade standards compliance in some headers to
get some Linux packages compiled...), I always expect to
uncover hidden lazyness in software usually run on Linux
(since Linux gets, unfortunately, more and more lazy
on Standards the last years, and gcc isn't better...),
but, of course, in the OS development process, there can
also be faults on the OS side (which maybe the case
here, we will see). Ideally, both sides will profit
from the integration, as an extended OS support is also
a sign of stability and standards compliance of the
package, and will help you to get even more reputation.

So, sorry if you got accidently involved into a homebrew
problem, but we will see. At least I must say that the
JIRA staff is absolutely helpful with problems, something
that is seen not so often nowadays.. Thank you !

Comment by Udo Grabowski [ 20/Jun/13 ]

In the packagemanager, replace the /dev/ publisher (
File/manage publishers) URL by

<http://pkg.openindiana.org/hipster/>

then press 'Updates', and it will update to
the newest developer chain (currently changing
often due to replacement of SunStudio compiled
stuff with the gcc-compilants).

There you can also add the
<http://pkg.openindiana.org/sfe/> publisher
for the gcc/runtime 4.6.3 package (it may
needs symbolic links in /usr/lib/amd64 to
the /usr/gcc/4.6/lib/amd64/

{libstdc++.so.6,libgcc_s.so.1}

libraries to run mongo without LD_LIBRARY_PATH) .

Comment by Tad Marshall [ 20/Jun/13 ]

Where can I download 157a8? The openindiana.org website shows 151a7 as the current version (labeled development in bold letters).

Comment by Tad Marshall [ 20/Jun/13 ]

Similar results using the 2.4.4 version from mongodb.org, except that the libstdc++.so.6 that it uses by default on this system is too old and doesn't have all the C++ symbols we need:

tad@openindiana:~/mongodev$ curl http://fastdl.mongodb.org/sunos5/mongodb-sunos5-x86_64-2.4.4.tgz -o mongodb-sunos5-x86_64-2.4.4.tgz         
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 72.4M  100 72.4M    0     0  3509k      0  0:00:21  0:00:21 --:--:-- 3513k
tad@openindiana:~/mongodev$ tar -zxvf mongodb-sunos5-x86_64-2.4.4.tgz 
mongodb-sunos5-x86_64-2.4.4/README
mongodb-sunos5-x86_64-2.4.4/THIRD-PARTY-NOTICES
mongodb-sunos5-x86_64-2.4.4/GNU-AGPL-3.0
mongodb-sunos5-x86_64-2.4.4/bin/mongodump
mongodb-sunos5-x86_64-2.4.4/bin/mongorestore
mongodb-sunos5-x86_64-2.4.4/bin/mongoexport
mongodb-sunos5-x86_64-2.4.4/bin/mongoimport
mongodb-sunos5-x86_64-2.4.4/bin/mongostat
mongodb-sunos5-x86_64-2.4.4/bin/mongotop
mongodb-sunos5-x86_64-2.4.4/bin/mongooplog
mongodb-sunos5-x86_64-2.4.4/bin/mongofiles
mongodb-sunos5-x86_64-2.4.4/bin/bsondump
mongodb-sunos5-x86_64-2.4.4/bin/mongoperf
mongodb-sunos5-x86_64-2.4.4/bin/mongod
mongodb-sunos5-x86_64-2.4.4/bin/mongos
mongodb-sunos5-x86_64-2.4.4/bin/mongo
tad@openindiana:~/mongodev$ mongodb-sunos5-x86_64-2.4.4/bin/mongod --dbpath ~/data
Thu Jun 20 05:21:13.698 [initandlisten] MongoDB starting : pid=2921 port=27017 dbpath=/home/tad/data 64-bit host=openindiana
Thu Jun 20 05:21:13.698 [initandlisten] 
Thu Jun 20 05:21:13.698 [initandlisten] ** NOTE: your operating system version does not support the method that MongoDB
Thu Jun 20 05:21:13.698 [initandlisten] **       uses to detect impending page faults.
Thu Jun 20 05:21:13.698 [initandlisten] **       This may result in slower performance for certain use cases
Thu Jun 20 05:21:13.699 [initandlisten] 
Thu Jun 20 05:21:13.699 [initandlisten] db version v2.4.4
Thu Jun 20 05:21:13.699 [initandlisten] git version: 4ec1fb96702c9d4c57b1e06dd34eb73a16e407d2
Thu Jun 20 05:21:13.699 [initandlisten] build info: SunOS bs-smartos-x86-64-1.10gen.cc 5.11 joyent_20130322T181205Z i86pc BOOST_LIB_VERSION=1_49
Thu Jun 20 05:21:13.699 [initandlisten] allocator: system
Thu Jun 20 05:21:13.700 [initandlisten] options: { dbpath: "/home/tad/data" }
Thu Jun 20 05:21:13.700 [initandlisten] journal dir=/home/tad/data/journal
Thu Jun 20 05:21:13.700 [initandlisten] recover : no journal files present, no recovery needed
Thu Jun 20 05:21:13.791 [initandlisten] waiting for connections on port 27017
Thu Jun 20 05:21:13.791 [websvr] admin web console waiting for connections on port 28017
Thu Jun 20 05:21:43.227 [initandlisten] connection accepted from 127.0.0.1:42845 #1 (1 connection now open)
Thu Jun 20 05:21:54.107 [conn1] end connection 127.0.0.1:42845 (0 connections now open)
^CThu Jun 20 05:22:03.784 [signalProcessingThread] got signal 2 (Interrupt), will terminate after current cmd ends
Thu Jun 20 05:22:03.784 [signalProcessingThread] now exiting
Thu Jun 20 05:22:03.784 dbexit: 
Thu Jun 20 05:22:03.784 [signalProcessingThread] shutdown: going to close listening sockets...
Thu Jun 20 05:22:03.784 [signalProcessingThread] closing listening socket: 8
Thu Jun 20 05:22:03.784 [signalProcessingThread] closing listening socket: 9
Thu Jun 20 05:22:03.784 [signalProcessingThread] closing listening socket: 10
Thu Jun 20 05:22:03.784 [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
Thu Jun 20 05:22:03.784 [signalProcessingThread] shutdown: going to flush diaglog...
Thu Jun 20 05:22:03.784 [signalProcessingThread] shutdown: going to close sockets...
Thu Jun 20 05:22:03.785 [signalProcessingThread] shutdown: waiting for fs preallocator...
Thu Jun 20 05:22:03.785 [signalProcessingThread] shutdown: lock for final commit...
Thu Jun 20 05:22:03.785 [signalProcessingThread] shutdown: final commit...
Thu Jun 20 05:22:03.787 [signalProcessingThread] shutdown: closing all files...
Thu Jun 20 05:22:03.788 [signalProcessingThread] closeAllFiles() finished
Thu Jun 20 05:22:03.788 [signalProcessingThread] journalCleanup...
ld.so.1: mongod: fatal: relocation error: file mongodb-sunos5-x86_64-2.4.4/bin/mongod: symbol _ZNSt8__detail15_List_node_base7_M_hookEPS0_: referenced symbol not found
Killed
ad@openindiana:~/mongodev$ ldd -r mongodb-sunos5-x86_64-2.4.4/bin/mongod
	libsocket.so.1 =>	 /lib/64/libsocket.so.1
	libresolv.so.2 =>	 /lib/64/libresolv.so.2
	librt.so.1 =>	 /lib/64/librt.so.1
	libnsl.so.1 =>	 /lib/64/libnsl.so.1
	libstdc++.so.6 =>	 /usr/lib/64/libstdc++.so.6
	libm.so.2 =>	 /lib/64/libm.so.2
	libgcc_s.so.1 =>	 /usr/lib/64/libgcc_s.so.1
	libpthread.so.1 =>	 /lib/64/libpthread.so.1
	libc.so.1 =>	 /lib/64/libc.so.1
	libmd.so.1 =>	 /lib/64/libmd.so.1
	libmp.so.2 =>	 /lib/64/libmp.so.2
	symbol not found: _ZNSt8__detail15_List_node_base7_M_hookEPS0_		(mongodb-sunos5-x86_64-2.4.4/bin/mongod)
	symbol not found: _ZNSt8__detail15_List_node_base9_M_unhookEv		(mongodb-sunos5-x86_64-2.4.4/bin/mongod)
	symbol not found: _ZNSt12out_of_rangeD2Ev		(mongodb-sunos5-x86_64-2.4.4/bin/mongod)
	symbol not found: _ZNSt8__detail15_List_node_base11_M_transferEPS0_S1_		(mongodb-sunos5-x86_64-2.4.4/bin/mongod)
	symbol not found: _ZNSt8__detail15_List_node_base4swapERS0_S1_		(mongodb-sunos5-x86_64-2.4.4/bin/mongod)
tad@openindiana:~/mongodev$ 

Making it use the OpenCSW version of libstdc++.so.6 fixes that problem:

tad@openindiana:~/mongodev$ env LD_LIBRARY_PATH=/opt/csw/lib/64 ldd -r mongodb-sunos5-x86_64-2.4.4/bin/mongod
	libsocket.so.1 =>	 /lib/64/libsocket.so.1
	libresolv.so.2 =>	 /lib/64/libresolv.so.2
	librt.so.1 =>	 /lib/64/librt.so.1
	libnsl.so.1 =>	 /lib/64/libnsl.so.1
	libstdc++.so.6 =>	 /opt/csw/lib/64/libstdc++.so.6
	libm.so.2 =>	 /lib/64/libm.so.2
	libgcc_s.so.1 =>	 /opt/csw/lib/64/libgcc_s.so.1
	libpthread.so.1 =>	 /lib/64/libpthread.so.1
	libc.so.1 =>	 /lib/64/libc.so.1
	libmd.so.1 =>	 /lib/64/libmd.so.1
	libmp.so.2 =>	 /lib/64/libmp.so.2
ad@openindiana:~/mongodev$ env LD_LIBRARY_PATH=/opt/csw/lib/64 mongodb-sunos5-x86_64-2.4.4/bin/mongod --dbpath ~/data
Thu Jun 20 05:28:10.123 [initandlisten] MongoDB starting : pid=2943 port=27017 dbpath=/home/tad/data 64-bit host=openindiana
Thu Jun 20 05:28:10.124 [initandlisten] 
Thu Jun 20 05:28:10.124 [initandlisten] ** NOTE: your operating system version does not support the method that MongoDB
Thu Jun 20 05:28:10.124 [initandlisten] **       uses to detect impending page faults.
Thu Jun 20 05:28:10.124 [initandlisten] **       This may result in slower performance for certain use cases
Thu Jun 20 05:28:10.125 [initandlisten] 
Thu Jun 20 05:28:10.125 [initandlisten] db version v2.4.4
Thu Jun 20 05:28:10.125 [initandlisten] git version: 4ec1fb96702c9d4c57b1e06dd34eb73a16e407d2
Thu Jun 20 05:28:10.125 [initandlisten] build info: SunOS bs-smartos-x86-64-1.10gen.cc 5.11 joyent_20130322T181205Z i86pc BOOST_LIB_VERSION=1_49
Thu Jun 20 05:28:10.125 [initandlisten] allocator: system
Thu Jun 20 05:28:10.126 [initandlisten] options: { dbpath: "/home/tad/data" }
Thu Jun 20 05:28:10.126 [initandlisten] journal dir=/home/tad/data/journal
Thu Jun 20 05:28:10.126 [initandlisten] recover begin
Thu Jun 20 05:28:10.127 [initandlisten] info no lsn file in journal/ directory
Thu Jun 20 05:28:10.127 [initandlisten] recover lsn: 0
Thu Jun 20 05:28:10.127 [initandlisten] recover /home/tad/data/journal/j._0
Thu Jun 20 05:28:10.132 [initandlisten] recover cleaning up
Thu Jun 20 05:28:10.136 [initandlisten] removeJournalFiles
Thu Jun 20 05:28:10.136 [initandlisten] recover done
Thu Jun 20 05:28:10.248 [initandlisten] waiting for connections on port 27017
Thu Jun 20 05:28:10.248 [websvr] admin web console waiting for connections on port 28017
Thu Jun 20 05:28:17.714 [initandlisten] connection accepted from 127.0.0.1:57007 #1 (1 connection now open)
Thu Jun 20 05:28:20.481 [conn1] end connection 127.0.0.1:57007 (0 connections now open)
^CThu Jun 20 05:28:24.191 [signalProcessingThread] got signal 2 (Interrupt), will terminate after current cmd ends
Thu Jun 20 05:28:24.191 [signalProcessingThread] now exiting
Thu Jun 20 05:28:24.191 dbexit: 
Thu Jun 20 05:28:24.191 [signalProcessingThread] shutdown: going to close listening sockets...
Thu Jun 20 05:28:24.191 [signalProcessingThread] closing listening socket: 8
Thu Jun 20 05:28:24.191 [signalProcessingThread] closing listening socket: 9
Thu Jun 20 05:28:24.191 [signalProcessingThread] closing listening socket: 10
Thu Jun 20 05:28:24.191 [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
Thu Jun 20 05:28:24.191 [signalProcessingThread] shutdown: going to flush diaglog...
Thu Jun 20 05:28:24.191 [signalProcessingThread] shutdown: going to close sockets...
Thu Jun 20 05:28:24.191 [signalProcessingThread] shutdown: waiting for fs preallocator...
Thu Jun 20 05:28:24.191 [signalProcessingThread] shutdown: lock for final commit...
Thu Jun 20 05:28:24.191 [signalProcessingThread] shutdown: final commit...
Thu Jun 20 05:28:24.195 [signalProcessingThread] shutdown: closing all files...
Thu Jun 20 05:28:24.196 [signalProcessingThread] closeAllFiles() finished
Thu Jun 20 05:28:24.196 [signalProcessingThread] journalCleanup...
Thu Jun 20 05:28:24.196 [signalProcessingThread] removeJournalFiles
Thu Jun 20 05:28:24.196 dbexit: really exiting now
tad@openindiana:~/mongodev$ 

Comment by Udo Grabowski [ 20/Jun/13 ]

a7 is fine, if it does not crash, we know where to look at.
I've checked the a8 compiled binary on a7, it crashes there
too with the same backtrace.
Virtualbox is a poor performer, you should give it 3G of
memory and at least 2 cores to run Solx OS'ses . The better
option is qemu-kvm (if you have a VT/d and EPT capable
Intel processor), with 4 cores (-smp 4) and 4G (-m 4000m)
RAM you won't even notice that it is a VM...
Except the libjs stuff (see the ominous web page about
the special OI compilation of mongo how to install this)
everything you need should be installable either from
<http://pkg.openindiana.org/dev/>
or from
<http://pkg.openindiana.org/sfe/>

> I can't reproduce your problem. I'm building with g++
> from OpenCSW,in case that matters. My command line was:
> ...

Huh, just got that while writing my mail. Interesting.
Opencsw is different, but since it's the same version,
there must be a more subtle problem in the gcc setup.
I'll try to get a build on a7, in case there's really
something broken on a8. The 4.6.3 gcc on OI is from the
SFE repository. I'll check if I can spot some quirk in
gcc spec files.

Thanks for your fast and invaluable help so far !

Dr.Udo Grabowski Inst.f.Meteorology a.Climate Research IMK-ASF-SAT
www-imk.fzk.de/asf/sat/grabowski/ www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology http://www.kit.edu
Postfach 3640,76021 Karlsruhe,Germany T+49)721 608-26026 F:-926026

Comment by Tad Marshall [ 20/Jun/13 ]

The build (of just mongod and mongo) finished and I tested.

I can't reproduce your problem. I'm building with g++ from OpenCSW, in case that matters. My command line was:

scons mongod mongo --64 --release --cxx=/opt/csw/bin/g++ --cc=/opt/csw/bin/gcc -j 1

The shell session was:

tad@openindiana:~/mongodev$ ./mongo
MongoDB shell version: 2.5.1-pre-
connecting to: test
Server has startup warnings: 
Thu Jun 20 04:59:53.568 [initandlisten] 
Thu Jun 20 04:59:53.568 [initandlisten] ** NOTE: This is a development version (2.5.1-pre-) of MongoDB.
Thu Jun 20 04:59:53.569 [initandlisten] **       Not recommended for production.
Thu Jun 20 04:59:53.569 [initandlisten] 
> exit
bye
tad@openindiana:~/mongodev$ 

The server side looked like this:

tad@openindiana:~/mongodev$ ./mongod --dbpath ~/data
Thu Jun 20 04:59:53.568 [initandlisten] MongoDB starting : pid=2884 port=27017 dbpath=/home/tad/data 64-bit host=openindiana
Thu Jun 20 04:59:53.568 [initandlisten] 
Thu Jun 20 04:59:53.568 [initandlisten] ** NOTE: This is a development version (2.5.1-pre-) of MongoDB.
Thu Jun 20 04:59:53.569 [initandlisten] **       Not recommended for production.
Thu Jun 20 04:59:53.569 [initandlisten] 
Thu Jun 20 04:59:53.569 [initandlisten] db version v2.5.1-pre-
Thu Jun 20 04:59:53.569 [initandlisten] git version: 714a9c6febdac4d062811cbe09477d662e890fb6
Thu Jun 20 04:59:53.569 [initandlisten] build info: SunOS openindiana 5.11 oi_151a7 i86pc BOOST_LIB_VERSION=1_49
Thu Jun 20 04:59:53.570 [initandlisten] allocator: system
Thu Jun 20 04:59:53.570 [initandlisten] options: { dbpath: "/home/tad/data" }
Thu Jun 20 04:59:53.570 [initandlisten] journal dir=/home/tad/data/journal
Thu Jun 20 04:59:53.571 [initandlisten] recover : no journal files present, no recovery needed
Thu Jun 20 04:59:54.097 [initandlisten] waiting for connections on port 27017
Thu Jun 20 05:00:01.828 [initandlisten] connection accepted from 127.0.0.1:64009 #1 (1 connection now open)
Thu Jun 20 05:00:05.025 [conn1] end connection 127.0.0.1:64009 (0 connections now open)
^CThu Jun 20 05:00:12.001 [signalProcessingThread] got signal 2 (Interrupt), will terminate after current cmd ends
Thu Jun 20 05:00:12.001 [signalProcessingThread] now exiting
Thu Jun 20 05:00:12.001 dbexit: 
Thu Jun 20 05:00:12.001 [signalProcessingThread] shutdown: going to close listening sockets...
Thu Jun 20 05:00:12.001 [signalProcessingThread] closing listening socket: 8
Thu Jun 20 05:00:12.001 [signalProcessingThread] closing listening socket: 9
Thu Jun 20 05:00:12.001 [signalProcessingThread] removing socket file: /tmp/mongodb-27017.sock
Thu Jun 20 05:00:12.001 [signalProcessingThread] shutdown: going to flush diaglog...
Thu Jun 20 05:00:12.001 [signalProcessingThread] shutdown: going to close sockets...
Thu Jun 20 05:00:12.001 [signalProcessingThread] shutdown: waiting for fs preallocator...
Thu Jun 20 05:00:12.001 [signalProcessingThread] shutdown: lock for final commit...
Thu Jun 20 05:00:12.001 [signalProcessingThread] shutdown: final commit...
Thu Jun 20 05:00:12.013 [signalProcessingThread] shutdown: closing all files...
Thu Jun 20 05:00:12.016 [signalProcessingThread] closeAllFiles() finished
Thu Jun 20 05:00:12.016 [signalProcessingThread] journalCleanup...
Thu Jun 20 05:00:12.016 [signalProcessingThread] removeJournalFiles
Thu Jun 20 05:00:12.016 dbexit: really exiting now
tad@openindiana:~/mongodev$ 

The "got signal 2" happened when I pressed ctrl-C.

Do you think I need to try OpenIndiana 157a8? This seems like it might be an OpenIndiana problem, since we seem to work OK with the earlier version.

This is my compiler and build info:

tad@openindiana:~/mongodev$ which g++
/opt/csw/bin/g++
tad@openindiana:~/mongodev$ g++ --version
g++ (GCC) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
tad@openindiana:~/mongodev$ uname -a
SunOS openindiana 5.11 oi_151a7 i86pc i386 i86pc Solaris
tad@openindiana:~/mongodev$ isainfo -kv
64-bit amd64 kernel modules
tad@openindiana:~/mongodev$ ldd -r mongod
	libsocket.so.1 =>	 /lib/64/libsocket.so.1
	libresolv.so.2 =>	 /lib/64/libresolv.so.2
	librt.so.1 =>	 /lib/64/librt.so.1
	libnsl.so.1 =>	 /lib/64/libnsl.so.1
	libstdc++.so.6 =>	 /opt/csw/lib/64/libstdc++.so.6
	libm.so.2 =>	 /lib/64/libm.so.2
	libgcc_s.so.1 =>	 /opt/csw/lib/64/libgcc_s.so.1
	libpthread.so.1 =>	 /lib/64/libpthread.so.1
	libc.so.1 =>	 /lib/64/libc.so.1
	libmd.so.1 =>	 /lib/64/libmd.so.1
	libmp.so.2 =>	 /lib/64/libmp.so.2
tad@openindiana:~/mongodev$ ls -l /opt/csw/lib/64/libstdc++.so.6
lrwxrwxrwx 1 root root 19 May 19 16:16 /opt/csw/lib/64/libstdc++.so.6 -> libstdc++.so.6.0.16
tad@openindiana:~/mongodev$ 

I'll study the code and your stack trace and see if there is a problem I can spot.

Comment by Tad Marshall [ 20/Jun/13 ]

I'm building on OpenIndiana now (running under VirtualBox). The performance in this instance is terrible ... very sluggish. I've seen similar bad performance in the Solaris 11 VM I downloaded from Oracle, but the Solaris 11.1 VM that I created by booting from the .ISO runs at an acceptable speed. Are SunOS 5.11 level OSes just bad performers running virtualized? Is VirtualBox a poor container for Solaris? Perhaps the guest additions are poor.

The OpenIndiana build that I'm testing is 151a7, not the 157a8 that you are using. Should I give up and create a new 157a8 VM? If I don't see the problems you are seeing (when the build eventually finishes), we still won't know if the problem is specific to OpenIndiana build 157a8.

Thoughts?

Comment by Udo Grabowski [ 20/Jun/13 ]

Yes, this is the original 2.4.4 tarball.

Backtrace resolved to source lines, the ffffxxx are system libraries, and the 0xddd1d4 is
ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x414 (which may has been inlined by the
optimizer):

ro sunug1 /install/mongodb-src-r2.4.4 # foreach a ( 0xdecd18 0x95558e 0x95590f 0xffffdd7fff8d7516 0xffffdd7fff8ca09c 0xdd86e8 0xffffdd7fff8def05 0xffffdd7fffb53ad9 0xde2b59 0xde3021 0xdda68e 0xddd1d4 0xe3dbae 0xffffdd7fff8d7184 0xffffdd7fff8d7450)
foreach? echo "$a : " `addr2line $a`
foreach? end
0xdecd18 : /install/mongodb-src-r2.4.4/src/mongo/util/stacktrace.cpp:40
0x95558e : /usr/gcc/4.6/lib/gcc/i386-pc-solaris2.11/4.6.3/../../../../include/c++/4.6.3/sstream:458
0x95590f : /usr/gcc/4.6/lib/gcc/i386-pc-solaris2.11/4.6.3/../../../../include/c++/4.6.3/sstream:439
0xffffdd7fff8d7516 : ??:0
0xffffdd7fff8ca09c : ??:0
0xdd86e8 : /usr/gcc/4.6/lib/gcc/i386-pc-solaris2.11/4.6.3/../../../../include/c++/4.6.3/ext/new_allocator.h:108
0xffffdd7fff8def05 : ??:0
0xffffdd7fffb53ad9 : ??:0
0xde2b59 : /install/mongodb-src-r2.4.4/src/mongo/util/net/sock.cpp:713
0xde3021 : /install/mongodb-src-r2.4.4/src/mongo/util/net/sock.cpp:640
0xdda68e : /install/mongodb-src-r2.4.4/src/mongo/util/net/message_port.cpp:168
0xddd1d4 : ??:?
0xe3dbae : /install/mongodb-src-r2.4.4/src/third_party/boost/libs/thread/src/pthread/thread.cpp:133
0xffffdd7fff8d7184 : ??:0
0xffffdd7fff8d7450 : ??:0

Comment by Tad Marshall [ 19/Jun/13 ]

Can you run addr2line on the list of addresses? And this is 2.4.4 from the source tarball?

Comment by Udo Grabowski [ 19/Jun/13 ]

Checked that, the same problem with ip:port given. It looks to my like this is only
a problem with the exception handling itself. The first question is, whether it should
enter Mongo:Socket:recv (util/net/sock.cpp lines 638-715) at all on a closed connection,
and if it should then throw an exception if ret at line 650 is 0 (=closed). It looks
Mongo:ipToAddrs (util/net/listen.cpp) is called after the handler is triggered (what
for?). So maybe it's a timing issue between exception handler thread and main thread
which works on one OS and not on another ? That should be fixed.

Wed Jun 19 18:43:37.695 Backtrace:
0xdecd18 0x95558e 0x95590f 0xffffdd7fff8d7516 0xffffdd7fff8ca09c 0xdd86e8 0xffffdd7fff8def05 0xffffdd7fffb53ad9 0xde2b59 0xde3021 0xdda68e 0xddd1d4 0xe3dbae 0xffffdd7fff8d7184 0xffffdd7fff8d7450
/install/mongodb-src-r2.4.4/mongod'_ZN5mongo15printStackTraceERSo+0x28 [0xdecd18]
/install/mongodb-src-r2.4.4/mongod'_ZN5mongo10abruptQuitEi+0x3ce [0x95558e]
/install/mongodb-src-r2.4.4/mongod'_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x22f [0x95590f]
/lib/amd64/libc.so.1'__sighndlr+0x6 [0xffffdd7fff8d7516]
/lib/amd64/libc.so.1'call_user_handler+0x2a4 [0xffffdd7fff8ca09c]
/install/mongodb-src-r2.4.4/mongod'_ZN5mongo9ipToAddrsEPKcib+0x368 [0xdd86e8]
/lib/amd64/libc.so.1'_SUNW_Unwind_RaiseException+0x55 [0xffffdd7fff8def05]
/usr/gcc/4.6/lib/amd64/libstdc++.so.6.0.16'__cxa_throw+0x59 [0xffffdd7fffb53ad9]
/install/mongodb-src-r2.4.4/mongod'_ZN5mongo6Socket16_handleRecvErrorEiiPi+0xb9 [0xde2b59]
/install/mongodb-src-r2.4.4/mongod'_ZN5mongo6Socket4recvEPci+0x91 [0xde3021]
/install/mongodb-src-r2.4.4/mongod'_ZN5mongo13MessagingPort4recvERNS_7MessageE+0x2e [0xdda68e]
/install/mongodb-src-r2.4.4/mongod'_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x414 [0xddd1d4]
/install/mongodb-src-r2.4.4/mongod'thread_proxy+0x7e [0xe3dbae]
/lib/amd64/libc.so.1'_thrp_setup+0xbc [0xffffdd7fff8d7184]
/lib/amd64/libc.so.1'_lwp_start+0x0 [0xffffdd7fff8d7450]

Comment by Tad Marshall [ 19/Jun/13 ]

Here is the info from the Joyent SmartOS builder we use to create the .tgz file on mongodb.org:

[admin@bs-smartos-x86-64-1 ~]$ which g++
/opt/local/bin/g++
[admin@bs-smartos-x86-64-1 ~]$ g++ --version
g++ (GCC) 4.6.2
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
[admin@bs-smartos-x86-64-1 ~]$ uname -a
SunOS bs-smartos-x86-64-1.10gen.cc 5.11 joyent_20130322T181205Z i86pc i386 i86pc Solaris
[admin@bs-smartos-x86-64-1 ~]$ isainfo -kv
64-bit amd64 kernel modules

So, compiler version seems unlikely to be the difference.

Here is the ldd -r output on Solaris 11.1:

tad@solaris:~/mongodev$ ldd -r mongod
	libsocket.so.1 =>	 /lib/64/libsocket.so.1
	libresolv.so.2 =>	 /lib/64/libresolv.so.2
	librt.so.1 =>	 /lib/64/librt.so.1
	libnsl.so.1 =>	 /lib/64/libnsl.so.1
	libstdc++.so.6 =>	 /usr/lib/64/libstdc++.so.6
	libm.so.2 =>	 /lib/64/libm.so.2
	libgcc_s.so.1 =>	 /usr/lib/64/libgcc_s.so.1
	libpthread.so.1 =>	 /lib/64/libpthread.so.1
	libc.so.1 =>	 /lib/64/libc.so.1
	libmd.so.1 =>	 /lib/64/libmd.so.1
	libmp.so.2 =>	 /lib/64/libmp.so.2
	libcryptoutil.so.1 =>	 /lib/64/libcryptoutil.so.1

Here is the ldd -r output on Joyent:

[admin@bs-smartos-x86-64-1 ~/tad/mongo]$ ldd -r mongod
        libsocket.so.1 =>        /lib/64/libsocket.so.1
        libresolv.so.2 =>        /lib/64/libresolv.so.2
        librt.so.1 =>    /lib/64/librt.so.1
        libnsl.so.1 =>   /lib/64/libnsl.so.1
        libstdc++.so.6 =>        /opt/local/lib/amd64/libstdc++.so.6
        libm.so.2 =>     /lib/64/libm.so.2
        libgcc_s.so.1 =>         /opt/local/lib/amd64/libgcc_s.so.1
        libpthread.so.1 =>       /lib/64/libpthread.so.1
        libc.so.1 =>     /lib/64/libc.so.1
        libmd.so.1 =>    /lib/64/libmd.so.1
        libmp.so.2 =>    /lib/64/libmp.so.2

It seems that they are all different. My Solaris has /lib/64/libcryptoutil.so.1, your OpenIndiana has /lib/64/libdl.so.1, Joyent has neither.

Let's assume for now that this is an OpenIndiana issue and I will try to reproduce it. I need to finish setting up OpenIndiana to do that ... I'll let you know if I have problems.

If you want to experiment, the code that is having problems seems to run only when you don't specify the address of the host you are connecting to. You could try

./mongo 127.0.0.1:27017/test

and see if the symptoms change at all.

Comment by Udo Grabowski [ 19/Jun/13 ]

Tried that, compiled without issues, but still the same problem.
The typical compilation flags seen in the build process look like this:

/usr/gcc/4.6/bin/g++ -o
build/sunos5/64/cc_usr_gcc_4.6_bin_gcc/cpppathusr_local_include/cxxusr_gcc_4.6_bin_g++/libpath_usr_local_lib_64/release/mongo/db/fts/fts_matcher_test.o
-c -Wnon-virtual-dtor -Woverloaded-virtual -fPIC -fno-strict-aliasing -ggdb
-pthread -Wall -Wsign-compare -Wno-unknown-pragmas -Winvalid-pch -O3 -m64
-DBOOST_ALL_NO_LIB -D_SCONS -DMONGO_EXPOSE_MACROS -DSUPPORT_UTF8 -D_sunos_
-Du_int64_t=uint64_t -Du_int32_t=uint32_t -D_FILE_OFFSET_BITS=64
-DMONGO_HAVE_HEADER_UNISTD_H -DMONGO_HAVE_EXECINFO_BACKTRACE -I......

Dr.Udo Grabowski Inst.f.Meteorology a.Climate Research IMK-ASF-SAT
www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology http://www.kit.edu
Postfach 3640,76021 Karlsruhe,Germany T+49)721 608-26026 F:-926026

Comment by Udo Grabowski [ 19/Jun/13 ]

Ok, I'll try that, although the modifications seem not to touch the
problematic sections encountered here. The -Du_int64_t=uint64_t
as well as the changes on lib64 --> lib/64 are still necessary.

Yes.

>
> 2) How did the client close the connection? Is this just an exit
> from the mongo shell, or some other method?

See "Steps to reproduce:": local ./mongo, then exit.

> possibly in combination with poor error handling in MongoDB.
> My understanding is that OpenIndiana strives to be compatible with Solaris,

Since Oracle closed their source after build 148 (which is, actually,
Openindiana 151a), there are possible deviations from there on (Oracle
is somewhere around build 181, while Illumos, the base for all other
Solaris variants except Schillix, develops independently). But since
the SmartOS build is supposed to be working on SmartOS, the chances
that such differences trigger the problem are slim, since SmartOS and
OI build on the Illumos core (on different patch levels, maybe, therefore
I build myself to exclude a possible incompatibility).
Since you used gcc 4.5 and I use gcc 4.6.3, there could also be a
gcc incompatibility ("we break it because we can break it" ....) .
This would at least explain why the SmartOS build works on SmartOS
but not on Openindiana, but it's still unclear what breaks my own
build then. Maybe a problem with multithreading. Maybe the new_allocator
stuff is fed with uninitialized information since the connection
it works on is not available anymore.

Here is the detailed resolver list, nothing unusual:
ug1(~) ldd /install/mongodb-src-r2.4.4/mongod
libsocket.so.1 => /lib/64/libsocket.so.1
libresolv.so.2 => /lib/64/libresolv.so.2
libnsl.so.1 => /lib/64/libnsl.so.1
libdl.so.1 => /lib/64/libdl.so.1
librt.so.1 => /lib/64/librt.so.1
libstdc++.so.6 => /usr/gcc/4.6/lib/amd64/libstdc++.so.6
libm.so.2 => /lib/64/libm.so.2
libgcc_s.so.1 => /usr/gcc/4.6/lib/amd64/libgcc_s.so.1
libpthread.so.1 => /lib/64/libpthread.so.1
libc.so.1 => /lib/64/libc.so.1
libmd.so.1 => /lib/64/libmd.so.1
libmp.so.2 => /lib/64/libmp.so.2

> 127.0.0.1:37593" error you are seeing. It seems likely that the error handling
> for this case is faulty, and that this is the source of the segfault.

This is the debug output with -vvvvv on mongod. Generated in
Mongo:Socket:recv .

It would be helpful if you know some tips how to stop a thread
in gdb where I want to stop it, I still struggle with that, it
seems like it does the Mongo:Socket:recv in another thread than
the one where it's descendant finally crashes, which would be
a bit weird, but dbx and gdb say it crashes in just that thread,
but why the heck it does not stop then ?

Regards,

Dr.Udo Grabowski Inst.f.Meteorology a.Climate Research IMK-ASF-SAT
www.imk-asf.kit.edu/english/sat.php
KIT - Karlsruhe Institute of Technology http://www.kit.edu
Postfach 3640,76021 Karlsruhe,Germany T+49)721 608-26026 F:-926026

Comment by Tad Marshall [ 19/Jun/13 ]

Hi Udo,

Thanks for the update.

I had made a start at creating an OpenIndiana VM (a guest OS in VirtualBox) a while ago, but did not bring it to the point where I had all of the MongoDB build prerequisites installed. If it seems that this problem is specific to OpenIndiana, I can finish setting up the VM and try to reproduce the problem.

The build instructions you linked to are interesting and odd. They do not describe the version of MongoDB where they had to do all these things, but simply point to our "master" branch, which changes every day.

I have been building recently in Solaris 10 and Solaris 11 and also in SmartOS 5.11 in a Joyent instance. I have not had to do any of the steps listed in http://sptci.com/docs/cpp/mongodbOpenIndiana.html .

In Solaris 11.1, the command I use to build a release (optimized) version of MongoDB is:

scons all --64 --release -j 2

None of the source code files are modified from the current master code. Here is some environment info:

tad@solaris:~$ which g++
/usr/bin/g++
tad@solaris:~$ g++ --version
g++ (GCC) 4.5.2
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
tad@solaris:~$ uname -a
SunOS solaris 5.11 11.1 i86pc i386 i86pc
tad@solaris:~$ isainfo -kv
64-bit amd64 kernel modules

When you say

SmartOs build crashed with that problem,

are you refering to the problem described in this Jira ticket?

Not all of my questions have been answered:

1) What is the client connection (e.g. mongo shell, some application using a particular driver)?

2) How did the client close the connection? Is this just an exit from the mongo shell, or some other method?

So far, this seems like it might be a difference between Solaris and OpenIndiana, possibly in combination with poor error handling in MongoDB. My understanding is that OpenIndiana strives to be compatible with Solaris, so I don't think we're supposed to be seeing differences between them.

I'd like to understand how I can generate the "[conn1] Socket recv() conn closed? 127.0.0.1:37593" error you are seeing. It seems likely that the error handling for this case is faulty, and that this is the source of the segfault.

Can you answer my questions regarding the connection, please? Thanks!

Tad

Comment by Udo Grabowski [ 19/Jun/13 ]

Another stacktrace, this time with gdb. I did not manage to get the thread conn1
to stop in Mongo:Socket:recv, for some reason it does not honour the breakpoint,
although I switched to the thread before setting the stop.
At least, now we are a bit deeper in ippToAddr, it seems to struggle in
gcc 4.6.3 new_allocator.h, which has some mysterious comments around the line:
.....
// _GLIBCXX_RESOLVE_LIB_DEFECTS
// 402. wrong new expression in [some_] allocator::construct
void
construct(pointer __p, const _Tp& __val)
{ ::new((void *)_p) _Tp(_val); } // line 108

#ifdef _GXX_EXPERIMENTAL_CXX0X_
template<typename... _Args>
void
construct(pointer __p, _Args&&... __args)
{ ::new((void *)_p) _Tp(std::forward<_Args>(_args)...); }
#endif
.....

Also, the port value seems to be bogus.

GDB stacktrace:
Wed Jun 19 15:18:28.986 [journal] groupCommit end
Wed Jun 19 15:18:29.160 [conn1] Socket recv() conn closed? 127.0.0.1:38235

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 12]
0x0000000000dd897d in mongo::ipToAddrs (ips=0x332de70 "�\1777\001", port=26476416, useUnixSockets=96)
at /usr/gcc/4.6/lib/gcc/i386-pc-solaris2.11/4.6.3/../../../../include/c++/4.6.3/ext/new_allocator.h:108
108 { ::new((void *)_p) _Tp(_val); }
(gdb) where
#0 0x0000000000dd897d in mongo::ipToAddrs (ips=0x332de70 "�\1777\001", port=26476416, useUnixSockets=96)
at /usr/gcc/4.6/lib/gcc/i386-pc-solaris2.11/4.6.3/../../../../include/c++/4.6.3/ext/new_allocator.h:108
#1 0xffffdd7fff8bef05 in _Unwind_RaiseException () from /lib/64/libc.so.1
#2 0xffffdd7fffb43ad9 in __cxa_throw () from /usr/gcc/4.6/lib/amd64/libstdc++.so.6
#3 0x0000000000de3069 in mongo::Socket::_handleRecvError (this=0x193ff80, ret=<value optimized out>, len=<value optimized out>, retries=<value optimized out>) at src/mongo/util/net/sock.cpp:713
#4 0x0000000000de3531 in mongo::Socket::recv (this=0x193ff80, buf=0xffffdd7ff81fecf0 "����\177���\224�\212�\177�����2\003", len=4) at src/mongo/util/net/sock.cpp:653
#5 0x0000000000ddab9e in mongo::MessagingPort::recv (this=0x17b21a0, m=@0xffffdd7ff81fed60) at src/mongo/util/net/message_port.cpp:166
warning: (Internal error: pc 0xddd6e3 in read in psymtab, but not in symtab.)

#6 0x0000000000ddd6e4 in mongo::PortMessageServer::handleIncomingMsg (arg=warning: (Internal error: pc 0xddd6e3 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xddd2d0 in read in psymtab, but not in symtab.)

warning: (Internal error: pc 0xddd6e3 in read in psymtab, but not in symtab.)

0x329fdd0) at src/mongo/util/net/message_server_port.cpp:196
#7 0x0000000000e3e0be in thread_proxy (param=<value optimized out>) at src/third_party/boost/libs/thread/src/pthread/thread.cpp:121
#8 0xffffdd7fff8b7184 in _thrp_setup () from /lib/64/libc.so.1
#9 0xffffdd7fff8b7450 in smt_pause () from /lib/64/libc.so.1
#10 0x0000000000000000 in ?? ()

Comment by Udo Grabowski [ 19/Jun/13 ]

Build as described in the section 'Build MongoDB' at:
<http://sptci.com/docs/cpp/mongodbOpenIndiana.html>
Left out the already fixed parts, link is a bit outdated.
SmartOs build crashed with that problem, so I thought there
was a library incompatibility between OI and SmartOS, and therefore
did my own build, but crashed again with the same problem.
Additional options apart from the ones above: -m64 -O3
You see -vvvvv debug output, not the usual stuff.
Please read the description, all your other questions are answered there.
Still no idea what could go wrong, and if anybody has the SmartOS build
working (at least on SmartOS itself).
I can issue db commands successfully as long as the server is running,
so currently only this problem (and the minor UTF-8 crash) hinders me
from deploying Mongo on OI.

Comment by Tad Marshall [ 19/Jun/13 ]

Hi Udo,

Thanks for the report, and for the log and dbx output.

Are you able to reproduce this using the Solaris/SmartOS build downloaded from mongodb.org (http://fastdl.mongodb.org/sunos5/mongodb-sunos5-x86_64-2.4.4.tgz)?

Is there any special reason to build your own version; did you change build options? Can you provide the scons command line you used, along with anything else we would need to try to build the same binary you built?

What is the client connection (e.g. mongo shell, some application using a particular driver)? Is the connection from localhost or over a network?

What version of gcc did you use to build it?

How did the client close the connection? Is this just an exit from the mongo shell, or some other method? The message "[conn1] Socket recv() conn closed? 127.0.0.1:37593" is not the usual "connection closed" message (which is something like "[conn1] end connection 127.0.0.1:54959 (0 connections now open)").

Since you have source code and a debugger, are you able to see what is going wrong in the error handling?

Tad

Generated at Thu Feb 08 03:21:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.