Replica-pair setup: production-shard1-002 and production-shard1-002.silentale.net
Before the first segmentation fault, shard1-002 is the master.
I launch a Map/Reduce command with the Ruby driver. The command fails, I receive this exception: #<Mongo::OperationFailure: map-reduce failed: assertion: invalid utf8>.
Then I launch collection.count, I get a proper result.
And finally, I launch the same Map/Reduce command, but with less data (thanks to a query), and mongod crashes. I know that this last Map/Reduce command behaves correctly when launched on a freshly started server.
Tue Apr 27 14:05:25 CMD: drop random_db_name.tmp.mr.mapreduce_1272377125_2
Tue Apr 27 14:05:25 CMD: drop random_db_name.tmp.mr.mapreduce_1272377125_2_inc
Tue Apr 27 14:05:25 Got signal: 11 (Segmentation fault).
Tue Apr 27 14:05:25 Backtrace:
0x6a8309 0x33042322a0 0x7400f9 0x7a02c3 0x79dbbb 0x71388b 0x713b5a 0x714c4f 0x714bdc 0x714ac5 0x54d691 0x54f433 0x649bdc 0x67e441 0x67f347 0x55662f 0x55a6cc 0x5f310a 0x5f8137 0x6a9894
The sad part is that I did not notice that the master crashed (auto-reconnect), and I crashed the second pair right after
I attached the logs of the two servers, and the Map/Reduce code in a Ruby file.
I can reproduce the problem on my laptop (MongoDB 1.4.1 64bits OSX) with the same collection, and a single mongod instance. But it does not raise a segmentation fault, mongod just gently kills itself.
Tue Apr 27 18:19:09 connection accepted from 127.0.0.1:60353 #1
Tue Apr 27 18:19:09 connection accepted from 127.0.0.1:60354 #2
Tue Apr 27 18:19:09 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1
Tue Apr 27 18:19:09 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1_inc
decode failed. probably invalid utf-8 string ["c?cilegigi63"@orange.fr]
why: TypeError: malformed UTF-8 character sequence at offset 2
Tue Apr 27 18:19:10 mr failed, removing collection
Tue Apr 27 18:19:10 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1
Tue Apr 27 18:19:10 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1_inc
Tue Apr 27 18:19:10 query random_db_name.$cmd ntoreturn:1 command reslen:89 641ms
Tue Apr 27 18:19:15 CMD: drop random_db_name.tmp.mr.mapreduce_1272385155_2
Tue Apr 27 18:19:15 CMD: drop random_db_name.tmp.mr.mapreduce_1272385155_2_inc
Assertion failure: !oldfp->dormantNext, at jsinterp.c:1620
Tue Apr 27 18:19:15 Got signal: 6 (Abort trap).
Tue Apr 27 18:19:15 Backtrace:
0x10021983c 0x7fff800693fa 0x1f6 0x10021a1b5 0x10021f21d 0x10021f56d 0x10022066f 0x100000b74
0 mongod 0x000000000021983c _ZN5mongo10abruptQuitEi + 332
1 libSystem.B.dylib 0x00000000800693fa _sigtramp + 26
2 ??? 0x00000000000001f6 0x0 + 502
3 mongod 0x000000000021a1b5 _ZN5mongo6listenEi + 597
4 mongod 0x000000000021f21d _ZN5mongo14_initAndListenEiPKc + 1693
5 mongod 0x000000000021f56d _ZN5mongo13initAndListenEiPKc + 29
6 mongod 0x000000000022066f main + 3583
7 mongod 0x0000000000000b74 start + 52
Tue Apr 27 18:19:15 dbexit:
Tue Apr 27 18:19:15 shutdown: going to close listening sockets...
Tue Apr 27 18:19:15 going to close listening socket: 7
Tue Apr 27 18:19:15 Listener on port 28017 aborted.
Tue Apr 27 18:19:15 going to close listening socket: 8
Tue Apr 27 18:19:15 shutdown: going to flush oplog...
Tue Apr 27 18:19:15 shutdown: going to close sockets...
Tue Apr 27 18:19:15 shutdown: waiting for fs preallocator...
Tue Apr 27 18:19:15 MessagingPort recv() errno:9 Bad file descriptor 127.0.0.1:60353
Tue Apr 27 18:19:15 shutdown: closing all files...
Tue Apr 27 18:19:15 end connection 127.0.0.1:60353
Tue Apr 27 18:19:15 closeAllFiles() finished
Tue Apr 27 18:19:15 shutdown: removing fs lock...
Tue Apr 27 18:19:15 dbexit: really exiting now
So I suppose that this problems appears as soon as a previous map/reduce command fails.