Type: Bug
Resolution: Done
Priority: Critical - P2
Fix Version/s: 1.5.1
Affects Version/s: 1.4.1
Component/s: Stability
Labels: None
Environment: fedora 9 x86_64
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Replica-pair setup: production-shard1-002 and production-shard1-002.silentale.net
Before the first segmentation fault, shard1-002 is the master.

I launch a Map/Reduce command with the Ruby driver. The command fails with this exception: #<Mongo::OperationFailure: map-reduce failed: assertion: invalid utf8>.
Then I launch collection.count and get a proper result.
Finally, I launch the same Map/Reduce command on less data (restricted by a query), and mongod crashes. I know that this last Map/Reduce command behaves correctly when launched on a freshly started server.
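
For reference, the three steps above can be sketched roughly as follows. This is only an illustration, not the attached mr_function.rb: the method name `run_sequence` and the `narrow_query` argument are hypothetical, and `collection` is assumed to respond to `map_reduce(map, reduce, opts)` and `count` the way a Ruby driver collection does. The rescue is deliberately broad so the sketch does not depend on the driver being loaded.

```ruby
# Hypothetical helper: runs the failing sequence and records what each step did.
# `collection` is assumed to behave like a Ruby driver collection object.
def run_sequence(collection, map, reduce, narrow_query)
  steps = []

  # Step 1: map/reduce over the whole collection.
  # In the report this raises "map-reduce failed: assertion: invalid utf8".
  begin
    collection.map_reduce(map, reduce)
    steps << [:full_map_reduce, :ok]
  rescue StandardError => e
    steps << [:full_map_reduce, :failed, e.message]
  end

  # Step 2: a plain count, which still returns a proper result.
  steps << [:count, collection.count]

  # Step 3: the same map/reduce restricted by a query.
  # In the report this is the call after which mongod crashes.
  begin
    collection.map_reduce(map, reduce, :query => narrow_query)
    steps << [:narrow_map_reduce, :ok]
  rescue StandardError => e
    steps << [:narrow_map_reduce, :failed, e.message]
  end

  steps
end
```

Running only step 3 against a freshly started server is the case that behaves correctly, so the order of the calls matters.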

Tue Apr 27 14:05:25 CMD: drop random_db_name.tmp.mr.mapreduce_1272377125_2
Tue Apr 27 14:05:25 CMD: drop random_db_name.tmp.mr.mapreduce_1272377125_2_inc
Tue Apr 27 14:05:25 Got signal: 11 (Segmentation fault).
Tue Apr 27 14:05:25 Backtrace:
0x6a8309 0x33042322a0 0x7400f9 0x7a02c3 0x79dbbb 0x71388b 0x713b5a 0x714c4f 0x714bdc 0x714ac5 0x54d691 0x54f433 0x649bdc 0x67e441 0x67f347 0x55662f 0x55a6cc 0x5f310a 0x5f8137 0x6a9894

The sad part is that I did not notice that the master had crashed (because of auto-reconnect), so I crashed the second server of the pair right after.

I attached the logs of the two servers, and the Map/Reduce code in a Ruby file.

I can reproduce the problem on my laptop (MongoDB 1.4.1, 64-bit, OS X) with the same collection and a single mongod instance. But there it does not raise a segmentation fault; mongod just gently shuts itself down.

Tue Apr 27 18:19:09 connection accepted from 127.0.0.1:60353 #1
Tue Apr 27 18:19:09 connection accepted from 127.0.0.1:60354 #2
Tue Apr 27 18:19:09 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1
Tue Apr 27 18:19:09 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1_inc
decode failed. probably invalid utf-8 string ["c?cilegigi63"@orange.fr]
why: TypeError: malformed UTF-8 character sequence at offset 2
Tue Apr 27 18:19:10 mr failed, removing collection
Tue Apr 27 18:19:10 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1
Tue Apr 27 18:19:10 CMD: drop random_db_name.tmp.mr.mapreduce_1272385149_1_inc
Tue Apr 27 18:19:10 query random_db_name.$cmd ntoreturn:1 command reslen:89 641ms
Tue Apr 27 18:19:15 CMD: drop random_db_name.tmp.mr.mapreduce_1272385155_2
Tue Apr 27 18:19:15 CMD: drop random_db_name.tmp.mr.mapreduce_1272385155_2_inc
Assertion failure: !oldfp->dormantNext, at jsinterp.c:1620
Tue Apr 27 18:19:15 Got signal: 6 (Abort trap).
Tue Apr 27 18:19:15 Backtrace:
0x10021983c 0x7fff800693fa 0x1f6 0x10021a1b5 0x10021f21d 0x10021f56d 0x10022066f 0x100000b74
0 mongod 0x000000000021983c _ZN5mongo10abruptQuitEi + 332
1 libSystem.B.dylib 0x00000000800693fa _sigtramp + 26
2 ??? 0x00000000000001f6 0x0 + 502
3 mongod 0x000000000021a1b5 _ZN5mongo6listenEi + 597
4 mongod 0x000000000021f21d _ZN5mongo14_initAndListenEiPKc + 1693
5 mongod 0x000000000021f56d _ZN5mongo13initAndListenEiPKc + 29
6 mongod 0x000000000022066f main + 3583
7 mongod 0x0000000000000b74 start + 52
Tue Apr 27 18:19:15 dbexit:
Tue Apr 27 18:19:15 shutdown: going to close listening sockets...
Tue Apr 27 18:19:15 going to close listening socket: 7
Tue Apr 27 18:19:15 Listener on port 28017 aborted.
Tue Apr 27 18:19:15 going to close listening socket: 8
Tue Apr 27 18:19:15 shutdown: going to flush oplog...
Tue Apr 27 18:19:15 shutdown: going to close sockets...
Tue Apr 27 18:19:15 shutdown: waiting for fs preallocator...
Tue Apr 27 18:19:15 MessagingPort recv() errno:9 Bad file descriptor 127.0.0.1:60353
Tue Apr 27 18:19:15 shutdown: closing all files...
Tue Apr 27 18:19:15 end connection 127.0.0.1:60353
Tue Apr 27 18:19:15 closeAllFiles() finished
Tue Apr 27 18:19:15 shutdown: removing fs lock...
Tue Apr 27 18:19:15 dbexit: really exiting now

So I suppose that this problem appears as soon as a previous map/reduce command has failed.
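
The "decode failed" line in the log above reports a malformed UTF-8 sequence at offset 2. A minimal way to look for such values on the client side, before they reach a map/reduce, could look like the sketch below. The helper name `first_invalid_utf8_offset` is hypothetical, and the sample byte 0xE9 (a Latin-1 "é") is only a guess at what the log printed as "?"; the real byte is not known from the log.

```ruby
# Hypothetical helper: returns the byte offset of the first malformed UTF-8
# sequence in `bytes`, or nil when the whole string is valid UTF-8.
def first_invalid_utf8_offset(bytes)
  # Work on a copy so the caller's string keeps its encoding.
  str = bytes.dup.force_encoding(Encoding::UTF_8)
  return nil if str.valid_encoding?

  offset = 0
  str.each_char do |ch|
    # Malformed bytes are yielded as characters that fail the validity check.
    return offset unless ch.valid_encoding?
    offset += ch.bytesize
  end
  nil
end

# Assumed sample: the byte 0xE9 is a guess, not taken from the log.
sample = "\"c\xE9cilegigi63\"@orange.fr".b
puts first_invalid_utf8_offset(sample).inspect  # => 2
```

This only helps to locate the offending documents. It does not explain why the server crashes on the next map/reduce after the first one has failed.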


mr_function.rb (0.8 kB), uploaded Apr 27 2010 01:00:53 PM UTC by Nicolas Fouché
shard1-001.txt (28 kB), uploaded Apr 27 2010 01:00:53 PM UTC by Nicolas Fouché
shard1-002.txt (5 kB), uploaded Apr 27 2010 01:00:53 PM UTC by Nicolas Fouché

Assignee: Eliot Horowitz (Inactive)
Reporter: Nicolas Fouché
Participants: auto, Eliot Horowitz, Nicolas Fouché
Votes: 3
Watchers: 4

Created: Apr 27 2010 01:00:53 PM UTC
Updated: Jul 12 2016 12:28:02 AM UTC
Resolved: Apr 28 2010 03:06:09 PM UTC