[SERVER-1694] Corruption during mapreduce on documents with arrays of binary data. Created: 27/Aug/10 Updated: 12/Jul/16 Resolved: 27/Aug/10 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Client, JavaScript, Shell |
| Affects Version/s: | 1.6.1 |
| Fix Version/s: | 1.7.0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | gvs | Assignee: | Eliot Horowitz (Inactive) |
| Resolution: | Done | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
cdc$ ./mongo --version
cdc$ ./mongod --version
cdc$ ./mongos --version
cdc$ gcc -v
We're running two shards (without replica sets), a config server and a mongos. The corruption occurs on a non-sharded db. |
||
| Attachments: |
|
| Operating System: | Linux |
| Participants: |
| Description |
|
Running mapreduce on an input collection that has only two distinct keys (0x00010203 and 0xaabbccdd) produces random keys in the output collection:
cdc$ ./mr , "ok" : 1 } , "ok" : 1 } , "ok" : 1 }
Steps to reproduce:
1) Insert a few documents with arrays of binary data.
I'm quite sure the corruption happens during the mapreduce operation; the BinData arrays are stored correctly in the db itself. However, the mongo JS shell has some trouble interpreting them consistently:
> db.in.findOne( {i : 0}, {k : 1}) , {k : 1})
Again, when the k array is accessed through the mongoclient library, the data is NOT corrupted, so this bug in the JS shell could be completely unrelated.
I've attached mr.cpp, which reproduces the behaviour. |
| Comments |
| Comment by auto [ 15/Sep/10 ] |
|
Author: {'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'} Message: have to copy BinData in sm in case BSONObj is temp |
| Comment by auto [ 27/Aug/10 ] |
|
Author: {'login': 'erh', 'name': 'Eliot Horowitz', 'email': 'eliot@10gen.com'} Message: have to copy BinData in sm in case BSONObj is temp |
| Comment by Eliot Horowitz (Inactive) [ 27/Aug/10 ] |
|
Ok - found the issue. |
| Comment by gvs [ 27/Aug/10 ] |
|
It's happening a lot less with the precompiled binaries than with the git sources though. Should we try compiling with a different JS engine (like v8)? |
| Comment by Eliot Horowitz (Inactive) [ 27/Aug/10 ] |
|
OK, about 1 in 25 runs I'm getting a weird key. |
| Comment by gvs [ 27/Aug/10 ] |
|
Have you tried running it multiple times? I downloaded http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-1.6.1.tgz and used those binaries. I'm still linking with the 1.6.1 client library from git though:
cdc$ ./mr , "ok" : 1 } , "ok" : 1 } , "ok" : 1 } , "ok" : 1 } , "ok" : 1 } , "ok" : 1 } , "ok" : 1 } , "ok" : 1 } , "ok" : 1 } , "ok" : 1 } , "ok" : 1 } , "ok" : 1 } |
| Comment by Eliot Horowitz (Inactive) [ 27/Aug/10 ] |
|
Running your program, I get:
erh@erh-tm1 ~/work/mongo -> g++ -I. mr.cpp -L. -lmongoclient -lpthread -lstdc++ -lboost_system-mt -lboost_thread-mt -lboost_filesystem-mt -lboost_program_options-mt && ./a.out , "ok" : 1 }
after changing the print line to:
printf("key: %02x%02x%02x%02x %d\n",data[0],data[1],data[2],data[3],o["value"].numberInt());
Can you try this with the official downloads rather than a self-compiled build? It's likely a SpiderMonkey compilation issue. |