[SERVER-10004] Problem with streaming data into the database Created: 24/Jun/13  Updated: 17/Jul/13  Resolved: 28/Jun/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Henrik Baastrup Assignee: Michael Grundy
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MongoDb version 2.4.1 on a CentOS 6.0


Attachments: Java Source File CdrElement.java     Java Source File CdrElementMongoDbStorageHandler.java     File Failed.bson     Java Source File Main.java     File MongoTest.tgz     File src.tgz    
Operating System: Linux
Steps To Reproduce:

I have attached the part of my program there communicate with the database. I have not attached the data I use as we are talking about 300Mb in BSON format, but if necessary I'm ready to supply them.

Participants:

 Description   

I'm streaming data in to the MongoDB on a socket using the MongoDB Wire Protocol. It seems like the DB lose the synchronisme, and I have to close and re-open the socket. This happen about every 2000 record I try to insert.
I get the following types of errors in the log:
19:26:27.129 [initandlisten] connection accepted from 10.190.35.153:57149 #1729 (2 connections now open)
19:26:27.629 [initandlisten] connection accepted from 10.190.35.153:57150 #1730 (3 connections now open)
19:26:59.319 [conn1729] Assertion: 10307:Client Error: bad object in message: bson length doesn't match what we found
19:26:59.322 [conn1729] insert cdr.things keyUpdates:0 exception: Client Error: bad object in message: bson length doesn't match what we found code:10307 0ms
19:26:59.322 [conn1729] recv(): message len 959525686 is too large. Max is 48000000
19:26:59.322 [conn1729] end connection 10.190.35.153:57149 (2 connections now open)

19:27:28.660 [initandlisten] connection accepted from 10.190.35.153:57153 #1733 (3 connections now open)
19:27:29.184 [conn1733] OpCounters::gotOp unknown op: 26740
19:27:29.184 [conn1733] Assertion: 16141:cannot translate opcode 26740
19:27:29.187 [conn1733] AssertionException handling request, closing client connection: 16141 cannot translate opcode 26740



 Comments   
Comment by Michael Grundy [ 28/Jun/13 ]

Hi Henrik -

You can create a SERVER ticket requesting UDP connection support. I couldn't guarantee that it will get on the schedule anytime soon, but it would register your and anyone else's interest in that feature.

If you are adept at c++, I encourage you to try modifying to the code to support a UDP connection. A proof of concept would certainly be interesting to see.

Thanks!
Mike

Comment by Henrik Baastrup [ 28/Jun/13 ]

I think I found the problem and I am quite sure that it is NOT on the MongoDB size, I have downloaded the MongoDB source, and yes, it looks very nice!
It seams the problem is I'm using a non-blocking Java socket (nio channel). When I run my socket in blocking mode, I get all records inserted correctly. I'm not sure if this is a Java problem in general, but I do stress my socket a lot the way I run the program.
See the:

CdrElementMongoDbStorageHandler.setupSocketChannel()

method.
I apologize for the wrong bug report and want to tank Michael for all the help he have given me.

Last I would like to mention that using a TCP connection is a limitation for inserting records fast, a profiling of my program will sow that the most of the time is spend on sending data over the socket. It would be nice with a UDP connection for very fast insert of huge amount of data, that is what I address.

Comment by Michael Grundy [ 27/Jun/13 ]

The problem doesn't lie in in your event objects, they seem reasonable. Lower down the sendMongoBuffer() method is probably the issue. In addition to the assert, we get this message in the log:

Thu Jun 27 12:45:27.122 [conn23] recv(): message len 942944814 is too large. Max is 48000000

That message length is most likely garbage.

Also, the counts your program thinks it has inserted do not match the document count in the collection afterwards.

I recommend you compare what you think your app is doing to what is really being sent over the wire (tshark, tcpdump etc)

Thanks!
Mike

Comment by Henrik Baastrup [ 27/Jun/13 ]

New changed source. The CdrElementMongoDbStorageHandler class contain the logic to sent data to the MongoDB and store failed records to a file

Comment by Henrik Baastrup [ 27/Jun/13 ]

BSON file with records there failed to be inserted

Comment by Henrik Baastrup [ 27/Jun/13 ]

First: I have already looked a lot on the Java driver, I even used it in the beginning. I chose to write my own, because I'm going for speed my objective is get 500000 records inserted at second and would like to rich 1000000 recodes at seconds! (it would be nice if MongoDB had an UDP connection)

Second: I do not say that it is not my program there fails. But I opened the bug because it looks like; when I resent the same data after having re-connected I get them in. Now I do not know if the error is on the actual record or the record before, I get an IOException and no bytes sent when I send the actual record, so the database might have closed the connect on the previous record.
To verify this, and if there are errors in my BSON format, I changed my program, so it store the actual record buffer and the previous in a file, I now load the file direct into the database with the command:

mongorestore -d cdr -c things Failed.bson

and the file load with the following output:

Thu Jun 27 11:27:44.244 Failed.bson
Thu Jun 27 11:27:44.244 	going into namespace [cdr.things]
Thu Jun 27 11:27:44.244 Failed.metadata.json not found. Skipping.
Thu Jun 27 11:27:44.244 warning: Restoring to cdr.things without dropping. Restored data will be inserted without raising errors; check your server log
220 objects found

My program told me that I stored 220 records and running the following command I also get 220:

> db.runCommand( {count: 'things'} )
{ "n" : 220, "ok" : 1 }

To me this prove that my program do format the data correct.

I will attach my data file and my new sources to the bug

Comment by Michael Grundy [ 26/Jun/13 ]

I ran the test program on the latest version of mongod (2.4.4) and 2.4.1 as you did. The logs have some better information about what is going on. This example is the first error encountered on one run (from 2.4.4):

Wed Jun 26 14:57:25.077 [conn2] Assertion: 10307:Client Error: bad object in message: bson length doesn't match what we found
0xdd2331 0xd93c6b 0xd941ac 0x75c209 0x9f0c56 0x9f7cae 0x6e7978 0xdbea9e 0x3870407851 0x38700e890d
 /usr/bin/mongod(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x21) [0xdd2331]
 /usr/bin/mongod(mongo::msgasserted(int, char const*)+0x9b) [0xd93c6b]
 /usr/bin/mongod() [0xd941ac]
 /usr/bin/mongod(mongo::DbMessage::nextJsObj()+0x249) [0x75c209]
 /usr/bin/mongod(mongo::receivedInsert(mongo::Message&, mongo::CurOp&)+0x206) [0x9f0c56]
 /usr/bin/mongod(mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)+0xabe) [0x9f7cae]
 /usr/bin/mongod(mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*)+0x98) [0x6e7978]
 /usr/bin/mongod(mongo::PortMessageServer::handleIncomingMsg(void*)+0x42e) [0xdbea9e]
 /lib64/libpthread.so.0() [0x3870407851]
 /lib64/libc.so.6(clone+0x6d) [0x38700e890d]
Wed Jun 26 14:57:25.081 [conn2] insert cdr.things keyUpdates:0 exception: Client Error: bad object in message: bson length doesn't match what we found code:10307  0ms
Wed Jun 26 14:57:25.081 [conn2] OpCounters::gotOp unknown op: 69
Wed Jun 26 14:57:25.081 [conn2] Assertion: 16141:cannot translate opcode 69
0xdd2331 0xd93c6b 0xd941ac 0x9f8361 0x6e7978 0xdbea9e 0x3870407851 0x38700e890d
 /usr/bin/mongod(mongo::printStackTrace(std::basic_ostream<char, std::char_traits<char> >&)+0x21) [0xdd2331]
 /usr/bin/mongod(mongo::msgasserted(int, char const*)+0x9b) [0xd93c6b]
 /usr/bin/mongod() [0xd941ac]
 /usr/bin/mongod(mongo::assembleResponse(mongo::Message&, mongo::DbResponse&, mongo::HostAndPort const&)+0x1171) [0x9f8361]
 /usr/bin/mongod(mongo::MyMessageHandler::process(mongo::Message&, mongo::AbstractMessagingPort*, mongo::LastError*)+0x98) [0x6e7978]
 /usr/bin/mongod(mongo::PortMessageServer::handleIncomingMsg(void*)+0x42e) [0xdbea9e]
 /lib64/libpthread.so.0() [0x3870407851]
 /lib64/libc.so.6(clone+0x6d) [0x38700e890d]
Wed Jun 26 14:57:25.083 [conn2] AssertionException handling request, closing client connection: 16141 cannot translate opcode 69

On a full run of your test program, we get an assortment of Asserts:

Assertion: 10307:Client Error: bad object in message: bson length doesn't match what we found
Assertion: 10307:Client Error: bad object in message: not null terminate string
Assertion: 10307:Client Error: bad object in message: no end of c-string
Assertion: 10307:Client Error: bad object in message: invalid bson
Assertion: 16141:cannot translate opcode 17682

It looks like your program corrupts its buffer periodically and somewhat randomly (line of first failure is not the same between runs). This leads to malformed BSON which mongod responds to by hanging up the client connection.

I think you should give our Java driver a shot and see if you have similar issues.

Edit: I did get the asserts on 2.4.1, not just 2.4.4

Comment by Henrik Baastrup [ 25/Jun/13 ]

An other note: If a record fails to insert because the database drop the connection, on re-connect the record is inserted. That was the reason I suspect the database and not my program. Of-cause I might be wrong!

Comment by Henrik Baastrup [ 25/Jun/13 ]

There exist a bug in the tgz archive I already attached, please overload the Main.java file with this one.

Comment by Henrik Baastrup [ 25/Jun/13 ]

Test program with data attached. Fell free to contact me if you need help to understand whats going on.

Comment by Henrik Baastrup [ 25/Jun/13 ]

I have data and a small program in Java there transform them and insert in the Database ready. It is about 68 MB in all, I will try tot attach it to this issue.
Change the src/hba/Main.java so it fit your system and run Ant.
To lunch the program do: java -Xms4096m -jar MongoTest.jar
The program read some CSV files transform them and insert the data into the MongoDB.
Note: Most of the data are inserted correctly and the data there fails are no the same between runs.

Comment by Scott Hernandez (Inactive) [ 24/Jun/13 ]

Do you have sample data and something we can run to reproduce this?

Based on the errors it seems like you are sending bad data to the server.

Generated at Thu Feb 08 03:22:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.