[CSHARP-725] InsertBatch throws duplicate key exception with too much data... (140,000 items, 13 MB) but inserts all data correctly Created: 12/Apr/13  Updated: 20/Mar/14  Resolved: 29/Apr/13

Status: Closed
Project: C# Driver
Component/s: None
Affects Version/s: 1.8
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Serdar Büyüktemiz Assignee: Robert Stam
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Fully updated Windows 7 development machine


Issue Links:
Duplicate
duplicates CSHARP-715 InsertBatch fails when large batch ha... Closed

 Description   

In my code I first fill the items into a List:
var datas = new List<MyData>(); // 144000 MyData

Then I call the InsertBatch method:
var _collection = _mongoDatabase.GetCollection<MyData>("MYDATA", WriteConcern.Acknowledged);
_collection.InsertBatch(datas);

This throws the exceptions below, yet if I look at the collection I can see that all the data was inserted correctly. When I try with 10k items it works fine. The data is clean in every row. It is probably a size problem, but the exception is misleading.

Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host.

WriteConcern detected an error 'E11000 duplicate key error index: MYDB.MYDATA.$id dup key: { : ObjectId('51681daf1e72b21b1839877b') }'. (Response was { "err" : "E11000 duplicate key error index: MYDB.MYDATA.$id dup key: { : ObjectId('51681daf1e72b21b1839877b') }", "code" : 11000, "n" : 0, "connectionId" : 19, "ok" : 1.0 }).
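For reference, a minimal self-contained sketch of the repro against the 1.8.x driver API (the MyData shape, payload, database name, and connection string are illustrative assumptions, not taken from the report):

using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

public class MyData
{
    public ObjectId Id { get; set; }        // left empty; the driver generates it
    public string Payload { get; set; }
}

public static class Repro
{
    public static void Main()
    {
        var client = new MongoClient("mongodb://localhost");
        var database = client.GetServer().GetDatabase("MYDB");
        var collection = database.GetCollection<MyData>("MYDATA", WriteConcern.Acknowledged);

        var datas = new List<MyData>();     // ~144,000 documents, ~13 MB in total
        for (var i = 0; i < 144000; i++)
        {
            datas.Add(new MyData { Payload = "row " + i });
        }

        collection.InsertBatch(datas);      // throws on 1.8.0, yet all rows end up inserted
    }
}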



 Comments   
Comment by Robert Stam [ 17/Apr/13 ]

I answered the StackOverflow question.

Correct: in my experience, the performance difference between batches of 1,000 documents and batches of 1,000,000 documents will be very small. You could measure it yourself to verify.
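A rough sketch of how such a measurement could look with the 1.x API (the collection name and sizes are illustrative assumptions, and it reuses the hypothetical MyData class from the sketch in the description):

using System;
using System.Diagnostics;
using System.Linq;
using MongoDB.Driver;

public static class BatchBenchmark
{
    public static void Main()
    {
        var collection = new MongoClient("mongodb://localhost")
            .GetServer().GetDatabase("MYDB").GetCollection<MyData>("BENCH");

        var docs = Enumerable.Range(0, 100000)
            .Select(i => new MyData { Payload = "row " + i })
            .ToList();

        foreach (var batchSize in new[] { 1000, 10000, 100000 })
        {
            collection.Drop();              // start each run from an empty collection
            var sw = Stopwatch.StartNew();
            for (var i = 0; i < docs.Count; i += batchSize)
            {
                collection.InsertBatch(docs.Skip(i).Take(batchSize));
            }
            sw.Stop();
            Console.WriteLine("batch size {0}: {1} ms", batchSize, sw.ElapsedMilliseconds);
        }
    }
}

If the observation above holds, the timings should come out within a few percent of each other across batch sizes.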

Comment by Serdar Büyüktemiz [ 17/Apr/13 ]

You are right, I was using version 1.8.0. I changed the version and it is fixed now.

Thank you very much.

I am not retrying in my application logic. Could WriteConcern.Acknowledged be causing that retry?

Also, it would be nice if you could write an answer to this question:
http://stackoverflow.com/questions/16053197/what-is-the-optimum-bulk-item-count-with-insertbatch-method-in-mongodb-c-sharp-d

Did I understand correctly that when batch inserting, 100 batches of 1,000 docs vs. 1 batch of 100,000 docs makes nearly no performance difference?

Comment by Robert Stam [ 15/Apr/13 ]

I see two issues reported above:

1. "message len 875442996 is too large 875442996" in the server logs
2. WriteConcern detected an error 'E11000 duplicate key error' exception thrown by driver

The first issue seems like the same issue reported by CSHARP-715 (and fixed in 1.8.1):

https://jira.mongodb.org/browse/CSHARP-715

One of the characteristics of CSHARP-715 is that all the documents were in fact inserted even though InsertBatch threw an exception, which matches the behavior you are seeing.

Can you double check that you are using v1.8.1 of the driver? It seems odd that you would be hitting this since it is supposedly fixed.

The second issue might or might not be related to the first. Do you do any retries in your application? I'm wondering if you might be catching an exception thrown by the first issue and calling InsertBatch again to retry the operation.
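For illustration, a retry pattern along these lines would produce exactly that duplicate key error, because the driver assigns _id values to the documents on the first attempt (a speculative sketch of the suspected anti-pattern, not code from the report):

using System;
using System.Collections.Generic;
using MongoDB.Driver;

public static class NaiveRetry
{
    public static void InsertWithRetry(MongoCollection<MyData> collection, List<MyData> docs)
    {
        try
        {
            collection.InsertBatch(docs);
        }
        catch (Exception)
        {
            // By now the driver has set each doc's _id, and (per this ticket)
            // the first attempt actually inserted the documents, so the retry
            // hits E11000 duplicate key on the _id index.
            collection.InsertBatch(docs);
        }
    }
}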

It would be fine if you want to call InsertBatch with smaller batches; that could be one way of working around this issue. Note that the main reason for calling InsertBatch instead of Insert is performance, and once your batch size reaches even a few hundred documents you should be getting 99% or more of the possible performance gains, so really large batch sizes don't give any additional benefit.
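If you do go with smaller batches, one way to structure it (a rough sketch; the chunk size, helper name, and MyData type are illustrative):

using System.Collections.Generic;
using System.Linq;
using MongoDB.Driver;

public static class ChunkedInsert
{
    // Splits the list into fixed-size chunks and inserts each one separately,
    // keeping every wire message comfortably under the server's size limit.
    public static void InsertInChunks(MongoCollection<MyData> collection,
                                      List<MyData> docs, int chunkSize)
    {
        for (var i = 0; i < docs.Count; i += chunkSize)
        {
            collection.InsertBatch(docs.Skip(i).Take(chunkSize));
        }
    }
}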

Comment by Serdar Büyüktemiz [ 15/Apr/13 ]

I am leaving the _id empty and letting the driver generate them.

Here are the db logs:

Mon Apr 15 13:53:06 [conn3] recv(): message len 875442996 is too large875442996

I would like to point out again: even though it throws exceptions, the item count in the db is correct.

Another question about the bulk inserts: should I divide them into 50k chunks and insert them in a for loop?

Log details:

Mon Apr 15 13:52:38 [conn2] insert MYDB.MYDATA2 keyUpdates:0 locks(micros) w:368032 368ms
Mon Apr 15 13:52:39 [conn2] insert MYDB.MYDATA2 keyUpdates:0 locks(micros) w:360042 360ms
Mon Apr 15 13:52:39 [conn2] insert MYDB.MYDATA2 keyUpdates:0 locks(micros) w:194648 195ms
Mon Apr 15 13:52:39 [conn2] OpCounters::gotOp unknown op: 19376
Mon Apr 15 13:52:39 [conn2] Assertion: 16141:cannot translate opcode 19376
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] ntdll.dll RtlRestoreContext+0x2e2
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] mongod.exe ???
Mon Apr 15 13:52:40 [conn2] kernel32.dll BaseThreadInitThunk+0xd
Mon Apr 15 13:52:40 [conn2] AssertionException handling request, closing client connection: 16141 cannot translate opcode 19376
Mon Apr 15 13:52:41 [initandlisten] connection accepted from 127.0.0.1:6142 #3 (2 connections now open)
Mon Apr 15 13:52:41 [conn3] build index MYDB.X { _id: 1 }
Mon Apr 15 13:52:41 [conn3] build index done. scanned 0 total records. 0.001 secs
Mon Apr 15 13:52:42 [conn3] build index MYDB.Y { _id: 1 }
Mon Apr 15 13:52:42 [conn3] build index done. scanned 0 total records. 0.001 secs
Mon Apr 15 13:52:43 [conn3] build index MYDB.Z { _id: 1 }
Mon Apr 15 13:52:43 [conn3] build index done. scanned 0 total records. 0.001 secs
Mon Apr 15 13:52:58 [FileAllocator] allocating new datafile C:/mongodb/data/MYDB.2, filling with zeroes...
Mon Apr 15 13:52:59 [FileAllocator] done allocating datafile C:/mongodb/data/MYDB.2, size: 256MB, took 0.43 secs
Mon Apr 15 13:53:03 [conn3] build index MYDB.MYDATA { _id: 1 }
Mon Apr 15 13:53:03 [conn3] build index done. scanned 0 total records. 0 secs
Mon Apr 15 13:53:03 [conn3] insert MYDB.MYDATA keyUpdates:0 locks(micros) w:473516 473ms
Mon Apr 15 13:53:05 [conn3] insert MYDB.MYDATA keyUpdates:0 locks(micros) w:475588 476ms
Mon Apr 15 13:53:06 [conn3] insert MYDB.MYDATA keyUpdates:0 locks(micros) w:471620 471ms
Mon Apr 15 13:53:06 [conn3] recv(): message len 875442996 is too large875442996
Mon Apr 15 13:53:06 [conn3] end connection 127.0.0.1:6142 (1 connection now open)

Comment by Robert Stam [ 12/Apr/13 ]

It looks like you simply have a duplicate _id value in your MyData array.

How are your _id values getting set? Do you set them yourself when you create the 144,000 MyData values? Or are you leaving the _id empty and letting the driver generate them?

Are there any server side log file entries that coincide with this?
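For context, the two cases could look like this (an illustrative sketch reusing the hypothetical MyData shape from the sketch in the description, not code from the report):

using MongoDB.Bson;

public static class IdExamples
{
    public static MyData ExplicitId()
    {
        // Case 1: assign _id yourself; a bug here can easily produce duplicates.
        return new MyData { Id = ObjectId.GenerateNewId(), Payload = "explicit id" };
    }

    public static MyData DriverGeneratedId()
    {
        // Case 2: leave _id at its default; the 1.x driver assigns a fresh
        // ObjectId client-side when InsertBatch sends the document.
        return new MyData { Payload = "driver-generated id" };
    }
}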
