[JAVA-4436] Incomplete InsertedIds in InsertManyResult Created: 05/Jan/22  Updated: 28/Oct/23  Resolved: 06/Jan/22

Status: Closed
Project: Java Driver
Component/s: Write Operations
Affects Version/s: 4.0.0
Fix Version/s: 4.4.1

Type: Bug Priority: Major - P3
Reporter: Andrea Pinciroli Assignee: Jeffrey Yemin
Resolution: Fixed Votes: 0
Labels: external-user
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Documentation Changes: Not Needed

 Description   

We have found that when insertMany is called with a list of thousands of documents (e.g. 10,000 documents of about 10 KB each), the returned InsertManyResult object contains incomplete results.

In particular, calling the getInsertedIds() method on the result object returns only about 2,000 to 3,000 entries instead of the expected 10,000.

Using the bulkWrite method on the same list of documents shows the same problem: getInserts() is incomplete, although getInsertedCount() is correct.

We are using Java driver version 4.4.0 against a sharded cluster running the 4.4.10 community server.

The problem seems to be related to the automatic splitting of the write operation into multiple batches.

 



 Comments   
Comment by Andrea Santi [ 04/Oct/22 ]

We had the same exact issue. Upgrading to the latest stable (4.7.1) solved the problem for us.

Comment by Githook User [ 06/Jan/22 ]

Author:

{'name': 'Jeff Yemin', 'email': 'jeff.yemin@mongodb.com', 'username': 'jyemin'}

Message: Ensure insertedIds contain ids from all batches (#850)

The previous code was incorrect because it was comparing absolute write indexes
with indexes that are relative to the current batch. This patch avoids that
by using the insertedId map from SplittablePayload directly, which already
contains absolute write indexes.

JAVA-4436
Branch: 4.4.x
https://github.com/mongodb/mongo-java-driver/commit/3084f18add7511ba67539e653d45be4c648f842d

Comment by Githook User [ 06/Jan/22 ]

Author:

{'name': 'Jeff Yemin', 'email': 'jeff.yemin@mongodb.com', 'username': 'jyemin'}

Message: Ensure insertedIds contain ids from all batches (#850)

The previous code was incorrect because it was comparing absolute write indexes
with indexes that are relative to the current batch. This patch avoids that
by using the insertedId map from SplittablePayload directly, which already
contains absolute write indexes.

JAVA-4436
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/3e5992ea91a821cb7ff89566d2e0527daf8b39b0
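
To illustrate the class of bug described in the commit message above, here is a minimal, self-contained sketch. It is not the driver's code: Batch, split, mergeBuggy and mergeFixed are hypothetical names, and the buggy condition is a simplified stand-in for "comparing an absolute write index with a batch-relative one". Because of that simplification it loses every batch after the first, rather than reproducing the exact pattern reported in this ticket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BatchIndexSketch {
    // Hypothetical stand-in for one batch of a split write operation.
    static final class Batch {
        final int offset; // absolute index of the batch's first document
        final int size;   // number of documents in the batch
        Batch(int offset, int size) { this.offset = offset; this.size = size; }
    }

    // Splits `total` documents into consecutive batches of at most `batchSize`.
    static List<Batch> split(int total, int batchSize) {
        List<Batch> batches = new ArrayList<>();
        for (int offset = 0; offset < total; offset += batchSize) {
            batches.add(new Batch(offset, Math.min(batchSize, total - offset)));
        }
        return batches;
    }

    // Buggy merge: tests an ABSOLUTE index against a bound that is RELATIVE to the batch.
    static Map<Integer, Integer> mergeBuggy(List<Batch> batches) {
        Map<Integer, Integer> ids = new TreeMap<>();
        for (Batch b : batches) {
            for (int abs = b.offset; abs < b.offset + b.size; abs++) {
                if (abs < b.size) { // wrong frame of reference
                    ids.put(abs, abs); // the _id equals the index, as in the reproduction
                }
            }
        }
        return ids;
    }

    // Fixed merge: translates the index into the batch's frame before comparing.
    static Map<Integer, Integer> mergeFixed(List<Batch> batches) {
        Map<Integer, Integer> ids = new TreeMap<>();
        for (Batch b : batches) {
            for (int abs = b.offset; abs < b.offset + b.size; abs++) {
                int rel = abs - b.offset;
                if (rel < b.size) {
                    ids.put(abs, abs);
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Batch> batches = split(44, 15); // batches of 15, 15 and 14 documents
        System.out.println(mergeBuggy(batches).size()); // prints 15
        System.out.println(mergeFixed(batches).size()); // prints 44
    }
}
```

The point of the sketch is only that both sides of an index comparison must use the same frame of reference. The actual patch avoids the comparison by using the insertedId map from SplittablePayload directly.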

Comment by Jeffrey Yemin [ 05/Jan/22 ]

The bug is in the BulkWriteBatch#getInsertedItems method. As part of the fix, we should also confirm and test that this method works properly in the presence of write errors.

Comment by Jeffrey Yemin [ 05/Jan/22 ]

Thanks for the report. I can reproduce with the following test:

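        import com.mongodb.client.MongoClient;
        import com.mongodb.client.MongoClients;
        import com.mongodb.client.MongoCollection;
        import com.mongodb.client.result.InsertManyResult;
        import org.bson.Document;
        import java.util.ArrayList;
        import java.util.List;

        // Connects to the default deployment on localhost:27017.
        MongoClient client = MongoClients.create();

        MongoCollection<Document> coll = client.getDatabase("test").getCollection("JAVA4436");
        coll.drop();

        // 44 documents of about 1 MiB each, large enough that the driver splits the insert into several commands.
        List<Document> documents = new ArrayList<>();
        for (int i = 0; i < 44; i++) {
            documents.add(new Document("_id", i).append("b", new byte[1_048_576]));
        }

        InsertManyResult result = coll.insertMany(documents);

        // Print every index for which an inserted id was reported.
        result.getInsertedIds().keySet().stream().sorted().forEach(System.out::println);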

The output is:

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
30
31
32
33
34
35
36
37
38
39
40
41
42
43

Looking at the command output, insertMany sends 3 insert commands, and the identifiers from the middle insert command are not included. Experimenting with inserts of different sizes makes it clear that only the identifiers from the first and last batches are included; those from every batch in between are missing.
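
As a rough aid for checking whether a result is affected, the gap in the output above can be computed from the reported indexes alone. The following is a minimal sketch with hypothetical names (InsertedIdGaps, missingIndexes). It works on a plain Set<Integer> so it runs without the driver, and the sample data simply mirrors the indexes printed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class InsertedIdGaps {
    // Returns the indexes in [0, expectedCount) that are absent from the reported keys.
    static List<Integer> missingIndexes(Set<Integer> reportedKeys, int expectedCount) {
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < expectedCount; i++) {
            if (!reportedKeys.contains(i)) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Indexes taken from the output above: 0..14 and 30..43.
        Set<Integer> reported = new TreeSet<>();
        for (int i = 0; i <= 14; i++) reported.add(i);
        for (int i = 30; i <= 43; i++) reported.add(i);

        List<Integer> missing = missingIndexes(reported, 44);
        System.out.println(missing.size() + " missing, from "
                + missing.get(0) + " to " + missing.get(missing.size() - 1));
        // prints: 15 missing, from 15 to 29
    }
}
```

Against a real result, the set would presumably come from result.getInsertedIds().keySet() and the expected count from the size of the list passed to insertMany. A non-empty return value after an insert that reported no errors would suggest the behaviour described in this ticket.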

Comment by Andrea Pinciroli [ 05/Jan/22 ]

The same error has been reported here:

MongoDB 4.4, Java driver 4.2.3 - InsertManyResult.getInsertedIds() not returning IDs for all inserted documents

Generated at Thu Feb 08 09:02:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.