[JAVA-3700] Java driver Out Of Memory, with aggregate to merge Created: 15/Apr/20  Updated: 27/Oct/23  Resolved: 15/Apr/20

Status: Closed
Project: Java Driver
Component/s: Command Operations
Affects Version/s: 4.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Diego Maia Assignee: Jeffrey Yemin
Resolution: Works as Designed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Hi, we are currently hitting an OOM (Out Of Memory) problem when using driver version 3.11. I upgraded to version 4.0.2 for testing and see the same OOM behavior.

Whenever we run an aggregation that ends with a $merge into stage, as in the query below, the driver returns the entire destination collection. That is, we do not get the behavior described at this link, where $merge is compared to an SQL statement:
https://docs.mongodb.com/manual/reference/operator/aggregation/merge/index.html#comparison-with-out

This is the query that, when executed by the code snippet below, returns the entire destination collection:

[
  { "$match" : { status : { $in : ["SUCESSO", "REENVIADO"] } } },
  { $limit : 1 },
  { $merge : { into : "my-collection-backup" } }
]

MongoClient client = MongoClients.create("mongodb://localhost:27017");
MongoDatabase database = client.getDatabase("my-collection");
MongoCollection<Document> model = database.getCollection(collectionName);
String query = " ... my query ... ";
// Parse the pipeline (a JSON array of stages).
BsonArray bsonArray = BsonArray.parse(query);
List<BsonDocument> bsonList = convertBsonArrayToBsonDocumentList(bsonArray);
AggregateIterable<Document> aggregateResult = model.aggregate(bsonList);
...
// Shown here: only the method that converts a BsonArray into a list of BsonDocument.
private List<BsonDocument> convertBsonArrayToBsonDocumentList(BsonArray bsonArray) {
    List<BsonDocument> bsonDocuments = new ArrayList<>();
    for (BsonValue value : bsonArray) {
        // Each pipeline stage must be a document; asDocument() throws otherwise.
        bsonDocuments.add(value.asDocument());
    }
    return bsonDocuments;
}

 

I believe the correct behavior of an aggregate with $merge should match execution via the mongo shell, which, run directly against the database, returns nothing; on the command line the result is omitted, as shown below.

MongoDB server version: 4.2.0

...
> use my-db;
switched to db my-db
> db.getCollection("my-collection").aggregate([{ "$match" : { status : { $in : ["SUCESSO", "REENVIADO"] } } }, { $limit : 1 }, { $merge : { into : "my-collection-backup" } }]);
>
>

 

Therefore, the OOM we suffer is caused by the improper return of the entire destination collection.

More information: my collections are large (more than 35 thousand records) because of the particular needs of our customers, and these queries run in Pods within Kubernetes.

I emphasize that the issue here is not a lack of resources in our containers, but rather harmful and incorrect behavior that differs from what MongoDB does natively.

 



 Comments   
Comment by Diego Maia [ 15/Apr/20 ]

Thanks, that is clear. I understand the incompatibility concerns, because we also maintain a product here.
But I still think the behavior is wrong, @jeffrey, and I disagree with this type of solution.

Well, it looks like I’ll have to do a MacGyver on it.

Comment by Jeffrey Yemin [ 15/Apr/20 ]

Unfortunately, we can't change this behavior without breaking backwards compatibility for existing applications that are depending on this behavior. We added the toCollection method to address the needs of users who don't want to iterate the output collection for $merge/$out-suffixed aggregation pipelines.

My only suggestion is to create a branch point in your application to separate the two cases so that you don't attempt to iterate the results of $merge/$out-suffixed aggregation pipelines. Though inelegant, that is a simple check to add.
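
A minimal sketch of such a check, using only the Java standard library. The class and method names here (PipelineInspector, endsWithWriteStage) are hypothetical and not part of the driver. The sketch assumes each stage is a map with a single operator key; since both Document and BsonDocument implement java.util.Map, a parsed pipeline list could be passed as-is.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper; the class and method names are illustrative, not driver API.
final class PipelineInspector {
    private PipelineInspector() {}

    // True when the last stage is $merge or $out, i.e. the pipeline writes its
    // output to a collection rather than being meant to return documents.
    static boolean endsWithWriteStage(List<? extends Map<String, ?>> pipeline) {
        if (pipeline == null || pipeline.isEmpty()) {
            return false;
        }
        Map<String, ?> lastStage = pipeline.get(pipeline.size() - 1);
        // A well-formed stage document has exactly one key: the stage operator.
        return lastStage.size() == 1
                && (lastStage.containsKey("$merge") || lastStage.containsKey("$out"));
    }
}
```

When the check returns true, the application could call toCollection() on the AggregateIterable; otherwise it would iterate the results as before.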

Comment by Diego Maia [ 15/Apr/20 ]

My method is generic: I execute the aggregation and get back only the AggregateIterable object. I then loop over the aggregation results, and it is in this loop that I hit the OOM, because, as I mentioned, I am iterating the result.

The critical point is this difference in behavior: it makes no sense for the driver to return the complete destination collection when using $merge. I think it is wrong, because in the console, with native execution in mongo, we get an "empty" result.

My application runs in a container with little memory; this is deliberate, because my application really has to be small.

Below is what you requested.

 

MongoClient client = MongoClients.create("mongodb://localhost:27017");
MongoDatabase database = client.getDatabase("my-database");
MongoCollection<Document> model = database.getCollection("my-collection");
String query = " ... my query ... ";
BsonArray bsonArray = BsonArray.parse(query);
List<BsonDocument> bsonList = convertBsonArrayToBsonDocumentList(bsonArray);
AggregateIterable<Document> aggregateResult = model.aggregate(bsonList);
// Iterating the result is what pulls the documents to the client.
ArrayNode documentResultList = iterateMongoDocuments(aggregateResult);


// Copies every returned document into an in-memory Jackson ArrayNode.
private ArrayNode iterateMongoDocuments(MongoIterable<Document> documents) {
   ArrayNode list = mapper.createArrayNode();
   for (Document document : documents) {
       ObjectNode doc = mapper.createObjectNode();
       for (Map.Entry<String, Object> entry : document.entrySet()) {
           if (entry.getKey().equalsIgnoreCase("_id") && entry.getValue() instanceof ObjectId) {
               // Render ObjectId ids as their hex string form.
               doc.put(entry.getKey(), entry.getValue().toString());
           } else {
               doc.set(entry.getKey(), mapper.convertValue(entry.getValue(), JsonNode.class));
           }
       }
       list.add(doc);
   }
   return list;
}

 

Comment by Jeffrey Yemin [ 15/Apr/20 ]

What is the application actually doing after the line:

AggregateIterable<Document> aggregateResult = model.aggregate(bsonList);

If that's all it does, the aggregation won't even execute on the server, let alone return the collection to the client, so I gather it's doing something more than that. If you post a full stack trace of the OOM, it will help to see what's going on a bit more clearly.
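
To illustrate the deferred execution described above, here is a stdlib-only analogy. It is not the driver's actual implementation: LazyIterable and LazyDemo are hypothetical names, and the Supplier stands in for the server round trip. The point is only that constructing the iterable does no work; the source runs when iteration starts.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for a lazily executed result; not part of the MongoDB driver.
final class LazyIterable<T> implements Iterable<T> {
    private final Supplier<List<T>> source; // stands in for the server round trip

    LazyIterable(Supplier<List<T>> source) {
        this.source = source;
    }

    @Override
    public Iterator<T> iterator() {
        // The "query" runs only here, when a caller starts iterating.
        return source.get().iterator();
    }
}

final class LazyDemo {
    // Returns how many times the source had run before and after iterating.
    static int[] run() {
        AtomicInteger executions = new AtomicInteger();
        LazyIterable<String> result = new LazyIterable<>(() -> {
            executions.incrementAndGet();
            return Arrays.asList("doc1", "doc2");
        });
        int before = executions.get(); // creating the iterable did not run the source
        for (String ignored : result) {
            // iterating is what triggers execution
        }
        int after = executions.get();
        return new int[] {before, after};
    }
}
```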

I suggest that if the application doesn't need to read the contents of the new collection, it use the toCollection() method on AggregateIterable, e.g.:

AggregateIterable<Document> aggregateIterable = model.aggregate(bsonList);
aggregateIterable.toCollection();

That will execute the aggregation ending with a $merge stage and return nothing to the client.

Generated at Thu Feb 08 09:00:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.