- Type: New Feature
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Query Operations
- Environment: Heroku Linux
I've been playing around with using a spliterator to process a result set in parallel, and have discovered that it doesn't behave the way I expect.
What I expect to happen:
- findIterable.spliterator().characteristics() returns SUBSIZED
- findIterable.spliterator().trySplit() returns a spliterator over a number of records equal to the batch size (if one is set)
- streams are able to process records in parallel
What I'm actually finding:
- findIterable.spliterator().characteristics() returns 0
- findIterable.spliterator().trySplit() returns spliterators of inconsistent sizes: the first split covers 1024 records, the next 2048. I'm not sure what subsequent trySplit() calls return, because I run out of memory before they do.
- when used with a parallel stream, as far as I can tell, it burns through a large number of batches to fill the first and second splits, processes for a while, then fails with an OOM error once it tries to obtain the third split
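For context, the zero characteristics value matches what the default Iterable.spliterator() produces, since it wraps iterator() via Spliterators.spliteratorUnknownSize. A small standalone check (plain JDK collections, no driver involved) illustrates the difference between that default and a properly sized spliterator:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;

public class CharacteristicsCheck {
    public static void main(String[] args) {
        // An Iterable that does not override spliterator() falls back to
        // Spliterators.spliteratorUnknownSize(iterator(), 0), which reports
        // no characteristics at all -- the behavior described above:
        Iterable<String> plain = () -> List.of("a", "b", "c").iterator();
        System.out.println(plain.spliterator().characteristics()); // 0

        // A sized collection such as ArrayList reports SIZED and SUBSIZED,
        // which is what lets a parallel stream split it into balanced halves:
        Spliterator<String> sized = new ArrayList<>(List.of("a", "b", "c")).spliterator();
        System.out.println(sized.hasCharacteristics(Spliterator.SIZED));    // true
        System.out.println(sized.hasCharacteristics(Spliterator.SUBSIZED)); // true
    }
}
```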
Pseudo-Java for what I'm trying to accomplish:
FindIterable<Document> allRecords =
        MongoStore.filterAll(databaseName, collectionName, startDate, endDate); // our query helper
Spliterator<Document> spliterator = allRecords.spliterator(); // inherited default Iterable.spliterator()
Stream<Document> docStream = StreamSupport.stream(spliterator, true); // true = parallel
docStream.forEach(document -> {
    // process each document
});
Is this just a case of the spliterator not being implemented in the MongoDB driver, or am I using it incorrectly?
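Assuming the problem is the inherited default Iterable.spliterator(), one possible workaround (a sketch, not part of the driver) is to wrap the iterator in a spliterator that always splits off fixed-size batches rather than ever-growing ones. The class name and batch size below are illustrative; in practice the iterator would come from allRecords.iterator():

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.IntStream;
import java.util.stream.StreamSupport;

/**
 * Sketch: splits off fixed-size batches from a cursor-like Iterator,
 * instead of the default IteratorSpliterator's growing batches.
 */
public class FixedBatchSpliterator<T> implements Spliterator<T> {
    private final Iterator<T> source;
    private final int batchSize;

    public FixedBatchSpliterator(Iterator<T> source, int batchSize) {
        this.source = source;
        this.batchSize = batchSize;
    }

    @Override
    public Spliterator<T> trySplit() {
        if (!source.hasNext()) return null;
        Object[] batch = new Object[batchSize];
        int filled = 0;
        while (filled < batchSize && source.hasNext()) {
            batch[filled++] = source.next();
        }
        // Hand a SIZED spliterator over this batch to another worker;
        // the remainder of the cursor stays with this spliterator.
        return Spliterators.spliterator(batch, 0, filled, ORDERED);
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (!source.hasNext()) return false;
        action.accept(source.next());
        return true;
    }

    @Override
    public long estimateSize() {
        return Long.MAX_VALUE; // total cursor size is unknown
    }

    @Override
    public int characteristics() {
        return ORDERED;
    }

    public static void main(String[] args) {
        // Stand-in for a cursor: an iterator over 10,000 ints.
        Iterator<Integer> cursor = IntStream.range(0, 10_000).iterator();
        long sum = StreamSupport
                .stream(new FixedBatchSpliterator<>(cursor, 500), true)
                .mapToLong(Integer::longValue)
                .sum();
        System.out.println(sum); // 49995000
    }
}
```

This caps each split at batchSize elements, so memory use stays proportional to the batch size times the number of worker threads rather than growing with each split.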