Details
- New Feature
- Resolution: Unresolved
- Major - P3
- None
- None
- Heroku Linux
Description
I've been experimenting with using a spliterator to process a result set in parallel, and have discovered that it doesn't work the way I expect.
What I expect to happen:
- findIterable.spliterator().characteristics() reports SUBSIZED
- findIterable.spliterator().trySplit() returns a spliterator over a number of records equal to the batch size (if set)
- streams are able to process records in parallel
What I'm actually finding:
- findIterable.spliterator().characteristics() returns 0
- findIterable.spliterator().trySplit() returns spliterators of inconsistent size, starting with 1024; the next split covers 2048 records. I'm not sure what subsequent trySplit() calls return, because I run out of memory before they do.
- when used with a stream, as far as I can tell, it burns through a large number of batches to fill the first and second splits, processes a few times, then fails with OOM errors once it tries to take the third split
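For comparison, this is what the JDK's default Iterable.spliterator() does when an implementation does not override it: it wraps the iterator via Spliterators.spliteratorUnknownSize, which reports no characteristics, and whose trySplit() (in the OpenJDK implementation) buffers batches that grow by 1024 each time. A minimal sketch with a plain Iterable, no MongoDB involved, that reproduces the numbers above:

```java
import java.util.Spliterator;
import java.util.stream.LongStream;

public class DefaultSpliteratorDemo {
    public static void main(String[] args) {
        // An Iterable that does not override spliterator(), analogous to FindIterable
        Iterable<Long> records = () -> LongStream.range(0, 10_000).iterator();

        Spliterator<Long> spliterator = records.spliterator();
        System.out.println(spliterator.characteristics()); // 0 -- no SIZED/SUBSIZED

        // Each trySplit() drains a batch of the iterator into an array;
        // batch sizes grow arithmetically: 1024, 2048, 3072, ...
        Spliterator<Long> first = spliterator.trySplit();
        Spliterator<Long> second = spliterator.trySplit();
        System.out.println(first.estimateSize());  // 1024
        System.out.println(second.estimateSize()); // 2048
    }
}
```

If FindIterable inherits this default, that would explain both the zero characteristics and the growing, memory-hungry splits.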
Pseudojava for what I'm trying to accomplish:

FindIterable<Document> allRecords = MongoStore.filterAll(databaseName, collectionName, startDate, endDate);

Spliterator<Document> spliterator = allRecords.spliterator();
Stream<Document> docStream = StreamSupport.stream(spliterator, true);

docStream.forEach(document -> {
    // process documents
});
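In the meantime, one possible workaround, assuming the total record count can be obtained up front (e.g. from a separate count query with the same filter; that query, and the helper below, are assumptions for illustration, not anything the driver provides for this), is to wrap the cursor's iterator in a sized spliterator so the stream framework can plan splits. A sketch, not a definitive fix; the sized spliterator still buffers elements into arrays as it splits:

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public final class SizedStreams {
    // total is assumed to come from a separate count query using the same filter.
    public static <T> Stream<T> parallelStream(Iterable<T> iterable, long total) {
        // Spliterators.spliterator(Iterator, size, characteristics) always
        // reports SIZED and SUBSIZED in addition to the flags passed here.
        Spliterator<T> sized = Spliterators.spliterator(
                iterable.iterator(), total, Spliterator.ORDERED);
        return StreamSupport.stream(sized, true);
    }
}
```

If the count can race with inserts/deletes, the reported size will be an estimate rather than exact, which weakens the SIZED guarantee.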
Is this just a case of spliterator() not being implemented in the MongoDB driver, or am I using it wrong?