Details
- New Feature
- Resolution: Unresolved
- Major - P3
- None
- None
- Heroku Linux
Description
I've been experimenting with using a spliterator to process a result set in parallel, and have discovered that it doesn't work the way I expect.
What I expect to happen:
- findIterable.spliterator().characteristics() reports SUBSIZED
- findIterable.spliterator().trySplit() returns a spliterator over a number of records equal to the batch size (if set)
- streams are able to process records in parallel
What I'm actually finding:
- findIterable.spliterator().characteristics() returns 0
- findIterable.spliterator().trySplit() returns spliterators of inconsistent size, starting with 1024; the next split covers 2048 records. I'm not sure what subsequent trySplit() calls return, because I run out of memory before they do.
- when used with a stream, as far as I can tell, it burns through a large number of batches to fill the first and second splits, processes a few times, then fails with OOM errors once it tries to take the third split
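For comparison, this is what the JDK's default Iterable.spliterator() does when an implementation does not override it: it wraps the iterator via Spliterators.spliteratorUnknownSize, which reports no characteristics, and whose trySplit() (in the OpenJDK implementation) buffers batches that grow by 1024 each time. A minimal sketch with a plain Iterable, no MongoDB involved, that reproduces the numbers above:

```java
import java.util.Spliterator;
import java.util.stream.LongStream;

public class DefaultSpliteratorDemo {
    public static void main(String[] args) {
        // An Iterable that does not override spliterator(), analogous to FindIterable
        Iterable<Long> records = () -> LongStream.range(0, 10_000).iterator();

        Spliterator<Long> spliterator = records.spliterator();
        System.out.println(spliterator.characteristics()); // 0 -- no SIZED/SUBSIZED

        // Each trySplit() drains a batch of the iterator into an array;
        // batch sizes grow arithmetically: 1024, 2048, 3072, ...
        Spliterator<Long> first = spliterator.trySplit();
        Spliterator<Long> second = spliterator.trySplit();
        System.out.println(first.estimateSize());  // 1024
        System.out.println(second.estimateSize()); // 2048
    }
}
```

If FindIterable inherits this default, that would explain both the zero characteristics and the growing, memory-hungry splits.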
Pseudojava for what I'm trying to accomplish:

FindIterable<Document> allRecords = MongoStore.filterAll(databaseName, collectionName, startDate, endDate);

Spliterator<Document> spliterator = allRecords.spliterator();
Stream<Document> docStream = StreamSupport.stream(spliterator, true);

docStream.forEach(document -> {
    // process documents
});
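In the meantime, one possible workaround, assuming the total record count can be obtained up front (e.g. from a separate count query with the same filter; that query, and the helper below, are assumptions for illustration, not anything the driver provides for this), is to wrap the cursor's iterator in a sized spliterator so the stream framework can plan splits. A sketch, not a definitive fix; the sized spliterator still buffers elements into arrays as it splits:

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public final class SizedStreams {
    // total is assumed to come from a separate count query using the same filter.
    public static <T> Stream<T> parallelStream(Iterable<T> iterable, long total) {
        // Spliterators.spliterator(Iterator, size, characteristics) always
        // reports SIZED and SUBSIZED in addition to the flags passed here.
        Spliterator<T> sized = Spliterators.spliterator(
                iterable.iterator(), total, Spliterator.ORDERED);
        return StreamSupport.stream(sized, true);
    }
}
```

If the count can race with inserts/deletes, the reported size will be an estimate rather than exact, which weakens the SIZED guarantee.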
Is this just a case of spliterator() not being implemented in the MongoDB driver, or am I using it wrong?