[JAVA-2410] Create custom Spliterator implementation for MongoIterable Created: 14/Dec/16 Updated: 30/Mar/22 |
|
| Status: | Backlog |
| Project: | Java Driver |
| Component/s: | Query Operations |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Charles DuBose | Assignee: | Unassigned |
| Resolution: | Unresolved | Votes: | 2 |
| Labels: | Java8 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Heroku Linux |
||
| Description |
|
I've been playing around with using a spliterator to process a result set in parallel and have discovered that it really doesn't work how I expect it to work. What I expect to happen:
What I'm finding to happen:
Pseudojava for what I'm trying to accomplish:
Is this just a case of the spliterator not being implemented in mongo, or am I using it wrong? |
| Comments |
| Comment by Ian Whalen (Inactive) [ 07/Jan/19 ] | |||
|
Not certain that we want to do this, but will revisit in the post 4.0 time frame. | |||
| Comment by Charles DuBose [ 15/Dec/16 ] | |||
|
I think there's value in splitting on the batch. Each trySplit could return the next batch, retrieved from the cursor. Once the batch is realized, then it can be processed in parallel. Say you set a batch size of 50 and are running 4 streams. The stream would pull the cursor 4 times, process the records, then pull the cursor again each time the batch finishes processing. Mongo itself doesn't need to be parallel, just provide a sane split for the actual parallel stream. I could probably write something custom to do the same thing, but I'm not certain that the FindIterator provides that level of control over pulling each batch. I guess I could do it the old-fashioned way, push e.next into an array batchSize times, return that as the trySplit result, keep that up until there's nothing left. | |||
| Comment by Jeffrey Yemin [ 15/Dec/16 ] | |||
|
The driver does not implement any special support for Spliterator, so your application is ending up with the default implementation from java.lang.Iterable, which returns
I'm not all that familiar with Spliterator implementations, but it strikes me that supporting a parallel Spliterator in the driver may not be possible, as MongoDB cursors return batches one at a time. So in terms of memory usage, it would be no better than, say:
This will require more research, but I'm interested in any initial feedback you may have. |