[JAVA-245] DBCursor.toArray() should run decodes in thread-pool instead of serially Created: 28/Dec/10  Updated: 21/Sep/16  Resolved: 21/Sep/16

Status: Closed
Project: Java Driver
Component/s: Performance
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Minor - P4
Reporter: Scott Hernandez (Inactive) Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related

 Description   

By using a thread-pool, or more than just a single thread, the decoding time can be greatly decreased. This means the results will be available to clients in a much shorter time.

This could also be done for each batch the cursor retrieves; that might be enough to take care of the issue.
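
As a rough illustration of the per-batch idea, the sketch below decodes one batch of raw documents on a thread pool and returns the results in the original order. It is not driver code: `ParallelBatchDecoder`, `decodeBatch`, and the `decoder` function are hypothetical names standing in for whatever BSON decoding step the driver would use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of the Java driver.
public class ParallelBatchDecoder {

    /** Decodes every raw document in the batch on the pool; results keep the batch order. */
    public static <T> List<T> decodeBatch(List<byte[]> rawDocs,
                                          Function<byte[], T> decoder,
                                          ExecutorService pool)
            throws InterruptedException, ExecutionException {
        // One task per document; "decoder" stands in for the real BSON decode step.
        List<Callable<T>> tasks = new ArrayList<>(rawDocs.size());
        for (byte[] raw : rawDocs) {
            tasks.add(() -> decoder.apply(raw));
        }
        // invokeAll blocks until every task has finished and returns the
        // futures in the same order as the submitted tasks.
        List<T> decoded = new ArrayList<>(rawDocs.size());
        for (Future<T> future : pool.invokeAll(tasks)) {
            decoded.add(future.get());
        }
        return decoded;
    }
}
```

A caller would create the pool once (for example with `Executors.newFixedThreadPool`), reuse it across batches, and shut it down when the cursor is exhausted.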



 Comments   
Comment by Eliot Horowitz (Inactive) [ 29/Dec/10 ]

I get it. I've never seen a driver do something like this, though it makes sense.
Another option is to use RawDBObject for read-only data.
There is no up-front parsing, but it can be slower for other things.
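
The trade-off of skipping up-front parsing can be sketched with a generic lazy holder. This is only an illustration of deferred decoding, not how RawDBObject is implemented; `LazyDecoded` and its `decoder` function are hypothetical names.

```java
import java.util.function.Function;

// Hypothetical holder that keeps the raw bytes and decodes only on first access.
public class LazyDecoded<T> {
    private final byte[] raw;
    private final Function<byte[], T> decoder;
    private T value;
    private boolean decoded;

    public LazyDecoded(byte[] raw, Function<byte[], T> decoder) {
        this.raw = raw;
        this.decoder = decoder;
    }

    /** Decodes on the first call and returns the cached value afterwards. */
    public synchronized T get() {
        if (!decoded) {
            value = decoder.apply(raw);
            decoded = true;
        }
        return value;
    }

    /** Reports whether the decode cost has been paid yet. */
    public synchronized boolean isDecoded() {
        return decoded;
    }
}
```

Documents that are never read cost nothing to decode, while the first read of each document pays the full decode cost at access time.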

Comment by Scott Hernandez (Inactive) [ 29/Dec/10 ]

Right now, when doing batch processing of records in Java, one of the bottlenecks is the time it takes to decode (and encode) to/from BSON. Doing this in a thread-pool seems likely to push throughput way up.

It seems like this is going to be the case with any application that wants to reduce the time it takes to get back the results.

Maybe I'm not explaining things well. If I get the time I can put together a fork with some examples.

Comment by Eliot Horowitz (Inactive) [ 29/Dec/10 ]

I didn't mean it wouldn't have to be in the driver. I mean it's a pretty app-level type of thing to be in a low-level driver.

I guess I can be OK with it as an option.

Comment by Scott Hernandez (Inactive) [ 29/Dec/10 ]

Yeah, it should not be the default.

It really needs to be done in the driver. It needs to happen when the BSON is being decoded into Java objects. There is no place other than the driver to do it.

We could create a holder, like the java.util.concurrent.Future object, so that a factory could be used to do the decoding. The default implementation could just return a synchronous version, giving the same behavior that exists now. We could also provide an async version that runs some of them concurrently in a decoder pool (on multiple threads).

Part of the issue now is that all decoding is done serially, even when you explicitly state that you don't want to use an iterator (like in toArray). On a multiprocessor machine, using more CPUs/cores would cut down on the total time to generate the list.
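
The holder-plus-factory idea above could look roughly like the following. It is a minimal sketch under assumptions, not an existing driver API: `DecodeHolders`, `DecoderFactory`, `synchronous`, and `pooled` are hypothetical names, and the `decoder` function stands in for the BSON-to-object step.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical factory API, not part of the Java driver.
public class DecodeHolders {

    /** Turns raw bytes into a holder (a Future) for the decoded value. */
    public interface DecoderFactory<T> {
        Future<T> decode(byte[] raw);
    }

    /** Default: decode on the calling thread and return an already-completed holder. */
    public static <T> DecoderFactory<T> synchronous(Function<byte[], T> decoder) {
        return raw -> CompletableFuture.completedFuture(decoder.apply(raw));
    }

    /** Opt-in: submit each decode to a pool; the caller blocks only when it calls get(). */
    public static <T> DecoderFactory<T> pooled(Function<byte[], T> decoder,
                                               ExecutorService pool) {
        return raw -> {
            Callable<T> task = () -> decoder.apply(raw);
            return pool.submit(task);
        };
    }
}
```

With this shape, code that builds the result list only sees `Future<T>`, so switching between the serial default and the pooled variant is a matter of which factory is configured.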

Comment by Eliot Horowitz (Inactive) [ 28/Dec/10 ]

Not sure that's something that it makes sense for a driver to do.
Definitely not the default.
