[CDRIVER-1344] Any way to deserialize BSON faster? Created: 26/Jun/16 Updated: 11/Sep/19 Resolved: 26/Jun/16 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | json, libbson |
| Affects Version/s: | TBD |
| Fix Version/s: | None |
| Type: | Task | Priority: | Minor - P4 |
| Reporter: | Brandon Ros | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Any |
||
| Description |
|
Unfortunately, due to poor design, I have a MongoDB query that returns 2k+ documents and runs quite often. The query itself takes less than 10 ms, but the client-side parsing takes over 300 ms. Is there any way to introduce threading into BSON deserialization when the result is an array? |
| Comments |
| Comment by A. Jesse Jiryu Davis [ 27/Jun/16 ] |
|
Brandon, I have not used OpenMP and I can't help diagnose why your code would run slower than you expect. If you write some pure C with the pthread library and do something parallelizable (no locks that block other threads) with each document in a separate thread, you ought to see a speedup on a multiprocessor machine. But everything depends on the code, the algorithm, and the language runtime you're using. |
| Comment by Brandon Ros [ 26/Jun/16 ] |
|
A. Jesse Jiryu, I was referring to the latter method you've described. Which language do you think is best to attempt this in? OpenMP with C proved slower than a single thread. :/ |
| Comment by A. Jesse Jiryu Davis [ 26/Jun/16 ] |
|
Brandon, I believe the answer is "no", there isn't a practical way to use multiple threads to deserialize a single BSON document. BSON's an inherently serial format, and it doesn't provide any jump offsets at the beginning: one thread has to deserialize the whole thing from start to finish in order to discover the location of all elements. Better instead to allocate a set of documents to each thread and parallelize that way, rather than trying to put several threads to work on any one document. |
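A minimal sketch of the "set of documents per thread" approach in plain C with pthreads, assuming the client already holds a buffer of concatenated top-level BSON documents (for example, one reply batch). It shows both halves of the argument. First, locating documents is serial: each document's little-endian int32 length prefix must be read before the next one can be found. Second, once an offset table exists, contiguous ranges of documents can be handed to threads that share nothing writable. The names `index_documents`, `process_parallel` and `worker` are hypothetical, and the byte sum inside `worker` is only a placeholder for real per-document decoding (in a libbson program that would presumably be `bson_init_static` on each slice followed by `bson_iter_*` calls). Within a single document, the elements are still walked by one thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 16

/* BSON length prefixes are always little-endian int32 values. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Serial pass: record the start offset of every concatenated document.
 * Returns the document count, or -1 if the buffer is malformed or
 * holds more than max_docs documents. */
static long index_documents(const uint8_t *buf, size_t len,
                            size_t *offsets, size_t max_docs)
{
    size_t pos = 0, n = 0;
    while (pos < len) {
        if (len - pos < 5 || n == max_docs)
            return -1; /* smallest BSON document is 5 bytes */
        uint32_t doc_len = read_le32(buf + pos);
        if (doc_len < 5 || doc_len > len - pos || buf[pos + doc_len - 1] != 0)
            return -1; /* bad length or missing trailing 0x00 */
        offsets[n++] = pos;
        pos += doc_len;
    }
    return (long)n;
}

typedef struct {
    const uint8_t *buf;
    const size_t *offsets;
    size_t first, last; /* half-open range of document indexes */
    uint64_t result;    /* written only by the owning thread */
} worker_t;

static void *worker(void *arg)
{
    worker_t *w = arg;
    uint64_t acc = 0;
    for (size_t i = w->first; i < w->last; i++) {
        const uint8_t *doc = w->buf + w->offsets[i];
        uint32_t doc_len = read_le32(doc);
        /* Placeholder for real decoding of this document. */
        for (uint32_t j = 0; j < doc_len; j++)
            acc += doc[j];
    }
    w->result = acc;
    return NULL;
}

/* Split ndocs documents into nthreads contiguous ranges, run them
 * concurrently without locks, then combine the per-thread results. */
static int process_parallel(const uint8_t *buf, const size_t *offsets,
                            size_t ndocs, int nthreads, uint64_t *out)
{
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tids[MAX_THREADS];
    worker_t work[MAX_THREADS];
    size_t chunk = (ndocs + (size_t)nthreads - 1) / (size_t)nthreads;

    for (int t = 0; t < nthreads; t++) {
        size_t first = (size_t)t * chunk, last = first + chunk;
        if (first > ndocs) first = ndocs;
        if (last > ndocs) last = ndocs;
        work[t] = (worker_t){ buf, offsets, first, last, 0 };
        if (pthread_create(&tids[t], NULL, worker, &work[t]) != 0) {
            for (int k = 0; k < t; k++)
                pthread_join(tids[k], NULL);
            return -1;
        }
    }
    uint64_t total = 0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
        total += work[t].result;
    }
    *out = total;
    return 0;
}
```

For example, after `index_documents` fills an offsets array with `n` entries, `process_parallel(buf, offsets, n, 4, &sum)` gives each of four threads roughly `n / 4` consecutive documents. Handing each thread a batch, rather than spawning one thread per document, keeps thread start-up cost from swamping the per-document work for a 2k-document result.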
| Comment by Brandon Ros [ 26/Jun/16 ] |
|
@A. Jesse Jiryu Davis - It is not that it is particularly slow. I've tried C, Java, and Node.js. I was wondering whether it could be threaded for any performance benefit. I'm deserializing docs that look like this (an array of 2k at a time, for a total of ~6MB): https://gist.github.com/brandonros/169290791aabd1a8570067b44804d46c |
| Comment by A. Jesse Jiryu Davis [ 26/Jun/16 ] |
|
Hi, could you show me a complete code example that I can compile and run that demonstrates the slow performance? |