[SERVER-36151] GPU offloading Created: 16/Jul/18  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Andrew Shevchuk Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 2
Labels: parallel-processing, parallel-query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Participants:

 Description   

It would be nice to leverage the ever-growing GPU resources available in modern machines in some fashion. GPU offloading makes the most sense for heavy OLAP/analytical queries – but the biggest internal feature we'd need to add first is support for parallel query processing. A single user request could then be divided into N parts and executed in parallel by multiple threads, subsystems, or even [remote] processes. Building on that, we could determine which parts would naturally perform well on GPUs – things like aggregations, sorting, and grouping – and send that portion of the work to the GPUs using something like the CUDA API.
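The divide-and-merge step described above can be sketched in a few lines. This is a minimal scatter-gather illustration, not MongoDB's actual execution engine: the partitioning, the `partial_group`/`merge_groups` helpers, and the `category`/`qty` fields are all assumptions made up for the example.

```python
# Sketch of scatter-gather parallel aggregation: split one request into N
# partitions, aggregate each partition in parallel, then merge the partial
# results into the final answer.
from collections import Counter
from concurrent.futures import ProcessPoolExecutor

def partial_group(docs):
    """Aggregate one partition: sum 'qty' per 'category' key."""
    counts = Counter()
    for doc in docs:
        counts[doc["category"]] += doc.get("qty", 1)
    return counts

def merge_groups(partials):
    """Gather step: combine per-partition partial aggregates."""
    total = Counter()
    for p in partials:
        total.update(p)  # Counter.update adds counts, key by key
    return dict(total)

if __name__ == "__main__":
    docs = [{"category": "a", "qty": 2}, {"category": "b", "qty": 1},
            {"category": "a", "qty": 3}, {"category": "b", "qty": 4}]
    parts = [docs[:2], docs[2:]]  # scatter into 2 partitions
    with ProcessPoolExecutor(max_workers=2) as pool:
        partials = list(pool.map(partial_group, parts))
    print(merge_groups(partials))  # {'a': 5, 'b': 5}
```

The same merge function works whether the partial aggregates come from threads, processes, or (as proposed here) a GPU kernel – only the scatter step changes.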



 Comments   
Comment by moroine bentefrit [ 05/Aug/20 ]

> the biggest internal feature we'd need to add first is support for parallel query processing

That's exactly what I'm looking for. Currently, I have to split my aggregation pipeline into multiple smaller ones in order to speed up aggregation by using all available CPUs. The tradeoff is that the results then need to be aggregated again manually in JavaScript, since I need to group...
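For anyone doing this split manually today, the re-aggregation step amounts to re-reducing the partial `$group` outputs in the driver. A minimal sketch (shown in Python rather than JavaScript; the `_id`/`total` field names and the sum accumulator are illustrative assumptions, not the commenter's actual pipeline):

```python
# Re-reduce partial $group results produced by N sub-pipelines, each
# emitting documents shaped like {"_id": <group key>, "total": <partial sum>}.
def merge_partial_groups(result_sets):
    merged = {}
    for results in result_sets:
        for doc in results:
            merged[doc["_id"]] = merged.get(doc["_id"], 0) + doc["total"]
    return [{"_id": k, "total": v} for k, v in merged.items()]

# Example: two sub-pipelines each grouped half of the collection.
part1 = [{"_id": "x", "total": 10}, {"_id": "y", "total": 5}]
part2 = [{"_id": "x", "total": 7}]
print(merge_partial_groups([part1, part2]))
# [{'_id': 'x', 'total': 17}, {'_id': 'y', 'total': 5}]
```

Note this only works for accumulators that re-reduce cleanly ($sum, $min, $max); something like $avg needs the sub-pipelines to emit sums and counts separately.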


Do you have any ticket related to this feature which I can follow? I can't find anything related to this.


Thanks,

Comment by Piyush Katariya [ 20/Jun/20 ]

Is this being considered for any near-future release?

Or, at the least, MongoDB could exploit SIMD vector instructions at the CPU level for queries and, of course, for the aggregation pipeline.
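The kind of kernel SIMD helps with is a tight, branch-free loop over a contiguous column of values; a vectorizing compiler turns the native-code equivalent into vector instructions. Plain-Python sketch of the shape only (the `filtered_sum` helper and threshold predicate are made-up examples, and Python itself does not vectorize this):

```python
from array import array

def filtered_sum(values, threshold):
    """Predicate-and-reduce over a contiguous column, the pattern SIMD
    hardware likes: a 0/1 mask multiplied in, instead of a branch."""
    total = 0.0
    for v in values:              # the C/C++/Rust equivalent auto-vectorizes
        total += v * (v >= threshold)
    return total

col = array("d", [1.5, 2.5, 3.0, 0.5])  # contiguous doubles, SIMD-friendly layout
print(filtered_sum(col, 1.0))  # 7.0
```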

Comment by Andrew Shevchuk [ 31/Jul/18 ]

Yes, that's exactly what I thought.

Comment by Matt Lord (Inactive) [ 27/Jul/18 ]

Hi ashevchuk!

Can you elaborate a bit on what you'd like to see here?

In the most abstract sense, of course we'd like to leverage the available GPU resources in some fashion. It makes the most sense for heavy OLAP/analytical queries – but the biggest internal feature we'd need to add first is support for parallel query processing. The single request could then be divided into N parts and executed in parallel by multiple threads, subsystems, or even [remote] processes. Building on that, we could determine which parts would naturally perform well on GPUs – things like aggregations, sorting, and grouping – and send that portion of the work to the GPUs using something like the CUDA API.

Is this essentially what you're thinking? If so, then I'll mark this as something we'd eventually like to do. 

Thanks!

Matt


Generated at Thu Feb 08 04:42:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.