[SERVER-676] use multiple cores for index sort-phase Created: 25/Feb/10  Updated: 07/Dec/23  Resolved: 20/Nov/23

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Dwight Merriman Assignee: Backlog - Storage Execution Team
Resolution: Won't Do Votes: 41
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-14154 Multi-threaded index creation Closed
Related
related to SERVER-81568 Batch Bulk inserts during the indexin... Open
related to SERVER-83953 Use parallelism to speed up index builds Backlog
related to SERVER-81571 Reconsider stable sort in sorter.cpp Blocked
is related to SERVER-19766 Initial sync index build is sequentia... Closed
Assigned Teams:
Storage Execution
Sprint: Execution EMEA Team 2023-10-02
Participants:
Case:

 Description   

It would be nice if the external sort for creating an index used multiple cores.



 Comments   
Comment by Adrien Jarthon [ 07/Dec/23 ]

Thanks!

Comment by Steven Vannelli [ 06/Dec/23 ]

Hi bigbourin@gmail.com, you can follow SERVER-83953. Once the work is done, the team will update that ticket.  

Comment by Adrien Jarthon [ 20/Nov/23 ]

steven.vannelli@mongodb.com thanks for the update. Is there a ticket where we can follow this index build parallelism feature?

Comment by Steven Vannelli [ 20/Nov/23 ]

We found some small improvements for index builds, but our future plans use parallelism for the entire index build process, not just for the sorting phase.

Comment by Jordi Olivares Provencio [ 13/Oct/23 ]

Putting this back in the backlog as we identified a few ways of improving the throughput of index builds:

Both of these tickets combined yielded significant improvements without resorting to complete refactors of the index build architecture.

Comment by Connie Chen [ 13/Sep/21 ]

We can consider this during initial sync, repair, or any other operation that is known to be single-threaded. 

Comment by Piyush Katariya [ 02/Jul/20 ]

This issue needs reconsideration. It would also be helpful if rebuilding an index did not acquire a lock on the collection.

Comment by oleg gritsak [ 30/Mar/19 ]

So sad to see this feature request in the low-priority queue for almost a decade.

Speed is the key feature of Mongo for me, and it is now even more important for keeping up with competitors. PostgreSQL recently implemented MT-indexing. Oracle has been able to do it for years.

I had a task to import 20 billion (20,000 million) short documents into Mongo, and it failed miserably. Batch insert speed is impressive: more than a million inserts/sec. But creating an index on a 2TB collection is going to take forever...

Comment by Roy Reznik [ 07/Nov/16 ]

Thanks Eric,
It might be useful for some use cases; however, it sounds to me like sorting the keys during the data copy phase does not change the fact that the work is still CPU bound, just in that phase instead of the index build phase.
If the issue I was referring to were IO bound, I could understand how passing through the data only once would improve that.
Regarding the multiple collections, that makes sense, but I still think that for many use cases, such as a small number of large collections, it won't be good enough.

Comment by Eric Milkie [ 07/Nov/16 ]

Roy, I believe that other initial sync improvements will have a bigger impact. Some of these improvements are already implemented for the 3.4 release – we now sort all the index keys for a collection during the data copy phase, for example, which avoids multiple passes through the data. Eventually, I would like to see multiple collections cloning simultaneously, which would permit multiple index builds running on multiple cores.

Comment by Roy Reznik [ 06/Nov/16 ]

Not sure why this issue is marked as "minor": it has a huge impact on initial sync, which is very slow and bound to a single CPU in the index build phase...

Generated at Thu Feb 08 02:54:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.