There are a couple of different ways to parallelize this:
- On the server, use the multi-index builder. As of 2.6, the createIndexes command accepts more than one index at a time. As of 2.8, it also builds them in parallel, so issuing a single createIndexes for all of a collection's indexes should make index builds much more efficient.
- On the client:
  - Open one connection per collection (up to a user-specified or adaptively determined limit).
  - Use >1 thread per collection: read ahead from the source file and open multiple connections to the server.
  - Throttle the rate from client to server so as not to overload either side.
  - Caveat: when using >1 thread per collection, data will not be restored in the same order it was dumped.
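
To make the client-side idea concrete, here is a rough Python sketch, not how mongorestore is or would be implemented. It assumes a hypothetical `insert_batch(name, batch)` callable that stands in for the real insert call to the server; `restore_collection`, `restore_all`, and all parameter names and defaults are made up for illustration. One pool task runs per collection (capped at `max_collections`), each collection gets several inserter threads, and a bounded queue provides both the read-ahead and the throttling: the reader blocks whenever the inserters fall behind.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def restore_collection(name, documents, insert_batch,
                       workers=2, batch_size=2, read_ahead=4):
    """Restore one collection using several inserter threads.

    insert_batch(name, batch) is a hypothetical stand-in for the real
    server call; it must be safe to call from multiple threads.
    """
    # Bounded queue: the reader may get at most `read_ahead` batches ahead.
    batches = queue.Queue(maxsize=read_ahead)
    errors = []

    def inserter():
        while True:
            batch = batches.get()
            if batch is None:  # sentinel: no more data for this worker
                return
            try:
                insert_batch(name, batch)
            except Exception as exc:
                # Record the error but keep draining so the reader never blocks.
                errors.append(exc)

    threads = [threading.Thread(target=inserter) for _ in range(workers)]
    for t in threads:
        t.start()

    batch = []
    for doc in documents:  # "read ahead from the source file"
        batch.append(doc)
        if len(batch) == batch_size:
            batches.put(batch)  # blocks when the queue is full (throttling)
            batch = []
    if batch:
        batches.put(batch)
    for _ in threads:
        batches.put(None)  # one sentinel per inserter
    for t in threads:
        t.join()
    if errors:
        raise errors[0]


def restore_all(dump, insert_batch, max_collections=4):
    """Restore collections concurrently, at most max_collections at a time."""
    with ThreadPoolExecutor(max_workers=max_collections) as pool:
        futures = [pool.submit(restore_collection, name, docs, insert_batch)
                   for name, docs in dump.items()]
        for f in futures:
            f.result()  # re-raise any failure from a collection restore


if __name__ == "__main__":
    # In-memory stand-ins for a dump and a server.
    lock = threading.Lock()
    restored = {}

    def fake_insert(name, batch):
        with lock:
            restored.setdefault(name, []).extend(batch)

    dump = {"users": [{"_id": i} for i in range(7)],
            "orders": [{"_id": i} for i in range(5)]}
    restore_all(dump, fake_insert, max_collections=2)
    print({name: len(docs) for name, docs in restored.items()})
```

Because batches for one collection are handed to whichever inserter thread is free, documents can arrive out of dump order, which is the caveat noted in the list. Setting `workers=1` for a collection preserves order at the cost of per-collection parallelism.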
I am running mongorestore to recreate a copy of a large(ish) production database (~300GB) on a separate system. From observation, importing the data and re-creating the indexes appear to happen serially. Given that indexes can be created in the background under normal operating conditions, it seems that at least this part could be done in parallel. Ideally, the collections themselves would also be restored in parallel, since the machine(s) I'm working with have plenty of spare resources for this. Is this doable? Or are there complexities I'm not aware of that prevent it?
Thanks, as always.