[SERVER-16206] Use the WT bulk loader for collection documents in cloner Created: 17/Nov/14  Updated: 27/Apr/21  Resolved: 26/Apr/21

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Daniel Pasette (Inactive) Assignee: Benety Goh
Resolution: Done Votes: 0
Labels: newgrad
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-16087 Add support for bulk loading (with Wi... Open
is related to SERVER-23059 Collection and Database Cloner: Imple... Closed
is related to SERVER-33586 create a cloneUnshardedCollections co... Closed
is related to SERVER-41530 For uncapped collections, CollectionB... Closed
is related to SERVER-41801 [POC] Make Initial sync collection cl... Closed
Sprint: Execution Team 2021-05-03
Participants:

 Description   

We added use of the bulk loader for foreground index builds in WT with SERVER-16199. This ticket is for allowing the bulk loader to be used for the collection documents as well.



 Comments   
Comment by Eric Milkie [ 27/Apr/21 ]

We are typically bulk loading unlogged tables anyway, so that facet of bulk loading doesn't buy us much. What it actually buys us is just the CPU savings (from avoiding the transaction machinery) and the memory savings (because bulk loaded items get written more-or-less directly to disk using a dedicated buffer instead of consuming WT cache space).

Comment by Mathias Stearn [ 27/Apr/21 ]

One difference with using the bulk loader is that it bypasses logging/journaling because the assumption is that if you crash, you can just blow away the table and reload it. I imagine this could be useful for servers that are limited by disk write throughput. Has there been any work to achieve this for initial sync w/o bulk cursors? Should I file a separate ticket for that?

Comment by Connie Chen [ 26/Apr/21 ]

We're closing this as "gone away." There has already been a significant investigation in the performance report referenced in the above comment, which shows limited gains. We also agree that the File Copy Based Initial Sync Project will further diminish any value add for this ticket.  

Comment by Benety Goh [ 26/Apr/21 ]

SERVER-41801 documents the work for a WiredTiger Bulk Cursor POC and its impact on initial sync performance.

Comment by Benety Goh [ 26/Apr/21 ]

Since 3.4, the CollectionBulkLoader is responsible for populating a collection during initial sync with documents from the sync source. See SERVER-23059.

In 3.4, we still used a separate WUOW (WriteUnitOfWork) for each document inserted into the target collection.

In 4.4, SERVER-41530 improved how CollectionBulkLoader used WT transactions (WriteUnitOfWork) for collection inserts so that we would batch multiple (untimestamped) collection inserts under a single WriteUnitOfWork.
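
To illustrate the difference between the two insert strategies described above, here is a minimal sketch in Python. It is not the server's C++ implementation: ToyCollection, ToyWriteUnitOfWork, insert_one_per_wuow, and insert_batched are hypothetical names invented for this illustration, and the toy only counts commits rather than modelling WiredTiger transactions.

```python
from typing import Dict, Iterable, Iterator, List


class ToyCollection:
    """Hypothetical in-memory collection that counts committed units of work."""

    def __init__(self) -> None:
        self.docs: List[Dict] = []
        self.commits = 0


class ToyWriteUnitOfWork:
    """Hypothetical stand-in for a WriteUnitOfWork: buffers inserts, applies them on commit."""

    def __init__(self, coll: ToyCollection) -> None:
        self._coll = coll
        self._pending: List[Dict] = []

    def __enter__(self) -> "ToyWriteUnitOfWork":
        return self

    def insert(self, doc: Dict) -> None:
        self._pending.append(doc)

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Commit only if the block finished without raising; otherwise discard.
        if exc_type is None:
            self._coll.docs.extend(self._pending)
            self._coll.commits += 1
        return False


def batches(docs: Iterable[Dict], size: int) -> Iterator[List[Dict]]:
    """Yield consecutive groups of at most `size` documents."""
    batch: List[Dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def insert_one_per_wuow(coll: ToyCollection, docs: Iterable[Dict]) -> None:
    """One unit of work per document (the pre-batching pattern)."""
    for doc in docs:
        with ToyWriteUnitOfWork(coll) as wuow:
            wuow.insert(doc)


def insert_batched(coll: ToyCollection, docs: Iterable[Dict], batch_size: int) -> None:
    """Several documents per unit of work (the batched pattern)."""
    for batch in batches(docs, batch_size):
        with ToyWriteUnitOfWork(coll) as wuow:
            for doc in batch:
                wuow.insert(doc)
```

In this toy model, inserting 10 documents with a batch size of 4 takes 3 commits instead of 10. The batch size is an arbitrary illustration value, not the one the server uses.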

Comment by Benety Goh [ 26/Apr/21 ]

The Cloner, which still inserts documents into the target collection one-by-one (one WUOW per document), is no longer used for initial sync as of 3.4. Some sharding operations still use this class for catalog operations, presumably for much smaller collections. See SERVER-33586.

Comment by Connie Chen [ 04/Jan/21 ]

With the move towards file-based initial sync, we feel this optimization is no longer worth doing. 

Generated at Thu Feb 08 03:40:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.