[SERVER-16206] Use the WT bulk loader for collection documents in cloner
| Created: 17/Nov/14 | Updated: 27/Apr/21 | Resolved: 26/Apr/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Daniel Pasette (Inactive) | Assignee: | Benety Goh |
| Resolution: | Done | Votes: | 0 |
| Labels: | newgrad |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Sprint: | Execution Team 2021-05-03 |
| Participants: | |
| Description |
|
We added using the bulk loader for foreground index builds in WT with |
| Comments |
| Comment by Eric Milkie [ 27/Apr/21 ] |
|
We are typically bulk loading unlogged tables anyway, so that facet of bulk loading doesn't buy us much. The actual benefit is just the CPU savings (from avoiding the transaction machinery) and the memory savings (because bulk-loaded items get written more or less directly to disk using a dedicated buffer instead of consuming WT cache space). |
| Comment by Mathias Stearn [ 27/Apr/21 ] |
|
One difference with using the bulk loader is that it bypasses logging/journaling because the assumption is that if you crash, you can just blow away the table and reload it. I imagine this could be useful for servers that are limited by disk write throughput. Has there been any work to achieve this for initial sync w/o bulk cursors? Should I file a separate ticket for that? |
| Comment by Connie Chen [ 26/Apr/21 ] |
|
We're closing this as "gone away." There has already been a significant investigation in the performance report referenced in the above comment, which shows limited gains. We also agree that the File Copy Based Initial Sync Project will further diminish any value add for this ticket. |
| Comment by Benety Goh [ 26/Apr/21 ] |
|
|
| Comment by Benety Goh [ 26/Apr/21 ] |
|
Since 3.4, the CollectionBulkLoader is responsible for populating a collection during initial sync with documents from the sync source. See In 3.4, we still use one WUOW to insert every document in the target collection. In 4.4, |
| Comment by Benety Goh [ 26/Apr/21 ] |
|
The Cloner, which still inserts documents into the target collection one-by-one (one WUOW per document), is no longer used for initial sync as of 3.4. Some sharding operations still use this class for catalog operations, presumably for much smaller collections. See |
| Comment by Connie Chen [ 04/Jan/21 ] |
|
With the move towards file-based initial sync, we feel this optimization is no longer worth doing. |