[DOCS-12735] Docs for TOOLS-1956: Add Bulk Upsert and increase batch size limit Created: 21/May/19  Updated: 13/Nov/23  Resolved: 28/Aug/19

Status: Closed
Project: Documentation
Component/s: manual
Affects Version/s: None
Fix Version/s: 4.1.12, Server_Docs_20231030, Server_Docs_20231106, Server_Docs_20231105, Server_Docs_20231113

Type: Task Priority: Major - P3
Reporter: Kay Kim (Inactive) Assignee: Kay Kim (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
documents TOOLS-1956 Add Bulk Upsert and increase batch si... Closed
Participants:
Days since reply: 4 years, 24 weeks ago
Epic Link: DOCS: 4.2 Server/Tools

 Description   

Description:

The mongoimport options documentation needs to be updated.

Engineering Ticket Description:

Revised

As we convert the tools to the new Go driver, the original PR will not apply. Instead, we'll implement higher-performance bulk insert/update built on the new Go driver bulk API, including the higher batch size limit.

The request to add a "Remove" mode has been split out into TOOLS-2268 for separate triage.

Original

The changes below were implemented, after consulting with our Mongo rep Anant Srivastava, to meet internal implementation needs. I will open a pull request with our changes shortly, for review in case some or all of them should be rolled into the product.

Bulk upserts

Enable bulk upsert operations. In the released version of mongoimport, upsert mode limits the tool to one insertion worker and an effective batch size of 1. The resulting throughput made mongoimport unviable for our volumes. With bulk, multi-worker upserts we are seeing a 400-700X performance boost, which makes mongoimport a viable tool for our update process.

--bulkUpdate command line option added. When enabled, upserts are executed in bulk across multiple insertion workers. The option was added to limit the impact on existing processes that use mongoimport. It is debatable whether this flag is necessary at all, or whether 'bulkUpdate' mode should be on by default and turned off via the --maintainInsertionOrder option.
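A minimal Go sketch of how the proposed flag would change the effective write settings. importOptions and effectiveSettings are hypothetical names (this is not mongoimport's real options struct), and the rule that maintaining insertion order forces a single worker is an assumption made for illustration. Without --bulkUpdate, upsert mode collapses to one worker and a batch size of 1; with it, the requested worker count and batch size are kept.

```go
package main

import "fmt"

// importOptions is a hypothetical, simplified view of the settings discussed
// in this ticket; it is not mongoimport's real options struct.
type importOptions struct {
	Mode                   string // "insert" or "upsert"
	BulkUpdate             bool   // the proposed --bulkUpdate flag
	MaintainInsertionOrder bool   // --maintainInsertionOrder
	NumInsertionWorkers    int    // --numInsertionWorkers
	BatchSize              int    // --batchSize
}

// effectiveSettings returns the worker count and batch size that would
// actually be used for a given set of options.
func effectiveSettings(o importOptions) (workers, batchSize int) {
	if o.BulkUpdate {
		// Per the proposal, bulkUpdate mode disables maintainInsertionOrder.
		o.MaintainInsertionOrder = false
	}
	workers, batchSize = o.NumInsertionWorkers, o.BatchSize
	if workers < 1 {
		workers = 1
	}
	if batchSize < 1 {
		batchSize = 1
	}
	if o.MaintainInsertionOrder {
		// Assumption: preserving input order requires a single serial writer.
		workers = 1
	}
	if o.Mode == "upsert" && !o.BulkUpdate {
		// Pre-patch behavior: one worker, one document per write.
		workers, batchSize = 1, 1
	}
	return workers, batchSize
}

func main() {
	legacy := importOptions{Mode: "upsert", NumInsertionWorkers: 8, BatchSize: 1000}
	bulk := legacy
	bulk.BulkUpdate = true
	fmt.Println(effectiveSettings(legacy)) // 1 1
	fmt.Println(effectiveSettings(bulk))   // 8 1000
}
```

Keeping the serial path as the default mirrors the stated goal of limiting the impact on existing mongoimport users.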

The 'bulkUpdate' upsert mode was implemented by disabling maintainInsertionOrder, removing the one-insertion-worker restriction, and adding a new method to BufferedBulkInserter that supports bulk upsert operations.
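A minimal sketch of the buffering idea behind a bulk upsert method: queue one upsert per input document, keyed on its --upsertFields values, and hand the queue off in batches instead of making one round trip per document. bufferedUpserter, upsertOp, and flushFn are hypothetical names, not the actual BufferedBulkInserter API; the flush callback stands in for the real write. With the Go driver, each queued op could plausibly become mongo.NewReplaceOneModel().SetFilter(...).SetReplacement(...).SetUpsert(true) sent through an unordered Collection.BulkWrite, but that mapping is an assumption, not something this ticket states.

```go
package main

import "fmt"

// upsertOp models one replace-with-upsert write: match on Filter,
// replace with Doc, or insert Doc if nothing matches.
type upsertOp struct {
	Filter map[string]interface{}
	Doc    map[string]interface{}
}

// bufferedUpserter is a hypothetical stand-in for a bulk upsert method:
// it queues upserts and passes them to flushFn batchSize at a time.
type bufferedUpserter struct {
	upsertFields []string
	batchSize    int
	pending      []upsertOp
	flushFn      func([]upsertOp) error
}

// Upsert builds the selector from the upsert fields and queues the op,
// flushing automatically once a full batch has accumulated.
func (b *bufferedUpserter) Upsert(doc map[string]interface{}) error {
	filter := make(map[string]interface{}, len(b.upsertFields))
	for _, f := range b.upsertFields {
		v, ok := doc[f]
		if !ok {
			return fmt.Errorf("document missing upsert field %q", f)
		}
		filter[f] = v
	}
	b.pending = append(b.pending, upsertOp{Filter: filter, Doc: doc})
	if len(b.pending) >= b.batchSize {
		return b.Flush()
	}
	return nil
}

// Flush sends whatever is queued, e.g. a partial batch at end of input.
func (b *bufferedUpserter) Flush() error {
	if len(b.pending) == 0 {
		return nil
	}
	batch := b.pending
	b.pending = nil
	return b.flushFn(batch)
}

func main() {
	var batches [][]upsertOp
	u := &bufferedUpserter{
		upsertFields: []string{"sku"},
		batchSize:    2,
		flushFn: func(ops []upsertOp) error {
			batches = append(batches, ops)
			return nil
		},
	}
	for i := 0; i < 5; i++ {
		if err := u.Upsert(map[string]interface{}{"sku": i, "qty": i * 10}); err != nil {
			panic(err)
		}
	}
	if err := u.Flush(); err != nil {
		panic(err)
	}
	fmt.Println(len(batches)) // 3
}
```

Five documents with a batch size of 2 produce three writes (2, 2, 1) instead of five; with several insertion workers, each flushed batch could be handed to a different worker.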

Remove mode

--mode remove option added. It constructs BSON selectors from the records in the input file and the --upsertFields list, and removes the matching documents. Each selector removes at most one matching document. Implemented by adding a new method to BufferedBulkInserter that supports bulk remove operations.

--upsertFields is required when this option is specified.
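A minimal Go sketch of the proposed remove mode as described in the original request (later split out for separate triage): each input record becomes one delete-one selector built only from the --upsertFields columns, and the mode is rejected when no upsert fields are given. deleteOneOp and buildRemoveOps are hypothetical names; a real implementation would send these selectors as single-document deletes, for example as delete-one models in a bulk write.

```go
package main

import (
	"errors"
	"fmt"
)

// deleteOneOp is a hypothetical model of a single-document delete:
// at most one document matching Filter is removed.
type deleteOneOp struct {
	Filter map[string]interface{}
}

var errNoUpsertFields = errors.New("--mode remove requires --upsertFields")

// buildRemoveOps turns input records into one delete-one selector each,
// using only the upsert-field columns as match criteria.
func buildRemoveOps(records []map[string]interface{}, upsertFields []string) ([]deleteOneOp, error) {
	if len(upsertFields) == 0 {
		return nil, errNoUpsertFields
	}
	ops := make([]deleteOneOp, 0, len(records))
	for i, rec := range records {
		filter := make(map[string]interface{}, len(upsertFields))
		for _, f := range upsertFields {
			v, ok := rec[f]
			if !ok {
				return nil, fmt.Errorf("record %d has no field %q", i, f)
			}
			filter[f] = v
		}
		ops = append(ops, deleteOneOp{Filter: filter})
	}
	return ops, nil
}

func main() {
	records := []map[string]interface{}{
		{"sku": "A1", "qty": 5},
		{"sku": "B2", "qty": 0},
	}
	ops, err := buildRemoveOps(records, []string{"sku"})
	if err != nil {
		panic(err)
	}
	for _, op := range ops {
		fmt.Println(op.Filter) // map[sku:A1], then map[sku:B2]
	}
}
```

Non-key columns such as qty are ignored, so a record only identifies which document to remove; if several documents share the same key, only one of them is deleted per record.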

batchSize limit increased from 1k to 100k

With the MongoDB 3.6 batch size limit changes, the maximum for the --batchSize option was raised to 100k documents. Mongoimport and the mongo driver code (gopkg.in/mgo.v2) were patched to support this. When the batch size is larger than 1000 and the target is MongoDB earlier than 3.6, the driver splits operations into chunks of 1000. The driver was also patched to split write operations larger than 16MB into separate writeOpCommand calls for the *insertOp, bulkUpdateOp, and bulkDeleteOp operation types.

https://docs.mongodb.com/manual/reference/limits/#Write-Command-Batch-Limit-Size
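A minimal sketch of the two batching rules described above: cap each batch at 1,000 documents against servers older than 3.6 (100,000 from 3.6 on), and start a new batch before the accumulated size would exceed 16 MiB. effectiveMaxCount and splitBatches are hypothetical helpers, document sizes are passed in as plain byte counts, and the flat 16 MiB budget is a simplification that ignores command overhead.

```go
package main

import "fmt"

const (
	maxBSONBytes        = 16 * 1024 * 1024 // 16 MiB budget per write command (simplified)
	legacyMaxBatchCount = 1000             // max documents per write batch before MongoDB 3.6
	maxBatchCount36     = 100000           // max documents per write batch from MongoDB 3.6 on
)

// effectiveMaxCount clamps the requested --batchSize to the server's limit.
func effectiveMaxCount(requested int, serverAtLeast36 bool) int {
	limit := legacyMaxBatchCount
	if serverAtLeast36 {
		limit = maxBatchCount36
	}
	if requested < 1 || requested > limit {
		return limit
	}
	return requested
}

// splitBatches groups document sizes (in bytes) into batches that respect
// both the document-count cap and the byte budget, returning the number of
// documents per batch. A single document larger than maxBytes still ends up
// alone in its own batch (the server would reject it).
func splitBatches(docSizes []int, maxCount, maxBytes int) []int {
	var batches []int
	count, bytes := 0, 0
	for _, sz := range docSizes {
		if count > 0 && (count == maxCount || bytes+sz > maxBytes) {
			batches = append(batches, count)
			count, bytes = 0, 0
		}
		count++
		bytes += sz
	}
	if count > 0 {
		batches = append(batches, count)
	}
	return batches
}

func main() {
	// 2,500 small documents, --batchSize 100000, pre-3.6 server.
	sizes := make([]int, 2500)
	for i := range sizes {
		sizes[i] = 200
	}
	fmt.Println(splitBatches(sizes, effectiveMaxCount(100000, false), maxBSONBytes)) // [1000 1000 500]
}
```

Against a 3.6+ server the same 2,500 documents would go out as a single batch, while three 6 MiB documents would still be split by size into batches of 2 and 1.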

Scope of changes

Impact to Other Docs

MVP (Work and Date)

Resources (Scope or Design Docs, Invision, etc.)



 Comments   
Comment by Githook User [ 28/Aug/19 ]

Author:

Kay Kim (kay-kim, kay.kim@10gen.com)

Message: DOCS-12735: 4.2 mongimport bulk insert/upserts(pt2)
Branch: master
https://github.com/mongodb/docs/commit/5bfb47c9295e916cdb68a157c743fbe74aed99aa

Comment by Githook User [ 28/Aug/19 ]

Author:

Kay Kim (kay-kim, kay.kim@10gen.com)

Message: DOCS-12735: 4.2 mongimport bulk insert/upserts
Branch: master
https://github.com/mongodb/docs/commit/49eebd9d0c36b6b0dfd48f43f3f70655e1cbe032

Generated at Thu Feb 08 08:05:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.