MongoDB Database Tools / TOOLS-1956

Add Bulk Upsert and increase batch size limit

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.1.12
    • Component/s: mongoimport
    • Labels: None
    • Documentation Changes: Needed
    • Documentation Changes Summary: Options documentation would need to be updated
    • Case:

      Description

      Revised

      As we convert the tools to the new Go driver, the original PR will not apply. Instead, we'll implement higher-performance bulk insert/update built on the new Go driver bulk API, including the higher batch size limit.

      The request to add a "Remove" mode has been pulled out to TOOLS-2268 for separate triage.

      Original

      The changes below were implemented after consulting with our Mongo rep, Anant Srivastava, to meet internal implementation needs. I will open a pull request with our changes shortly, in case some or all of them are suitable for inclusion in the product.

      Bulk upserts

      Enable bulk upsert operations. In the current release of mongoimport, upsert mode is limited to a single insertion worker and an effective batch size of 1. The resulting performance made mongoimport unviable for our volumes. With bulk, multi-worker upserts we see a 400-700x performance improvement, which makes mongoimport a viable tool for our update process.

      Added a --bulkUpdate command-line option. When it is set, upserts are executed in bulk and across multiple insertion workers. The option is opt-in to limit the impact on existing processes that use mongoimport. Whether the flag is necessary is open to debate; the alternative is to enable 'bulkUpdate' mode by default and switch it off via the --maintainInsertionOrder option.

      'bulkUpdate' upsert mode was implemented by disabling maintainInsertionOrder, removing the restriction to one insertion worker, and adding a new method to BufferedBulkInserter to support bulk upsert operations.
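
      The multi-worker, unordered batching described above can be sketched roughly as follows. This is an illustration only, not the code from the pull request: Doc, flushFunc, and runWorkers are hypothetical names, and the flush callback merely stands in for whatever bulk upsert call BufferedBulkInserter would issue. It assumes numWorkers and batchSize are both at least 1.

```go
package main

import (
	"fmt"
	"sync"
)

// Doc stands in for one parsed input record (hypothetical type).
type Doc map[string]interface{}

// flushFunc is a hypothetical callback standing in for one bulk write.
type flushFunc func(batch []Doc)

// runWorkers fans docs out to numWorkers goroutines. Each worker buffers up
// to batchSize documents and hands every full buffer to flush. Workers run
// concurrently, so batches complete in no particular order; this is why
// insertion order cannot be maintained in this mode.
func runWorkers(docs []Doc, numWorkers, batchSize int, flush flushFunc) {
	in := make(chan Doc)
	var wg sync.WaitGroup
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			batch := make([]Doc, 0, batchSize)
			for d := range in {
				batch = append(batch, d)
				if len(batch) == batchSize {
					flush(batch)
					batch = make([]Doc, 0, batchSize)
				}
			}
			if len(batch) > 0 {
				flush(batch) // final partial batch
			}
		}()
	}
	for _, d := range docs {
		in <- d
	}
	close(in)
	wg.Wait()
}

func main() {
	docs := make([]Doc, 2500)
	for i := range docs {
		docs[i] = Doc{"_id": i}
	}
	// flush is called from several goroutines, so results are collected
	// through a buffered channel rather than shared variables.
	sizes := make(chan int, len(docs))
	runWorkers(docs, 4, 1000, func(b []Doc) { sizes <- len(b) })
	close(sizes)
	total, batches := 0, 0
	for n := range sizes {
		total += n
		batches++
	}
	fmt.Println("documents flushed:", total, "in", batches, "batches")
}
```

      With more than one worker, how documents are divided between batches varies from run to run; only the total is stable.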

      Remove mode

      Added a --mode remove option. It constructs BSON selectors from the records in the input file and the fields named in --upsertFields, and removes matching documents. Each selector removes only a single matching document. Implemented by adding a new method to BufferedBulkInserter to support bulk remove operations.

      --upsertFields is required when specifying this mode.
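
      As a rough illustration of the selector construction described above, the sketch below copies the fields named in --upsertFields out of an input record. buildSelector and Doc are hypothetical names, not functions from mongoimport. Returning an error for a record that lacks a selector field is an assumption of this sketch rather than documented tool behavior, and dotted (nested) field names are not handled.

```go
package main

import (
	"errors"
	"fmt"
)

// Doc stands in for one parsed input record (hypothetical type).
type Doc map[string]interface{}

// buildSelector copies the named fields out of doc to form a match selector.
// It rejects an empty field list, mirroring the requirement that
// --upsertFields be supplied in remove mode.
func buildSelector(doc Doc, upsertFields []string) (Doc, error) {
	if len(upsertFields) == 0 {
		return nil, errors.New("upsertFields must be set to build a selector")
	}
	sel := make(Doc, len(upsertFields))
	for _, f := range upsertFields {
		v, ok := doc[f]
		if !ok {
			// Assumption of this sketch: a missing field is an error.
			return nil, fmt.Errorf("document is missing selector field %q", f)
		}
		sel[f] = v
	}
	return sel, nil
}

func main() {
	record := Doc{"email": "a@example.com", "region": "us", "name": "A"}
	sel, err := buildSelector(record, []string{"email", "region"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The selector holds only the email and region fields; name is ignored.
	fmt.Println(sel)
}
```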

      batchSize limit increased from 1k to 100k

      Following the batch size limit changes in MongoDB 3.6, the maximum for the --batchSize option was raised to 100k documents. mongoimport and the mongo driver code (gopkg.in/mgo.v2) were patched to support this. When a batch size larger than 1000 is specified against MongoDB <3.6, the driver splits operations into chunks of 1000. The driver was also patched to split write operations >16MB into separate writeOpCommand calls for the *insertOp, bulkUpdateOp, and bulkDeleteOp operation types.

      https://docs.mongodb.com/manual/reference/limits/#Write-Command-Batch-Limit-Size
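
      The driver-side splitting described above can be sketched as follows. This is a simplified illustration, not the patched mgo code: splitBatches and the constant names are hypothetical, documents are represented as already-encoded byte slices, and the size check counts document bytes only, ignoring command overhead.

```go
package main

import "fmt"

const (
	maxDocsLegacy = 1000             // per-command document limit before MongoDB 3.6
	maxDocs36     = 100000           // per-command document limit from MongoDB 3.6
	maxBytes      = 16 * 1024 * 1024 // 16MB cap mentioned in the description
)

// splitBatches groups encoded documents into chunks that respect both a
// document-count limit and a total-size limit. A single document larger than
// sizeLimit is placed in a chunk of its own rather than dropped.
func splitBatches(docs [][]byte, countLimit, sizeLimit int) [][][]byte {
	var out [][][]byte
	var cur [][]byte
	curSize := 0
	for _, d := range docs {
		full := len(cur) >= countLimit || curSize+len(d) > sizeLimit
		if len(cur) > 0 && full {
			out = append(out, cur)
			cur, curSize = nil, 0
		}
		cur = append(cur, d)
		curSize += len(d)
	}
	if len(cur) > 0 {
		out = append(out, cur)
	}
	return out
}

func main() {
	// 2500 small documents, well under the size cap.
	docs := make([][]byte, 2500)
	for i := range docs {
		docs[i] = make([]byte, 100)
	}
	fmt.Println(len(splitBatches(docs, maxDocsLegacy, maxBytes))) // 3 chunks: 1000, 1000, 500
	fmt.Println(len(splitBatches(docs, maxDocs36, maxBytes)))     // 1 chunk of 2500
}
```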

              People

              Assignee:
              divjot.arora Divjot Arora
              Reporter:
              caleb.hankins@acxiom.com Caleb Hankins
              Votes:
              4
              Watchers:
              13
