[SERVER-23107] Make mongo shell split bulk ops when size exceeds 16MB Created: 14/Mar/16  Updated: 06/Dec/22  Resolved: 03/Dec/21

Status: Closed
Project: Core Server
Component/s: Shell
Affects Version/s: 3.0.10, 3.2.4, 3.3.2
Fix Version/s: features we're not sure of

Type: Improvement Priority: Major - P3
Reporter: Eric Sommer Assignee: Backlog - Server Tooling and Methods (STM) (Inactive)
Resolution: Won't Fix Votes: 0
Labels: move-sa, move-stm, platforms-re-triaged
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File bulk.js     File bulk.py    
Issue Links:
Duplicate
duplicates SERVER-23132 shell does not split Bulk operations ... Closed
Related
Assigned Teams:
Server Tooling & Methods
Participants:

 Description   

According to this driver spec, "[m]ore than 16MB worth of inserts are split into multiple messages, and error indexes are rewritten." This is in addition to splitting bulk ops that contain more than 1000 operations into multiple batches.

The mongo shell does split bulk ops that contain > 1000 operations into 1000-op batches. But when the bulk op exceeds 16MB, it issues an error:

2016-03-14T11:09:57.077+0200 E QUERY    [thread1] Error: Converting from JavaScript to BSON failed: Object size 16795903 exceeds limit of 16793600 bytes. :
DBQuery.prototype._exec@src/mongo/shell/query.js:112:28
DBQuery.prototype.next@src/mongo/shell/query.js:283:5
Bulk/executeBatch@src/mongo/shell/bulk_api.js:853:16
Bulk/this.execute@src/mongo/shell/bulk_api.js:1139:11
@bulk.js:31:1

The Python and C drivers, on the other hand, work as expected: they split bulk ops that either contain more than 1000 ops or exceed 16MB into smaller batches.
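For illustration, here is a rough sketch of the splitting the drivers perform, which also works as a manual workaround from the shell today. The helper below is hypothetical (not part of bulk_api.js); it only relies on Object.bsonsize() and the standard initializeUnorderedBulkOp() API, and it keeps each executed bulk op under both the 1000-document limit and a byte limit set conservatively below 16MB to leave headroom for command overhead:

// Hypothetical helper: split an array of documents into batches that stay
// under both limits, then run each batch as its own bulk op.
var maxBatchBytes = 15 * 1024 * 1024;  // below the 16MB BSON limit on purpose
var maxBatchDocs = 1000;

function insertInBatches(coll, docs) {
    var batch = [];
    var batchBytes = 0;
    var flush = function() {
        if (batch.length === 0)
            return;
        var bulk = coll.initializeUnorderedBulkOp();
        batch.forEach(function(doc) { bulk.insert(doc); });
        bulk.execute();
        batch = [];
        batchBytes = 0;
    };
    docs.forEach(function(doc) {
        var size = Object.bsonsize(doc);
        if (batch.length >= maxBatchDocs || batchBytes + size >= maxBatchBytes)
            flush();
        batch.push(doc);
        batchBytes += size;
    });
    flush();
}

Calling insertInBatches(db.test, bigArrayOfDocs) then issues several write commands instead of one oversized request.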



 Comments   
Comment by Brooke Miller [ 03/Dec/21 ]

We've deprecated the mongo shell in favor of the new mongosh. Unfortunately, we aren't able to pursue improvements to the deprecated shell except in extreme cases, such as critical security fixes. Please start making use of mongosh and let us know if it works for you in this case.

Comment by Jonathan Reams [ 15/Nov/16 ]

I've looked into this a bit, and although it is technically fixable by moving the batch-size calculation to where the command is actually being generated, I think it'd make the interface so inefficient that the bulk API would not really be useful. I was able to get the attached bulk.js to run to completion by making these changes, but the runtime was almost six minutes.

Right now we don't really generate BSON in the shell; we generate JS objects and ask our C++ code to serialize them to BSON before putting them on the wire. To track the size of the growing batch, we need to constantly call Object.bsonsize - which fully serializes the object into BSON and reads its size - either on each object added to the batch (what we do now) or on the command as it's being built. This means a ton of copying/converting between JavaScript and the C++ BSON library, almost all of which gets thrown away immediately since we only wanted the size as a side effect.
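As a hedged sketch of the two options described above (the variable names are illustrative, not the actual bulk_api.js code), the second option re-serializes the whole command on every add just to read its size, which is where the cost explodes:

// Option 1 (current behavior): size each document once as it is added.
var batchBytes = 0;
var addDoc = function(doc) {
    batchBytes += Object.bsonsize(doc);   // one full JS -> BSON conversion per doc
};

// Option 2 (sizing the command itself): re-serialize the growing command
// object on every add, only to read its length and throw the BSON away.
var cmd = {insert: "coll", documents: []};
var addDocToCmd = function(doc) {
    cmd.documents.push(doc);
    return Object.bsonsize(cmd);          // serializes everything added so far
};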

Comment by Max Hirschhorn [ 14/Mar/16 ]

The mongo shell has code in its bulk API implementation to split batches of the same operation type when they would exceed either 16MB or 1000 operations.

// Set max byte size
var maxBatchSizeBytes = 1024 * 1024 * 16;
var maxNumberOfDocsInBatch = 1000;
 
...
 
var addToOperationsList = function(docType, document) {
    ...
 
    // Finalize and create a new batch if this op would take us over the
    // limits *or* if this op is of a different type
    if (currentBatchSize + 1 > maxNumberOfDocsInBatch ||
        (currentBatchSize > 0 && currentBatchSizeBytes + bsonSize >= maxBatchSizeBytes) ||
        currentBatch.batchType != docType) {
        finalizeBatch(docType);
    }
 
    ...
}

The problem is that the command (request) resulting from buildBatchCmd(batch) exceeds 16MB and cannot be serialized through BSON when attempting to call the native C++ function that underlies Mongo.prototype.find().
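A minimal sketch of the mismatch, assuming a command shape roughly like the one buildBatchCmd produces: the batching check sums Object.bsonsize() of each document, but the serialized request also carries the wrapper fields and the BSON array keys for "documents", so the final command can come out larger than the sum the check saw.

// Per-document accounting: what addToOperationsList compares against the limit.
var payload = new Array(8 * 1024).join("x");
var docs = [];
var docBytes = 0;
for (var i = 0; i < 1000; i++) {
    var doc = {_id: i, payload: payload};
    docs.push(doc);
    docBytes += Object.bsonsize(doc);
}

// The actual request: the command wrapper and the array framing of
// "documents" add bytes on top of docBytes, so the serialized command
// is larger even though every document was individually counted.
var cmd = {insert: "coll", documents: docs, ordered: true, writeConcern: {w: 1}};
print("sum of docs: " + docBytes + ", full command: " + Object.bsonsize(cmd));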
