[SERVER-23107] Make mongo shell split bulk ops when size exceeds 16MB Created: 14/Mar/16 Updated: 06/Dec/22 Resolved: 03/Dec/21
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Shell |
| Affects Version/s: | 3.0.10, 3.2.4, 3.3.2 |
| Fix Version/s: | features we're not sure of |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Eric Sommer | Assignee: | Backlog - Server Tooling and Methods (STM) (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | move-sa, move-stm, platforms-re-triaged |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | bulk.js |
| Assigned Teams: | Server Tooling & Methods |
| Description |
According to this driver spec, "[m]ore than 16MB worth of inserts are split into multiple messages, and error indexes are rewritten." This is in addition to splitting bulk ops that contain more than 1000 operations into multiple batches. The mongo shell does split bulk ops containing more than 1000 operations into 1000-op batches, but when a bulk op exceeds 16MB it raises an error instead of splitting it.
The Python and C drivers, on the other hand, work as expected: they split bulk ops that either contain more than 1000 ops or exceed 16MB into smaller batches.
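For illustration, here is a minimal sketch of the splitting behavior the spec describes, written as plain mongo shell JavaScript. This is not the shell's or any driver's actual implementation: Object.bsonsize is a real shell builtin, but maxBatchCount, maxBatchSizeBytes, and splitIntoBatches are hypothetical names introduced here.

```javascript
// Hypothetical sketch of driver-style batch splitting: start a new batch
// whenever adding a document would exceed either the op-count limit or the
// total BSON size limit. Only Object.bsonsize is a real shell builtin.
var maxBatchCount = 1000;                  // max write ops per command
var maxBatchSizeBytes = 16 * 1024 * 1024;  // 16MB BSON document limit

function splitIntoBatches(docs) {
    var batches = [];
    var current = [];
    var currentSize = 0;
    docs.forEach(function(doc) {
        var docSize = Object.bsonsize(doc);
        // Close out the current batch if this doc would push it over a limit.
        if (current.length >= maxBatchCount ||
            (current.length > 0 && currentSize + docSize > maxBatchSizeBytes)) {
            batches.push(current);
            current = [];
            currentSize = 0;
        }
        current.push(doc);
        currentSize += docSize;
    });
    if (current.length > 0) {
        batches.push(current);
    }
    return batches;
}
```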
| Comments |
| Comment by Brooke Miller [ 03/Dec/21 ] |
We've deprecated the mongo shell in favor of the new mongosh. Unfortunately, we aren't able to pursue improvements to the deprecated shell except in extreme cases, such as critical security fixes. Please start making use of mongosh and let us know if it works for you in this case.
| Comment by Jonathan Reams [ 15/Nov/16 ] |
I've looked into this a bit. Although it is technically fixable by moving the batch-size calculation to where the command is actually generated, I think that would make the interface so inefficient that the bulk API would no longer be useful. I was able to get the attached bulk.js to run to completion with these changes, but the runtime was almost six minutes. Right now we don't really generate BSON in the shell: we generate JS objects and ask our C++ code to serialize them to BSON before putting them on the wire. To track the size of a growing batch, we have to call Object.bsonsize constantly (it fully serializes the object into BSON and returns its size), either on each object added to the batch (what we do now) or on the command as it is being built. This means a ton of copying and converting between JavaScript and the C++ BSON library, almost all of which is thrown away immediately, since we only wanted the size as a side effect.
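To make that cost concrete, here is a rough sketch of the pattern being described, assuming nothing beyond the shell's built-in Object.bsonsize; the loop and variable names are illustrative, not the shell's internal code.

```javascript
// Every Object.bsonsize call serializes the whole JS object to BSON in C++
// just to read off its length, then discards the bytes. Tracking a growing
// batch this way re-pays that serialization cost on every single append.
var payload = new Array(1025).join("x");  // ~1KB string
var batch = [];
var batchSizeBytes = 0;
for (var i = 0; i < 100000; i++) {
    var doc = {_id: i, payload: payload};
    batchSizeBytes += Object.bsonsize(doc);  // serialize, measure, throw away
    batch.push(doc);
}
```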
| Comment by Max Hirschhorn [ 14/Mar/16 ] |
The mongo shell has code in its bulk API implementation to split batches of the same operation type when they would exceed either 16MB or 1000 operations.
The problem is that the command (request) resulting from buildBatchCmd(batch) exceeds 16MB and cannot be serialized through BSON when attempting to call the native C++ function that underlies Mongo.prototype.find().
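Assuming a running mongod, here is a sketch of the kind of bulk op this describes: each document passes the per-op checks, but the batch as a whole is too large to serialize as one command. The collection name and sizes are arbitrary choices for illustration.

```javascript
// Each document is ~2MB (well under 16MB) and the batch has far fewer than
// 1000 ops, so the per-op checks pass; the insert command built from the
// accumulated batch can still land over the 16MB BSON limit.
var big = new Array(2 * 1024 * 1024).join("x");  // string just under 2MB
var bulk = db.repro.initializeUnorderedBulkOp();
for (var i = 0; i < 10; i++) {
    bulk.insert({_id: i, payload: big});  // ~20MB of inserts queued
}
bulk.execute();  // the legacy shell errors here instead of splitting cleanly
```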