[SERVER-32372] Mongos crashes on bulk insert where size is slightly bigger than maxBsonObjectSize Created: 15/Dec/17 Updated: 30/Oct/23 Resolved: 18/Dec/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.0 |
| Fix Version/s: | 3.6.1, 3.7.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Yuriy [X] | Assignee: | Kaloian Manassiev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | SWNA | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Requested: |
v3.6
|
||||
| Steps To Reproduce: | 1. Mongo sharded cluster 3.6.0 with 2 replicas. |
||||
| Sprint: | Sharding 2018-01-01 | ||||
| Participants: | |||||
| Case: | (copied to CRM) | ||||
| Description |
|
We upgraded our MongoDB sharded cluster from 3.4.10 to 3.6.0, and the mongos processes started to crash on insertMany operations whose size is bigger than maxBsonObjectSize.
|
| Comments |
| Comment by Scott Glajch [ 23/Aug/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks Kaloian, yes it was a single update operation, and I don't have the full exact object being updated, but I have the current version of the object that we've trimmed. I can confirm that the _id was not long. I've logged a new bug here https://jira.mongodb.org/browse/SERVER-43021 | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kaloian Manassiev [ 12/Aug/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Hi glajchs, I can't see any new attachments, but since this ticket has already been released, would it be possible to create a new one and include the crash stack from the update operation? Would it be possible to also include the update operation that you performed? Specifically, was it an update to a single document or to multiple documents? An update to a single document, no matter how large, should only return the _id of that document and should not be able to exceed the response size (unless the _id is extremely large). Best regards, | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Scott Glajch [ 12/Aug/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
We have also just seen this (or something like this) on mongo version 3.6.6. The only thing that seems different is the fact that I'm pretty sure we weren't doing a bulk insert, but rather an update. What happened was we had an object that was approaching 16MB (when I checked the existing DB object on disk it was 15.99MB), and when we went to update it with just a bit more data, it crashed our first mongos server. Then when the cache writer in our application retried the update 45 seconds later, it crashed our second mongos server. I've attached both stack traces.
Obviously we don't want to be running with DB objects at or close to 16MB, so we fixed the object in question to not be as big, but even though this isn't something we have happening all the time, it does happen occasionally and we expect to need to run our production servers with the ability for 16MB objects to gracefully fail to save in the future.
Our version is technically 3.6.6-evg1, which is a custom build we have branched directly off of 3.6.6, which you can find here https://github.com/evergage/mongo/commits/v3.6.6-evg1. The only difference is the last 3 commits you see there, which just quiet some extra-verbose metadata logging that was producing basically infinite log entries and that we had to silence in order to run this in production. Since the changes are so minor, hopefully that means that the stack trace line numbers and such are still usable for you. Since then that bug (https://jira.mongodb.org/browse/SERVER-30841?filter=21888) has been fixed in 3.6.8, and assuming that it silenced all the things we silenced in our custom build (3 different files), then we might be able to get off of running a custom build in the future.
Please let me know if you'd rather we log a new bug for this if you deem it to be a separate issue, though I'm not sure we'll be able to reproduce the issue again (at least not at will). | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kaloian Manassiev [ 11/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Hi randar, Based on when this issue was fixed (3.6.1), we do not expect it to be present in 3.6.5. If you think you are experiencing a similar crash in 3.6.5, can you please file a new SERVER ticket, specify the exact version, and attach the mongos logs and (if possible) the repro steps you used? Thanks in advance. Best regards, | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Randar Puust [ 11/Jan/19 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
We are currently running Mongo version v3.6.5. It feels like we are having this exact same issue. I'm not clear whether there was an attempt to fix it in 3.6.1 that wasn't correct, or whether it regressed in 3.7.0 and was finally fixed in 3.7.1. It's having a fairly big effect on us and we are trying to figure out if we should escalate an upgrade to Mongo 4.0. Do you think this issue was still there in 3.6.5? | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yuriy [X] [ 19/Dec/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Tested it from sources. All is ok! Thanks! | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 18/Dec/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'} Message: (cherry picked from commit 85e1ed33ef2fc83e870124441eee7e036b8118a4) | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 18/Dec/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'} Message: | |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Kaloian Manassiev [ 18/Dec/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Hi Ubus, Thank you very much for your report and for your detailed investigation. We have figured out the cause of this problem: an incorrect calculation of the resulting per-shard request size, which causes the maximum BSON size of 16MB to be exceeded. We have a fix out for code review. In the meantime, to get unblocked, you can lower the size of the batches you are using. Best regards, | |||||||||||||||||||||||||||||||||||||||||||||||
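The workaround of lowering the batch size can be applied on the client before calling insertMany. What follows is only a minimal sketch under stated assumptions, not the fix that went into the server: the names split_into_batches and size_of are hypothetical, the size of each document is supplied by the caller (for example, the length of the document as encoded by your driver's BSON encoder), and the budget of half the 16MB limit is an arbitrary safety margin chosen for illustration.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

# Default maxBsonObjectSize is 16MB.
MAX_BSON_OBJECT_SIZE = 16 * 1024 * 1024


def split_into_batches(
    docs: Iterable[T],
    size_of: Callable[[T], int],
    max_batch_bytes: int = MAX_BSON_OBJECT_SIZE // 2,  # assumed safety margin
    max_batch_docs: int = 1000,  # assumed cap on documents per batch
) -> Iterator[List[T]]:
    """Yield lists of documents whose summed size stays within max_batch_bytes.

    size_of is a caller-supplied (hypothetical) function returning the
    encoded size of one document in bytes. A single document larger than
    the budget is still yielded, alone, so the server can reject it.
    """
    batch: List[T] = []
    batch_bytes = 0
    for doc in docs:
        doc_bytes = size_of(doc)
        # Close the current batch if adding this document would overflow it.
        if batch and (
            batch_bytes + doc_bytes > max_batch_bytes
            or len(batch) >= max_batch_docs
        ):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += doc_bytes
    if batch:
        yield batch
```

Each yielded list could then be passed to a separate insertMany call, so that no single request approaches the limit regardless of how mongos splits it per shard.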
| Comment by Yuriy [X] [ 18/Dec/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Just tested it on 3.7.0-39-g4edcb81 mongos. Still the same crash.
| |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Ramon Fernandez Marina [ 16/Dec/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Thanks for the detailed report Ubus. The stack traces you provided look as follows after running them through the symbolizer:
Not much there. The second one has more information:
| |||||||||||||||||||||||||||||||||||||||||||||||
| Comment by Yuriy [X] [ 15/Dec/17 ] | |||||||||||||||||||||||||||||||||||||||||||||||
|
Oops, the stack traces in the description got badly formatted. |