Type: Bug
Resolution: Incomplete
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.6.6
Component/s: Stability
Labels:
None

Operating System:
ALL
Steps To Reproduce:

Using a single user update, try to add more data to an existing record that is very close to 16MB, bringing it over the 16MB mark.
Sprint:
Sharding 2019-09-09, Sharding 2019-09-23, Sharding 2019-10-07, Sharding 2019-12-02, Sharding 2019-12-16, Sharding 2019-12-30, Sharding 2020-02-10, Sharding 2020-02-24
Case:
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

I'm not certain if this would happen every time, but it did happen to us in production.

We had an object that was very close to 16MB (15.99MB according to bsonsize()), and our application went to update the record with a little more data.

The mongos that was being used then crashed with the following message:

2019-08-11T08:10:25.814+0000 F ASIO     [NetworkInterfaceASIO-TaskExecutorPool-2-0] Uncaught exception in NetworkInterfaceASIO IO worker thread of type: Location10334: BSONObj size: 16794106 (0x10041FA) is invalid. Size must be between 0 and 16793600(16MB) First element: update: "<COLLECTION_NAME>"

Note that in the message above and in the full crash logs, the collection name is redacted to "<COLLECTION_NAME>".
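To put the numbers in that log line in context, the following Python sketch redoes the arithmetic. The object size, hex value, and upper bound are copied from the log; the breakdown of the bound into a 16MB user limit plus 16KB of internal headroom is an assumption, and the constant names are illustrative only.

```python
# Values copied from the mongos log line above.
reported_size = 16794106    # "BSONObj size: 16794106"
reported_hex = 0x10041FA    # "(0x10041FA)"
reported_limit = 16793600   # "Size must be between 0 and 16793600(16MB)"

# Assumption: the bound in the message is the 16MB user document limit
# plus 16KB of headroom for command overhead. Names are illustrative.
USER_MAX = 16 * 1024 * 1024
INTERNAL_MAX = USER_MAX + 16 * 1024

over_internal = reported_size - INTERNAL_MAX  # bytes past the logged bound
over_user = reported_size - USER_MAX          # bytes past a plain 16MB

print(reported_size == reported_hex)   # the decimal and hex values agree
print(INTERNAL_MAX == reported_limit)  # the assumed breakdown matches the log
print(over_internal, over_user)
```

Under that assumption, the object was 506 bytes over the bound in the message and 16890 bytes over a plain 16MB. Since its first element is `update`, the object being measured appears to be the whole update command rather than the stored record alone.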

Our application periodically retries this write if the initial attempt fails. A little later the retry went to a different mongos server, which also crashed. With both mongos nodes down, our cluster was effectively unavailable.

I've attached both stack traces.

Obviously we don't want to be running with DB objects at or close to 16MB, so we fixed the object in question to not be as big. This isn't something that happens all the time, but it does happen occasionally, and going forward we need our production servers to handle it gracefully: an update that pushes an object past 16MB should fail to save rather than crash mongos.
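Until the server side fails gracefully, one application-side mitigation is to refuse oversized payloads before they are sent to mongos. The following is a minimal sketch under stated assumptions: `guard_update_size`, `DocumentTooLargeError`, and the 64KB safety margin are hypothetical, and the caller is assumed to pass the already-encoded BSON bytes of the record as it would look after the update (with PyMongo, for example, from `bson.encode`).

```python
MAX_USER_BSON_SIZE = 16 * 1024 * 1024  # 16MB document limit
SAFETY_MARGIN = 64 * 1024              # hypothetical headroom, tune as needed


class DocumentTooLargeError(ValueError):
    """Raised instead of sending a write that would exceed the limit."""


def guard_update_size(encoded_doc: bytes, margin: int = SAFETY_MARGIN) -> int:
    """Return the encoded size, or raise if it is too close to the limit.

    encoded_doc is assumed to be the BSON encoding of the record as it
    would look after the update has been applied.
    """
    size = len(encoded_doc)
    if size > MAX_USER_BSON_SIZE - margin:
        raise DocumentTooLargeError(
            f"document is {size} bytes; limit is {MAX_USER_BSON_SIZE} "
            f"bytes with a {margin}-byte margin"
        )
    return size


# Usage: plain byte strings stand in for real BSON here.
print(guard_update_size(bytes(1024)))

try:
    guard_update_size(bytes(MAX_USER_BSON_SIZE))  # exactly at the limit
except DocumentTooLargeError as exc:
    print("skipped write:", exc)
```

Because the application retries failed writes, an error like this would need to be treated as non-retryable, so that the same payload is not sent on to a second mongos.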

Our version is technically 3.6.6-evg1, a custom build branched directly off of 3.6.6, which you can find here: https://github.com/evergage/mongo/commits/v3.6.6-evg1. The only difference is the last 3 commits you see there, which quiet some overly verbose metadata logging that was producing a practically unbounded stream of log entries; we had to silence it in order to run this in production. Since the changes are so minor, the stack trace line numbers and such should hopefully still be usable for you. That logging bug (https://jira.mongodb.org/browse/SERVER-30841?filter=21888) has since been fixed in 3.6.8. Assuming the fix silences everything we silenced in our custom build (3 different files), we might be able to stop running a custom build in the future.


mongos_crash_1st.txt (4 kB, Aug 23 2019 07:55:36 PM UTC)
mongos_crash_2nd.txt (8 kB, Aug 23 2019 07:55:36 PM UTC)
mongos_crash_3rd.txt (4 kB, Nov 02 2019 02:27:07 PM UTC)

is duplicated by:
SERVER-44345 MongoS crash with "BufBuilder attempted to grow()" above 64MB while restarting/upgrading a secondary from 3.4 to 3.6 (Closed)

related to:
SERVER-29109 Client metadata log message verbosity is not parallel with connection start/end messages (Closed)
SERVER-27663 Informational Network component log messages should be configurable (Open)
SERVER-30841 Lower the amount of metadata refresh logging (Closed)

Assignee: Matthew Saltz (Inactive)
Reporter: Scott Glajch
Participants: Danny Hatcher, Kevin Pulo, Matthew Saltz, Scott Glajch
Votes: 1
Watchers: 17

Created: Aug 23 2019 07:57:39 PM UTC
Updated: Dec 27 2020 03:41:36 PM UTC
Resolved: Feb 18 2020 06:21:10 PM UTC
