[SERVER-44345] MongoS crash with "BufBuilder attempted to grow()" above 64MB while restarting/upgrading a secondary from 3.4 to 3.6
Created: 31/Oct/19  Updated: 11/Dec/19  Resolved: 05/Nov/19

Status: Closed
Project: Core Server
Component/s: Upgrade/Downgrade
Affects Version/s: 3.4.17, 3.6.14
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Scott Glajch
Assignee: Danny Hatcher (Inactive)
Resolution: Duplicate
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File shard9_secondary1_crash_10_31_2019.txt    
Issue Links:
Duplicate
duplicates SERVER-43021 MongoS server crashes when attempt to... Closed
Operating System: ALL
Steps To Reproduce:

(assuming the upgrade was related)
1.  Have a 3.4 cluster

2. Upgrade the config servers

3. Start upgrading the replica sets

4. Eventually, one of the secondary restarts on the replica sets triggers this exception.

Participants:

 Description   

We were in the middle of a 3.4.17-evg1 to 3.6.14 upgrade when one of the mongos servers crashed.

The most specific message line is this:

Assertion: 13548:BufBuilder attempted to grow() to 1751919127 bytes, past the 64MB limit. src/mongo/bson/util/builder.h 326
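
For scale, here is a minimal sketch that pulls the requested size out of the assertion line above and compares it with the limit. It assumes "64MB" means 64 * 1024 * 1024 bytes; `parse_grow_request` and `LIMIT_BYTES` are hypothetical names used only for illustration, not anything from the server source.

```python
import re

# Assumption: the "64MB limit" in the message is 64 MiB (64 * 1024 * 1024 bytes).
LIMIT_BYTES = 64 * 1024 * 1024

# The assertion line exactly as quoted in this report.
LINE = (
    "Assertion: 13548:BufBuilder attempted to grow() to 1751919127 bytes, "
    "past the 64MB limit. src/mongo/bson/util/builder.h 326"
)


def parse_grow_request(line):
    """Return the requested size in bytes, or None if the line does not match."""
    match = re.search(r"attempted to grow\(\) to (\d+) bytes", line)
    return int(match.group(1)) if match else None


requested = parse_grow_request(LINE)
ratio = requested / LIMIT_BYTES
print(f"requested={requested} bytes, limit={LIMIT_BYTES} bytes, ratio={ratio:.1f}x")
```

Under that assumption the request is roughly 26 times the limit (about 1.6 GiB), which looks more like a bad length value than a genuinely oversized message, though that is only a guess from the number itself.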

 

It happened right after a log line where one of our nodes (in the trace attached I've renamed it to <SHARD9_SECONDARY1>) was just starting to shut down as a part of the upgrade process.

 

This happened less than an hour before I hit submit on this report, so if there are any transient logs or debug output you want me to provide, let me know!

 

FYI: for 3.4.17-evg1, the "-evg1" suffix just denotes our custom patched build, which carries the 3 logging changes described in the description of this bug: https://jira.mongodb.org/browse/SERVER-43021

 

Note that a few months ago, our 3.6 cluster (a different, lower-impact cluster that is already on 3.6) had an issue where something tried to write more than 16MB, and it crashed all of our mongoS servers in succession.  That bug is here: https://jira.mongodb.org/browse/SERVER-43021, just in case it's helpful.  We never resolved that bug, but we also never saw the issue again (luckily).



 Comments   
Comment by Githook User [ 11/Dec/19 ]

Author:

{'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}

Message: SERVER-44345 stack trace all threads
Branch: master
https://github.com/mongodb/mongo/commit/4f0b6507d4acd185b4c925c176a5693be339f364

(typo: that git commit should have said SERVER-33445)

Comment by Danny Hatcher (Inactive) [ 05/Nov/19 ]

I believe this is being worked on in SERVER-43021 so I'm going to close this ticket as a duplicate. Please let me know if this is actually a different issue.

Comment by Scott Glajch [ 01/Nov/19 ]

This happened again today, with a stack trace that looks the same.  We are still in the process of restarting our mongod nodes, and again it coincided with one of them coming down for upgrade.

Generated at Thu Feb 08 05:05:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.