[SERVER-55459] MongoD segmentation fault when connections increase Created: 23/Mar/21  Updated: 27/Oct/23  Resolved: 13/Apr/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.7, 4.2.11
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Wei Wu Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File FireShot Capture 031 - Prometheus Time Series Collection and Processing Server_ - prod-prometheus-collector-01.nb-prod.com.png     PNG File Screen Shot 2021-04-06 at 16.16.39.png     Text File mongo-crash.log    
Operating System: ALL
Participants:

 Description   

We have experienced mongod segmentation faults when the number of connections increases.

Let me know if you need more info.

MongoDB Setup

We have a 5-node replica set: 1 primary and 4 secondary nodes.

OS Version

AWS c5d.9xlarge running Ubuntu 16.04

Kernel Version

Linux ip-172-31-30-60 4.4.0-1066-aws #76-Ubuntu SMP Thu Aug 16 16:21:21 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

MongoDB version

mongo -version
MongoDB shell version v4.2.11
git version: ea38428f0c6742c7c2c7f677e73d79e17a2aab96
OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
allocator: tcmalloc
modules: none
build environment:
    distmod: ubuntu1604
    distarch: x86_64
    target_arch: x86_64

Crash log

2021-03-23T17:40:17.419+0000 F - [listener] Got signal: 11 (Segmentation fault). 0x55e925f81ae1 0x55e925f8110c 0x55e925f812f0 0x7f4d09783390 0x7f4d09779e8f 0x55e925cf29e4 0x55e925696e41 0x55e92480f8cd 0x55e9248100b3 0x55e92480cd4e 0x55e92569f7eb 0x55e9259432a4 0x55e925943535 0x55e92594b1be 0x55e9256a04ee 0x55e9256a0a0e 0x55e9260a906f 0x7f4d097796ba 0x7f4d094af41d ----- BEGIN BACKTRACE ----- {"backtrace":[{"b":"55E92366C000","o":"2915AE1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55E92366C000","o":"291510C"},{"b":"55E92366C000","o":"29152F0"},{"b":"7F4D09772000","o":"11390"},{"b":"7F4D09772000","o":"7E8F","s":"pthread_create"},{"b":"55E92366C000","o":"26869E4","s":"_ZN5mongo25launchServiceWorkerThreadESt8functionIFvvEE"},{"b":"55E92366C000","o":"202AE41","s":"_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE"},{"b":"55E92366C000","o":"11A38CD","s":"_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE"},{"b":"55E92366C000","o":"11A40B3","s":"_ZN5mongo19ServiceStateMachine5startENS0_9OwnershipE"},{"b":"55E92366C000","o":"11A0D4E","s":"_ZN5mongo21ServiceEntryPointImpl12startSessionESt10shared_ptrINS_9transport7SessionEE"},{"b":"55E92366C000","o":"20337EB"},{"b":"55E92366C000","o":"22D72A4","s":"_ZN4asio6detail9scheduler10do_run_oneERNS0_27conditionally_enabled_mutex11scoped_lockERNS0_21scheduler_thread_infoERKSt10error_code"},{"b":"55E92366C000","o":"22D7535","s":"_ZN4asio6detail9scheduler3runERSt10error_code"},{"b":"55E92366C000","o":"22DF1BE","s":"_ZN4asio10io_context3runEv"},{"b":"55E92366C000","o":"20344EE","s":"_ZN5mongo9transport18TransportLayerASIO12_runListenerEv"},{"b":"55E92366C000","o":"2034A0E"},{"b":"55E92366C000","o":"2A3D06F"},{"b":"7F4D09772000","o":"76BA"},{"b":"7F4D093A8000","o":"10741D","s":"clone"}],"processInfo":{ "mongodbVersion" : "4.2.11", "gitVersion" : "ea38428f0c6742c7c2c7f677e73d79e17a2aab96", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-1066-aws", "version" : "#76-Ubuntu SMP Thu Aug 16 16:21:21 UTC 2018", "machine" : "x86_64" }, "somap" : [ { "b" : "55E92366C000", "elfType" : 3, "buildId" : "AA537F1F16E5F7B5DBAA77FE3401D47A94DB4FBA" }, { "b" : "7FFEE4473000", "elfType" : 3, "buildId" : "01F3399A0C2BABF67E7AD13494046BD3C5B72B33" }, { "b" : "7F4D0AB82000", "path" : "/usr/lib/x86_64-linux-gnu/libcurl.so.4", "elfType" : 3, "buildId" : "D35D0419C3448F74EDCD9E9415B6052858F7F458" }, { "b" : "7F4D0A967000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "6EF73266978476EF9F2FD2CF31E57F4597CB74F8" }, { "b" : "7F4D0A523000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "250E875F74377DFC74DE48BF80CCB237BB4EFF1D" }, { "b" : "7F4D0A2BA000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "513282AC7EB386E2C0133FD9E1B6B8A0F38B047D" }, { "b" : "7F4D0A0B6000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "8CC8D0D119B142D839800BFF71FB71E73AEA7BD4" }, { "b" : "7F4D09EAE000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "89C34D7A182387D76D5CDA1F7718F5D58824DFB3" }, { "b" : "7F4D09BA5000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "DFB85DE42DAFFD09640C8FE377D572DE3E168920" }, { "b" : "7F4D0998F000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : 
"68220AE2C65D65C1B6AAA12FA6765A6EC2F5F434" }, { "b" : "7F4D09772000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "CE17E023542265FC11D9BC8F534BB4F070493D30" }, { "b" : "7F4D093A8000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "B5381A457906D279073822A5CEB24C4BFEF94DDB" }, { "b" : "7F4D0ADF1000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "5D7B6259552275A3C17BD4C3FD05F5A6BF40CAA5" }, { "b" : "7F4D09175000", "path" : "/usr/lib/x86_64-linux-gnu/libidn.so.11", "elfType" : 3, "buildId" : "E09D3783AD1D0BBCD3204FA01E4EF6D756E18F57" }, { "b" : "7F4D08F59000", "path" : "/usr/lib/x86_64-linux-gnu/librtmp.so.1", "elfType" : 3, "buildId" : "8D1CC1204D6B6D33BD1D2C5A2A0516A2234322CF" }, { "b" : "7F4D08D0F000", "path" : "/usr/lib/x86_64-linux-gnu/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "DB5180B568097E2A4690A5B40D36BD134C893FEE" }, { "b" : "7F4D08B00000", "path" : "/usr/lib/x86_64-linux-gnu/liblber-2.4.so.2", "elfType" : 3, "buildId" : "D3B183C41F02C7CD18F906AAFD19C69C850F1CEB" }, { "b" : "7F4D088AF000", "path" : "/usr/lib/x86_64-linux-gnu/libldap_r-2.4.so.2", "elfType" : 3, "buildId" : "DA0EC53E16B3AC6BDC56EAEFE1BFECDDC395FB2E" }, { "b" : "7F4D08695000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "8D9BD4CE26E45EF16075C67D5F5EEAFD8B562832" }, { "b" : "7F4D08365000", "path" : "/usr/lib/x86_64-linux-gnu/libgnutls.so.30", "elfType" : 3, "buildId" : "3CE893F6D1382C2C7648DCCB06E71B1C7E0861CC" }, { "b" : "7F4D08132000", "path" : "/usr/lib/x86_64-linux-gnu/libhogweed.so.4", "elfType" : 3, "buildId" : "B11678F560199547DCF726384EA39153EE0DFABF" }, { "b" : "7F4D07EFC000", "path" : "/usr/lib/x86_64-linux-gnu/libnettle.so.6", "elfType" : 3, "buildId" : "D6B36C5A463EE0FA84FDD6D5FD3F7726EDB90D54" }, { "b" : "7F4D07C7C000", "path" : "/usr/lib/x86_64-linux-gnu/libgmp.so.10", "elfType" : 3, "buildId" : "7B3533D5998D20EE1A1BE3F87789B69041E7F620" }, { "b" : "7F4D079AA000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5.so.3", "elfType" : 3, "buildId" : "16E3DBC6D048145939BB43BBFD7954D27421B00F" }, { "b" : "7F4D0777B000", "path" : "/usr/lib/x86_64-linux-gnu/libk5crypto.so.3", "elfType" : 3, "buildId" : "AEB4C08FC47F86C475E9D3996DFE5E9B403ACEBF" }, { "b" : "7F4D07577000", "path" : "/lib/x86_64-linux-gnu/libcom_err.so.2", "elfType" : 3, "buildId" : "1E16CB57F699E215A2A8D4EFEF90883BC749B12D" }, { "b" : "7F4D0736C000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5support.so.0", "elfType" : 3, "buildId" : "DF3219B89E86920E901BAC4A80AA60F2B6134588" }, { "b" : "7F4D07151000", "path" : "/usr/lib/x86_64-linux-gnu/libsasl2.so.2", "elfType" : 3, "buildId" : "96BCC7EB28D81B1469EED6F24FC083CBD58577BC" }, { "b" : "7F4D06F10000", "path" : "/usr/lib/x86_64-linux-gnu/libgssapi.so.3", "elfType" : 3, "buildId" : "1FE877BE52A424D0636AFD4D35BB330E41D6E0F3" }, { "b" : "7F4D06CAC000", "path" : "/usr/lib/x86_64-linux-gnu/libp11-kit.so.0", "elfType" : 3, "buildId" : "A0E2D03FF5CF65937F4425D4EFD4D655243809EB" }, { "b" : "7F4D06A99000", "path" : "/usr/lib/x86_64-linux-gnu/libtasn1.so.6", "elfType" : 3, "buildId" : "E07E186694852D8F69459C6AB28A53F8DA3CE3B6" }, { "b" : "7F4D06895000", "path" : "/lib/x86_64-linux-gnu/libkeyutils.so.1", "elfType" : 3, "buildId" : "3364D4BF2113C4E8D17EF533867ECC99A53413D6" }, { "b" : "7F4D0668C000", "path" : "/usr/lib/x86_64-linux-gnu/libheimntlm.so.0", "elfType" : 3, "buildId" : "73A8EADBC85860662B24850E71D4AFBE22C33359" }, { "b" : "7F4D06402000", "path" : "/usr/lib/x86_64-linux-gnu/libkrb5.so.26", 
"elfType" : 3, "buildId" : "59E742306A4EA2872E061ECCE92F35FADDA75357" }, { "b" : "7F4D06160000", "path" : "/usr/lib/x86_64-linux-gnu/libasn1.so.8", "elfType" : 3, "buildId" : "E5C159E415406AE79D21056D752BA949C408B5B1" }, { "b" : "7F4D05F2D000", "path" : "/usr/lib/x86_64-linux-gnu/libhcrypto.so.4", "elfType" : 3, "buildId" : "7D15576E1F096614D360784E4A01A1F5FAF908C9" }, { "b" : "7F4D05D17000", "path" : "/usr/lib/x86_64-linux-gnu/libroken.so.18", "elfType" : 3, "buildId" : "481DB33C28D88E43DA6BED65E1A7599407D4D818" }, { "b" : "7F4D05B0F000", "path" : "/usr/lib/x86_64-linux-gnu/libffi.so.6", "elfType" : 3, "buildId" : "9D9C958F1F4894AFEF6AECD90D1C430EA29AC34F" }, { "b" : "7F4D058E6000", "path" : "/usr/lib/x86_64-linux-gnu/libwind.so.0", "elfType" : 3, "buildId" : "57E25072866B2D30CF02EBE7AE623B84F96FA700" }, { "b" : "7F4D056D7000", "path" : "/usr/lib/x86_64-linux-gnu/libheimbase.so.1", "elfType" : 3, "buildId" : "F6F1B4E9F89B716C4A0BA5819BDFFAF4A13EFB91" }, { "b" : "7F4D0548C000", "path" : "/usr/lib/x86_64-linux-gnu/libhx509.so.5", "elfType" : 3, "buildId" : "C60082E3BB78D0D42868D9B359B89BF66CE5A1A7" }, { "b" : "7F4D051B7000", "path" : "/usr/lib/x86_64-linux-gnu/libsqlite3.so.0", "elfType" : 3, "buildId" : "D9782BA023CAEC26B15D8676E3A5D07B55E121EF" }, { "b" : "7F4D04F7F000", "path" : "/lib/x86_64-linux-gnu/libcrypt.so.1", "elfType" : 3, "buildId" : "7BDD51353D50310FFA1587E4AA01B40ABE32D582" } ] }} mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55e925f81ae1] mongod(+0x291510C) [0x55e925f8110c] mongod(+0x29152F0) [0x55e925f812f0] libpthread.so.0(+0x11390) [0x7f4d09783390] libpthread.so.0(pthread_create+0x4FF) [0x7f4d09779e8f] mongod(_ZN5mongo25launchServiceWorkerThreadESt8functionIFvvEE+0x354) [0x55e925cf29e4] mongod(_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE+0x2F1) [0x55e925696e41] mongod(_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE+0x10D) [0x55e92480f8cd] mongod(_ZN5mongo19ServiceStateMachine5startENS0_9OwnershipE+0x163) [0x55e9248100b3] mongod(_ZN5mongo21ServiceEntryPointImpl12startSessionESt10shared_ptrINS_9transport7SessionEE+0x88E) [0x55e92480cd4e] mongod(+0x20337EB) [0x55e92569f7eb] mongod(_ZN4asio6detail9scheduler10do_run_oneERNS0_27conditionally_enabled_mutex11scoped_lockERNS0_21scheduler_thread_infoERKSt10error_code+0x3B4) [0x55e9259432a4] mongod(_ZN4asio6detail9scheduler3runERSt10error_code+0x115) [0x55e925943535] mongod(_ZN4asio10io_context3runEv+0x3E) [0x55e92594b1be] mongod(_ZN5mongo9transport18TransportLayerASIO12_runListenerEv+0x37E) [0x55e9256a04ee] mongod(+0x2034A0E) [0x55e9256a0a0e] mongod(+0x2A3D06F) [0x55e9260a906f] libpthread.so.0(+0x76BA) [0x7f4d097796ba] libc.so.6(clone+0x6D) [0x7f4d094af41d] ----- END BACKTRACE -----

Traffic Graph

(Traffic graph omitted from this export; see the attached Prometheus screenshot: FireShot Capture 031 - Prometheus Time Series Collection and Processing Server_ - prod-prometheus-collector-01.nb-prod.com.png)

 Comments   
Comment by Dmitry Agranat [ 13/Apr/21 ]

I suspect you would end up bumping into the same issue with a sharded cluster, as you would just be moving the problem from mongod to mongos. Addressing connection management on the driver side is still our recommendation for resolving this issue.

I will go ahead and close this ticket for now, but if the issue recurs after you have addressed the connection management problem, feel free to reopen it or open a new ticket and we will be happy to take another look.

Regards,
Dima

Comment by Wei Wu [ 12/Apr/21 ]

Thanks for reviewing all the logs.

Do you think converting the cluster to a sharded cluster and letting mongos manage/handle those connections would help in our situation?

Thanks a lot.

Wei

Comment by Dmitry Agranat [ 12/Apr/21 ]

Thanks wei.wu@newsbreak.com, all data so far points to an issue with your connection management. In each incident, we see a similar pattern of elevated TCP ListenOverflows and EstabResets counters. This means that a connection has been established at the TCP protocol level (3-way handshake), but there was no room on the queue of connections waiting to be accepted by mongod. This indicates connections are arriving faster than mongod can accept them. Your client will time out and retry with exponential backoff.

You will start seeing these messages:

operation was interrupted because a client disconnected

Shortly followed by a flood of:

received client metadata from <IP>

All this will result in SYN flooding on the port used to connect to mongod:

SYN flooding on port <mongod port>

If I am not mistaken, this is caused by the Go driver (mongo-go-driver), but I have not reviewed all the drivers involved:

name: "mongo-go-driver", version: "v1.0.2"

Increasing max_map_count even further from the current value of 262144 will not help. At the observed rate of 2100 ListenOverflows per second during your incidents, it only takes about two minutes (262144 / 2100 ≈ 125 seconds) to hit the reported issue. Increasing this value would only delay the inevitable.

We suggest reviewing your connection pool logic/configuration to prevent this from happening.
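For the Go services, here is a minimal sketch of the kind of driver-side pool limits meant above, assuming the official mongo-go-driver (go.mongodb.org/mongo-driver v1.x); the URI, host names and numbers are hypothetical and would need tuning to your workload:

package main

import (
    "context"
    "log"
    "time"

    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    // Hypothetical replica-set URI; substitute your own hosts.
    opts := options.Client().
        ApplyURI("mongodb://mongo-0:27017,mongo-1:27017/?replicaSet=rs0").
        SetMaxPoolSize(100).                // cap concurrent connections per mongod
        SetMinPoolSize(10).                 // keep a warm pool instead of re-dialling under load
        SetMaxConnIdleTime(5 * time.Minute) // recycle idle connections gradually, not in bursts

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    client, err := mongo.Connect(ctx, opts)
    if err != nil {
        log.Fatal(err)
    }
    defer client.Disconnect(context.Background())

    // Reuse this single client for the lifetime of the process;
    // creating a client per request is what drives connection churn.
}

Equivalent limits can usually be expressed in the connection string as well (maxPoolSize, minPoolSize, maxIdleTimeMS), which may be easier to roll out consistently across the Java, Python and Go services.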

Regards,
Dima

Comment by Wei Wu [ 08/Apr/21 ]

Sorry, I got a segfault on this 3.6.19 cluster. I will try increasing `max_map_count` on this cluster.

I also uploaded the requested files with the prefix 20210408.

Wei

Comment by Dmitry Agranat [ 08/Apr/21 ]

Thanks wei.wu@newsbreak.com for providing the requested information.

I see this is a completely different cluster from what we've been looking at so far (we've been looking at 4.2.7 and 4.2.11 clusters, while this one is 3.6.19), with a very different workload and connection management signature. This cluster also has vm.max_map_count set to 65530. Did this cluster actually experience the reported segmentation fault while it was running with the recommended setting of vm.max_map_count = 128000? If so, can you provide mongod logs covering the time of these segfault events?

Comment by Wei Wu [ 07/Apr/21 ]

All requested files have been uploaded with the prefix 20210407.

As far as I know, we used the following clients:

Java -> mongodb.mongo-java-driver: 3.8.2, 3.9.1
Python -> pymongo: 3.8.2, 3.9.1, 3.11.2, 4.0.4, 4.0.5
Go -> mongo-go-driver: 1.0.1, 1.1.3, 1.3.3, 1.3.5
PHP -> php-driver-legacy: 1.6.16

We used this PHP driver, which doesn't have connection pooling, and it caused a high rate of connection creation.

Wei

Comment by Dmitry Agranat [ 07/Apr/21 ]

Hi wei.wu@newsbreak.com, the attached image was for an internal discussion.

Based on the information gathered so far, our current hypothesis is that the segmentation fault was caused by reaching the max_map_count limit. Even though it is currently set to 262144, and based on the number of connections during the event we should not have reached this limit, we suspect either that some resource is being leaked during connection creation or that some kernel configuration is responsible. Specifically, you are creating and (presumably) destroying ~7000 connections per second, which is an anti-pattern in itself, but in this case, as mentioned earlier, something appears to be leaked during this process. Since you have turned on the "quiet" option to suppress connection information in the logs, it is not possible to understand the current connection management and lifecycle.

To validate this hypothesis, we'll need to gather some more data, ideally from a node where the reported segmentation fault has happened most recently.

In order to understand connection management and its lifecycle, we first need to log connections. I assume you turned on quiet mode because it was printing too much data in the mongod log, which is understandable with 7000 connections being created and destroyed per second. As an alternative, we'd like to collect mongod logs with quiet mode turned off for just a couple of minutes. You can do this using setParameter, without restarting the process:

  • Disable quiet mode:

    db.adminCommand( { setParameter: 1, quiet: 0 } )
    

  • Re-enable quiet mode after a couple of minutes:

    db.adminCommand( { setParameter: 1, quiet: 1 } )
    

We also need to gather kernel configuration and statistical information. Please download the mdiag.sh script from our GitHub repository and run it on one of the nodes in your replica set.

Note: This script should be run from the command line/OS shell with root/sudo privileges as follows:

sudo bash mdiag.sh SERVER-55459

It will take about 5 minutes to run. Once complete, please attach the resulting /tmp/mdiag-$HOSTNAME.txt from the host(s) to the support uploader.

Following the observation of ~7000 connections created and destroyed per second, we also need to understand your driver settings. Please provide the following for each driver currently in use:

  • the driver type and version
  • the application connection string you are using for the connections to MongoDB
  • the driver settings (those which control the lifetime of connections, pool configuration, etc.; see the sketch after this list)
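To make the last item concrete for the Go services, here is a minimal, hypothetical sketch that logs the driver's connection pool events, so the connection lifecycle can also be seen from the client side; it assumes a reasonably recent go.mongodb.org/mongo-driver v1.x that exposes event.PoolMonitor:

package main

import (
    "context"
    "log"

    "go.mongodb.org/mongo-driver/event"
    "go.mongodb.org/mongo-driver/mongo"
    "go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
    // Log every pool event (connection created/closed, checked out/in, ...)
    // so the rate of connection creation and destruction becomes visible.
    monitor := &event.PoolMonitor{
        Event: func(e *event.PoolEvent) {
            log.Printf("pool event: type=%s address=%s", e.Type, e.Address)
        },
    }

    opts := options.Client().
        ApplyURI("mongodb://mongo-0:27017/?replicaSet=rs0"). // hypothetical URI
        SetPoolMonitor(monitor)

    client, err := mongo.Connect(context.Background(), opts)
    if err != nil {
        log.Fatal(err)
    }
    defer client.Disconnect(context.Background())
}

Correlating these client-side events with the mongod log once quiet mode is off should make it clearer which service is responsible for the churn.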

After quiet mode has been disabled for a couple of minutes and mdiag.sh has been run, please upload the following to the support uploader:

  • the $dbpath/diagnostic.data directory
  • the full mongod.log covering the time the quiet mode was off
  • the output of mdiag.sh (/tmp/mdiag-$HOSTNAME.txt)
  • the driver settings, in as much detail as possible

Dima

Comment by Wei Wu [ 07/Apr/21 ]

I saw you uploaded an image. Do you have more information to share?

Thanks

Wei

Comment by Wei Wu [ 06/Apr/21 ]
  • Q: Was there any special reason to set the max_map_count to above its default of 128k?
    • A: We tried to give it more memory.
  • Q: Is there any special memory management configuration in this server?
    • A: No, we don't have any special memory management.
  • Q: Specifically, is your KVM enabled with the memory overcommitment?
    • A: We use AWS VMs for our MongoDB deployment.
  • Q: Do you reserve the full amount of memory for the virtual machine running MongoDB?
    • A: It's a dedicated MongoDB instance. We used the AWS instance type i3.4xlarge.

Comment by Wei Wu [ 06/Apr/21 ]

root@ip-172-31-23-251:/home/ubuntu# ldconfig -v | grep libpthread
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once 
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.23.so is the dynamic linker, ignoring
libpthread.so.0 -> libpthread-2.23.so

Comment by Wei Wu [ 06/Apr/21 ]

root@ip-172-31-17-47:/home/ubuntu# ldconfig -v | grep libpthread
/sbin/ldconfig.real: Can't stat /usr/local/lib/x86_64-linux-gnu: No such file or directory
/sbin/ldconfig.real: Path `/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: Path `/usr/lib/x86_64-linux-gnu' given more than once
/sbin/ldconfig.real: /lib/x86_64-linux-gnu/ld-2.27.so is the dynamic linker, ignoring
libpthread.so.0 -> libpthread-2.27.so

Comment by Dmitry Agranat [ 04/Apr/21 ]

Thanks wei.wu@newsbreak.com for the requested output.

While we are still investigating this, could you please clarify a couple of points:

  • Was there any special reason to set the max_map_count to above its default of 128k?
  • Is there any special memory management configuration in this server?
  • Specifically, is your KVM enabled with the memory overcommitment?
  • Do you reserve the full amount of memory for the virtual machine running MongoDB?

One more thing: we'd like to take a closer look at the calls and source code inside the shared library libpthread.so.0. So that we can continue to investigate, would you please provide the version that mongod is linking against? I believe the output of the following command should be sufficient:

ldconfig -v | grep libpthread

Comment by Wei Wu [ 02/Apr/21 ]

This host went down two times this week. Do you have more suggestions?

Wei

Comment by Wei Wu [ 30/Mar/21 ]

Hi dmitry.agranat,

Our host is set to 262144.

ip-172-31-23-251:/home/ubuntu# cat /proc/sys/vm/max_map_count 
262144

Wei Wu

Comment by Dmitry Agranat [ 29/Mar/21 ]

Hi wei.wu@newsbreak.com, thank you for providing the requested information.

Based on the segmentation fault, I believe that the low vm.max_map_count setting was likely the culprit.

For completeness, could you please post the output of this command:

cat /proc/sys/vm/max_map_count

And if the value is lower than the 128000 recommended by MongoDB, could you set it to 128000 and report back with the outcome?

Dima

Comment by Wei Wu [ 28/Mar/21 ]

Any findings from those logs?

Comment by Wei Wu [ 24/Mar/21 ]

And it hasn't happened on the same host since 2021-03-23T17:40:17.

Comment by Wei Wu [ 24/Mar/21 ]

Uploaded requested logs.

We had 5-6 segmentation fault occurrences in the past week on two different clusters. One is running 4.2.7 and the other 4.2.11. Both are replica set clusters.

Wei

Comment by Dmitry Agranat [ 24/Mar/21 ]

Hi wei.wu@newsbreak.com,

Would you please archive (tar or zip) and upload the following information to this support uploader location:

  • the full mongod.log files covering the incident
  • the $dbpath/diagnostic.data directory (the contents are described here)
  • the /var/log/messages and /var/log/syslog files, and the output from /var/log/dmesg

Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Also, was this a one-time occurrence, or has this issue happened again since 2021-03-23T17:40:17?

Regards,
Dima
