[SERVER-33385] MongoDB 3.6 crashes on Ubuntu 16.04 AWS when using Cold SC1 EBS Volume Created: 17/Feb/18  Updated: 21/Mar/18  Resolved: 17/Feb/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Ben Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-32677 Segmentation fault converting Replica... Closed
is duplicated by SERVER-33384 MongoDB 3.6 crashes on Ubuntu 16.04 A... Closed
Operating System: ALL
Steps To Reproduce:

Spin up an AWS t2.small Ubuntu 16.04 image with a 1TB SC1 EBS volume. Install the latest MongoDB from the MongoDB repositories and format the EBS volume as a single 1TB XFS partition. Configure MongoDB to use the XFS partition for DB storage. Set up SSL and x509 certificates for mongo. Launch mongo.

MongoDB shortly falls over.

This predominantly seems to happen when a secondary connects to a master. The master then dies, and when restarted, the old secondary (now master), dies.

 Description   

We're using the latest MongoDB 3.6 from the Ubuntu 16.04 x64 repositories, running Ubuntu 16.04 on t2.small instances, with 1TB SC1 EBS volumes used to store MongoDB.

When trying to set up a sharded cluster, we are unable to keep MongoDB instances running reliably; MongoDB frequently crashes with segmentation faults:

2018-02-17T15:38:12.114+0000 F -        [thread7] Invalid access at address: 0x18
2018-02-17T15:38:12.141+0000 F -        [thread7] Got signal: 11 (Segmentation fault).
 
 0x55b3557924f1 0x55b355791709 0x55b355791d76 0x7f641d100390 0x7f641d0f8d44 0x55b354f67a06 0x55b354f6c791 0x55b3545fac8e 0x55b3545fafb0 0x55b35506b22a 0x55b35506c6d8 0x55b3545e2032 0x55b3552a799a 0x55b3552a835c 0x55b3552a8734 0x55b35532a3b9 0x55b35532a601 0x55b3545e0f3d 0x55b35589f980 0x7f641d0f66ba 0x7f641ce2c41d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55B35359F000","o":"21F34F1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55B35359F000","o":"21F2709"},{"b":"55B35359F000","o":"21F2D76"},{"b":"7F641D0EF000","o":"11390"},{"b":"7F641D0EF000","o":"9D44","s":"pthread_mutex_lock"},{"b":"55B35359F000","o":"19C8A06","s":"_ZN5mongo12CatalogCache27invalidateShardedCollectionERKNS_15NamespaceStringE"},{"b":"55B35359F000","o":"19CD791","s":"_ZN5mongo12CatalogCache42getShardedCollectionRoutingInfoWithRefreshEPNS_16OperationContextERKNS_15NamespaceStringE"},{"b":"55B35359F000","o":"105BC8E","s":"_ZN5mongo25SessionsCollectionSharded32_checkCacheForSessionsCollectionEPNS_16OperationContextE"},{"b":"55B35359F000","o":"105BFB0","s":"_ZN5mongo25SessionsCollectionSharded23setupSessionsCollectionEPNS_16OperationContextE"},{"b":"55B35359F000","o":"1ACC22A","s":"_ZN5mongo23LogicalSessionCacheImpl8_refreshEPNS_6ClientE"},{"b":"55B35359F000","o":"1ACD6D8","s":"_ZN5mongo23LogicalSessionCacheImpl16_periodicRefreshEPNS_6ClientE"},{"b":"55B35359F000","o":"1043032"},{"b":"55B35359F000","o":"1D0899A","s":"_ZN4asio6detail14strand_service8dispatchINS0_7binder1ISt8functionIFvSt10error_codeEES5_EEEEvRPNS1_11strand_implERT_"},{"b":"55B35359F000","o":"1D0935C","s":"_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder1INS0_15wrapped_handlerINS_10io_context6strandESt8functionIFvSt10error_codeEENS0_26is_continuation_if_runningEEES9_EESB_EEEEvRPNS1_11strand_implERT_"},{"b":"55B35359F000","o":"1D09734","s":"_ZN4asio6detail12wait_handlerINS0_15wrapped_handlerINS_10io_context6strandESt8functionIFvSt10error_codeEENS0_26is_continuation_if_runningEEEE11do_completeEPvPNS0_19scheduler_operationERKS6_m"},{"b":"55B35359F000","o":"1D8B3B9","s":"_ZN4asio6detail9scheduler10do_run_oneERNS0_27conditionally_enabled_mutex11scoped_lockERNS0_21scheduler_thread_infoERKSt10error_code"},{"b":"55B35359F000","o":"1D8B601","s":"_ZN4asio6detail9scheduler3runERSt10error_code"},{"b":"55B35359F000","o":"1041F3D"},{"b":"55B35359F000","o":"2300980"},{"b":"7F641D0EF000","o":"76BA"},{"b":"7F641CD25000","o":"10741D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.2", "gitVersion" : "489d177dbd0f0420a8ca04d39fd78d0a2c539420", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-1050-aws", "version" : "#59-Ubuntu SMP Tue Jan 30 19:57:10 UTC 2018", "machine" : "x86_64" }, "somap" : [ { "b" : "55B35359F000", "elfType" : 3, "buildId" : "90F4CC751C09ABD90756CE2480F0217355B846B5" }, { "b" : "7FFF57AB6000", "elfType" : 3, "buildId" : "D4C3FCC8911DB8CC2FA90801300432B529D4B9BC" }, { "b" : "7F641E2E4000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "6EF73266978476EF9F2FD2CF31E57F4597CB74F8" }, { "b" : "7F641E07B000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "DCF10134B91ED2139E3E8C72564668F5CDBA8522" }, { "b" : "7F641DC37000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "1649272BE0CA9FA22F082DC86372B6C9959779B0" }, { "b" : "7F641DA2F000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "89C34D7A182387D76D5CDA1F7718F5D58824DFB3" }, { "b" : "7F641D82B000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "8CC8D0D119B142D839800BFF71FB71E73AEA7BD4" }, { "b" : "7F641D522000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "DFB85DE42DAFFD09640C8FE377D572DE3E168920" }, { "b" : "7F641D30C000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "68220AE2C65D65C1B6AAA12FA6765A6EC2F5F434" }, { "b" : "7F641D0EF000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "CE17E023542265FC11D9BC8F534BB4F070493D30" }, { "b" : "7F641CD25000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "B5381A457906D279073822A5CEB24C4BFEF94DDB" }, { "b" : "7F641E4FF000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "5D7B6259552275A3C17BD4C3FD05F5A6BF40CAA5" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55b3557924f1]
 mongod(+0x21F2709) [0x55b355791709]
 mongod(+0x21F2D76) [0x55b355791d76]
 libpthread.so.0(+0x11390) [0x7f641d100390]
 libpthread.so.0(pthread_mutex_lock+0x4) [0x7f641d0f8d44]
 mongod(_ZN5mongo12CatalogCache27invalidateShardedCollectionERKNS_15NamespaceStringE+0x46) [0x55b354f67a06]
 mongod(_ZN5mongo12CatalogCache42getShardedCollectionRoutingInfoWithRefreshEPNS_16OperationContextERKNS_15NamespaceStringE+0x41) [0x55b354f6c791]
 mongod(_ZN5mongo25SessionsCollectionSharded32_checkCacheForSessionsCollectionEPNS_16OperationContextE+0x10E) [0x55b3545fac8e]
 mongod(_ZN5mongo25SessionsCollectionSharded23setupSessionsCollectionEPNS_16OperationContextE+0x20) [0x55b3545fafb0]
 mongod(_ZN5mongo23LogicalSessionCacheImpl8_refreshEPNS_6ClientE+0x12A) [0x55b35506b22a]
 mongod(_ZN5mongo23LogicalSessionCacheImpl16_periodicRefreshEPNS_6ClientE+0x28) [0x55b35506c6d8]
 mongod(+0x1043032) [0x55b3545e2032]
 mongod(_ZN4asio6detail14strand_service8dispatchINS0_7binder1ISt8functionIFvSt10error_codeEES5_EEEEvRPNS1_11strand_implERT_+0x7A) [0x55b3552a799a]
 mongod(_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder1INS0_15wrapped_handlerINS_10io_context6strandESt8functionIFvSt10error_codeEENS0_26is_continuation_if_runningEEES9_EESB_EEEEvRPNS1_11strand_implERT_+0x3AC) [0x55b3552a835c]
 mongod(_ZN4asio6detail12wait_handlerINS0_15wrapped_handlerINS_10io_context6strandESt8functionIFvSt10error_codeEENS0_26is_continuation_if_runningEEEE11do_completeEPvPNS0_19scheduler_operationERKS6_m+0x164) [0x55b3552a8734]
 mongod(_ZN4asio6detail9scheduler10do_run_oneERNS0_27conditionally_enabled_mutex11scoped_lockERNS0_21scheduler_thread_infoERKSt10error_code+0x389) [0x55b35532a3b9]
 mongod(_ZN4asio6detail9scheduler3runERSt10error_code+0xD1) [0x55b35532a601]
 mongod(+0x1041F3D) [0x55b3545e0f3d]
 mongod(+0x2300980) [0x55b35589f980]
 libpthread.so.0(+0x76BA) [0x7f641d0f66ba]
 libc.so.6(clone+0x6D) [0x7f641ce2c41d]
-----  END BACKTRACE  -----
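
For anyone cross-checking the JSON backtrace against the symbolized frames, here is a minimal Python sketch. It assumes "b" is the load base of the module and "o" is the offset within it, which is a reading of the log rather than something stated in this ticket, though it agrees with the bracketed addresses in the frames above. The names frames_json and absolute_addresses are illustrative only, and just three frames are copied in.

```python
import json

# Three frames copied from the "backtrace" array in the log above.
frames_json = """[
  {"b": "55B35359F000", "o": "21F34F1", "s": "_ZN5mongo15printStackTraceERSo"},
  {"b": "7F641D0EF000", "o": "9D44", "s": "pthread_mutex_lock"},
  {"b": "55B35359F000", "o": "19C8A06",
   "s": "_ZN5mongo12CatalogCache27invalidateShardedCollectionERKNS_15NamespaceStringE"}
]"""


def absolute_addresses(frames):
    """Return (address, symbol) pairs, assuming address = base "b" + offset "o"."""
    return [
        (int(frame["b"], 16) + int(frame["o"], 16), frame.get("s", "?"))
        for frame in frames
    ]


resolved = absolute_addresses(json.loads(frames_json))
for address, symbol in resolved:
    # Hex addresses, comparable with the bracketed values in the text frames.
    print(f"{address:#x} {symbol}")
```

Under that assumption, the faulting frame sits in pthread_mutex_lock, called from CatalogCache::invalidateShardedCollection during the periodic logical session cache refresh.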

We have set up the config shard cluster without problems in the same AWS account and region; the only differences are that those nodes are micro instances and use 64GB GP2 SSD volumes.

The above is consistently reproducible with 2 nodes that are configured the same.



 Comments   
Comment by Kelsey Schubert [ 17/Feb/18 ]

Hi bcogative,

Thanks for reporting this issue. We're actively working on a fix in SERVER-32677. Please feel free to review this ticket and watch it for updates.

Kind regards,
Kelsey

Comment by Ben [ 17/Feb/18 ]

After reconfiguring the same instances with SSD EBS volumes, the error persists. Perhaps it's related to the shardsvr role rather than configsvr?
