  Core Server / SERVER-33385

MongoDB 3.6 crashes on Ubuntu 16.04 AWS when using Cold SC1 EBS Volume


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 3.6.2
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:

      Spin up an AWS t2.small Ubuntu 16.04 instance with a 1TB SC1 EBS volume. Install the latest MongoDB from the MongoDB repositories and format the EBS volume as a single 1TB XFS partition. Change MongoDB to use the XFS partition for DB storage. Set up SSL and x509 certificates for MongoDB. Launch MongoDB.

      MongoDB crashes shortly afterwards.

      This seems to happen predominantly when a secondary connects to the master: the master dies, and when it is restarted, the old secondary (now the master) dies in turn.
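The reproduction depends on the data directory actually living on the new XFS volume. One way to confirm that is to look up the filesystem type of the mount containing the dbPath in a mount table in /proc/mounts format, choosing the longest mount point that contains the path. The following is a minimal Python sketch under that assumption; the sample mount table, the device names, and the /data mount point are hypothetical, while /var/lib/mongodb is the default dbPath in the Ubuntu packages' mongod.conf. On a live host, `df -T <path>` gives the same answer.

```python
import os
from typing import Optional


def fs_type_for_path(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` uses the /proc/mounts layout:
    "<device> <mount point> <fs type> <options> <dump> <pass>".
    Escaped characters such as \\040 (space) are not decoded here.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        # A mount covers `path` if it equals it or is a whole-component prefix,
        # so "/data" covers "/data/db" but not "/database".
        covers = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        # Longest match wins; on ties the later entry wins (it shadows the earlier one).
        if covers and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


# Hypothetical mount table: root on ext4, the SC1 volume mounted at /data as XFS.
SAMPLE_MOUNTS = """\
/dev/xvda1 / ext4 rw,relatime,discard,data=ordered 0 0
/dev/xvdf1 /data xfs rw,noatime,attr2,inode64,noquota 0 0
"""

print(fs_type_for_path("/data/db", SAMPLE_MOUNTS))          # xfs
print(fs_type_for_path("/var/lib/mongodb", SAMPLE_MOUNTS))  # ext4
```

If the second result is what you get for your configured dbPath, mongod is still writing to the root volume rather than the SC1 volume.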


      Description

      We're using the latest MongoDB 3.6 from the Ubuntu 16.04 x64 repositories, running Ubuntu 16.04 on t2.small instances, with 1TB SC1 EBS volumes used to store the MongoDB data.

      When trying to set up a sharded cluster, we're unable to keep MongoDB instances running reliably, and MongoDB frequently crashes with segmentation fault errors:

      2018-02-17T15:38:12.114+0000 F -        [thread7] Invalid access at address: 0x18
      2018-02-17T15:38:12.141+0000 F -        [thread7] Got signal: 11 (Segmentation fault).
       
       0x55b3557924f1 0x55b355791709 0x55b355791d76 0x7f641d100390 0x7f641d0f8d44 0x55b354f67a06 0x55b354f6c791 0x55b3545fac8e 0x55b3545fafb0 0x55b35506b22a 0x55b35506c6d8 0x55b3545e2032 0x55b3552a799a 0x55b3552a835c 0x55b3552a8734 0x55b35532a3b9 0x55b35532a601 0x55b3545e0f3d 0x55b35589f980 0x7f641d0f66ba 0x7f641ce2c41d
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"55B35359F000","o":"21F34F1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55B35359F000","o":"21F2709"},{"b":"55B35359F000","o":"21F2D76"},{"b":"7F641D0EF000","o":"11390"},{"b":"7F641D0EF000","o":"9D44","s":"pthread_mutex_lock"},{"b":"55B35359F000","o":"19C8A06","s":"_ZN5mongo12CatalogCache27invalidateShardedCollectionERKNS_15NamespaceStringE"},{"b":"55B35359F000","o":"19CD791","s":"_ZN5mongo12CatalogCache42getShardedCollectionRoutingInfoWithRefreshEPNS_16OperationContextERKNS_15NamespaceStringE"},{"b":"55B35359F000","o":"105BC8E","s":"_ZN5mongo25SessionsCollectionSharded32_checkCacheForSessionsCollectionEPNS_16OperationContextE"},{"b":"55B35359F000","o":"105BFB0","s":"_ZN5mongo25SessionsCollectionSharded23setupSessionsCollectionEPNS_16OperationContextE"},{"b":"55B35359F000","o":"1ACC22A","s":"_ZN5mongo23LogicalSessionCacheImpl8_refreshEPNS_6ClientE"},{"b":"55B35359F000","o":"1ACD6D8","s":"_ZN5mongo23LogicalSessionCacheImpl16_periodicRefreshEPNS_6ClientE"},{"b":"55B35359F000","o":"1043032"},{"b":"55B35359F000","o":"1D0899A","s":"_ZN4asio6detail14strand_service8dispatchINS0_7binder1ISt8functionIFvSt10error_codeEES5_EEEEvRPNS1_11strand_implERT_"},{"b":"55B35359F000","o":"1D0935C","s":"_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder1INS0_15wrapped_handlerINS_10io_context6strandESt8functionIFvSt10error_codeEENS0_26is_continuation_if_runningEEES9_EESB_EEEEvRPNS1_11strand_implERT_"},{"b":"55B35359F000","o":"1D09734","s":"_ZN4asio6detail12wait_handlerINS0_15wrapped_handlerINS_10io_context6strandESt8functionIFvSt10error_codeEENS0_26is_continuation_if_runningEEEE11do_completeEPvPNS0_19scheduler_operationERKS6_m"},{"b":"55B35359F000","o":"1D8B3B9","s":"_ZN4asio6detail9scheduler10do_run_oneERNS0_27conditionally_enabled_mutex11scoped_lockERNS0_21scheduler_thread_infoERKSt10error_code"},{"b":"55B35359F000","o":"1D8B601","s":"_ZN4asio6detail9scheduler3runERSt10error_code"},{"b":"55B35359F000","o":"1041F3D"},{"b":"55B35359F000","o":"2300980"},{"b":"7F641D0EF000","o":"76BA"},{"b":"7F641CD25000","o":"10741D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.2", "gitVersion" : "489d177dbd0f0420a8ca04d39fd78d0a2c539420", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.0-1050-aws", "version" : "#59-Ubuntu SMP Tue Jan 30 19:57:10 UTC 2018", "machine" : "x86_64" }, "somap" : [ { "b" : "55B35359F000", "elfType" : 3, "buildId" : "90F4CC751C09ABD90756CE2480F0217355B846B5" }, { "b" : "7FFF57AB6000", "elfType" : 3, "buildId" : "D4C3FCC8911DB8CC2FA90801300432B529D4B9BC" }, { "b" : "7F641E2E4000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "6EF73266978476EF9F2FD2CF31E57F4597CB74F8" }, { "b" : "7F641E07B000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "DCF10134B91ED2139E3E8C72564668F5CDBA8522" }, { "b" : "7F641DC37000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "1649272BE0CA9FA22F082DC86372B6C9959779B0" }, { "b" : "7F641DA2F000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "89C34D7A182387D76D5CDA1F7718F5D58824DFB3" }, { "b" : "7F641D82B000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "8CC8D0D119B142D839800BFF71FB71E73AEA7BD4" }, { "b" : "7F641D522000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "DFB85DE42DAFFD09640C8FE377D572DE3E168920" }, { "b" : "7F641D30C000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "68220AE2C65D65C1B6AAA12FA6765A6EC2F5F434" }, { "b" : "7F641D0EF000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "CE17E023542265FC11D9BC8F534BB4F070493D30" }, { "b" : "7F641CD25000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "B5381A457906D279073822A5CEB24C4BFEF94DDB" }, { "b" : "7F641E4FF000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "5D7B6259552275A3C17BD4C3FD05F5A6BF40CAA5" } ] }}
       mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55b3557924f1]
       mongod(+0x21F2709) [0x55b355791709]
       mongod(+0x21F2D76) [0x55b355791d76]
       libpthread.so.0(+0x11390) [0x7f641d100390]
       libpthread.so.0(pthread_mutex_lock+0x4) [0x7f641d0f8d44]
       mongod(_ZN5mongo12CatalogCache27invalidateShardedCollectionERKNS_15NamespaceStringE+0x46) [0x55b354f67a06]
       mongod(_ZN5mongo12CatalogCache42getShardedCollectionRoutingInfoWithRefreshEPNS_16OperationContextERKNS_15NamespaceStringE+0x41) [0x55b354f6c791]
       mongod(_ZN5mongo25SessionsCollectionSharded32_checkCacheForSessionsCollectionEPNS_16OperationContextE+0x10E) [0x55b3545fac8e]
       mongod(_ZN5mongo25SessionsCollectionSharded23setupSessionsCollectionEPNS_16OperationContextE+0x20) [0x55b3545fafb0]
       mongod(_ZN5mongo23LogicalSessionCacheImpl8_refreshEPNS_6ClientE+0x12A) [0x55b35506b22a]
       mongod(_ZN5mongo23LogicalSessionCacheImpl16_periodicRefreshEPNS_6ClientE+0x28) [0x55b35506c6d8]
       mongod(+0x1043032) [0x55b3545e2032]
       mongod(_ZN4asio6detail14strand_service8dispatchINS0_7binder1ISt8functionIFvSt10error_codeEES5_EEEEvRPNS1_11strand_implERT_+0x7A) [0x55b3552a799a]
       mongod(_ZN4asio6detail14strand_service8dispatchINS0_17rewrapped_handlerINS0_7binder1INS0_15wrapped_handlerINS_10io_context6strandESt8functionIFvSt10error_codeEENS0_26is_continuation_if_runningEEES9_EESB_EEEEvRPNS1_11strand_implERT_+0x3AC) [0x55b3552a835c]
       mongod(_ZN4asio6detail12wait_handlerINS0_15wrapped_handlerINS_10io_context6strandESt8functionIFvSt10error_codeEENS0_26is_continuation_if_runningEEEE11do_completeEPvPNS0_19scheduler_operationERKS6_m+0x164) [0x55b3552a8734]
       mongod(_ZN4asio6detail9scheduler10do_run_oneERNS0_27conditionally_enabled_mutex11scoped_lockERNS0_21scheduler_thread_infoERKSt10error_code+0x389) [0x55b35532a3b9]
       mongod(_ZN4asio6detail9scheduler3runERSt10error_code+0xD1) [0x55b35532a601]
       mongod(+0x1041F3D) [0x55b3545e0f3d]
       mongod(+0x2300980) [0x55b35589f980]
       libpthread.so.0(+0x76BA) [0x7f641d0f66ba]
       libc.so.6(clone+0x6D) [0x7f641ce2c41d]
      -----  END BACKTRACE  -----
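In the JSON backtrace above, each frame records the load base of its object ("b") and an offset into it ("o"), both in hex; their sum is the raw address on the first line of the trace (for example 0x55B35359F000 + 0x21F34F1 = 0x55b3557924f1, the printStackTrace frame). The Python sketch below recomputes those addresses for a few frames copied from this report and turns simple mangled names into readable ones, which makes the crashing path easier to follow: LogicalSessionCacheImpl::_periodicRefresh → SessionsCollectionSharded::setupSessionsCollection → CatalogCache::getShardedCollectionRoutingInfoWithRefresh → CatalogCache::invalidateShardedCollection → pthread_mutex_lock. `demangle_nested_name` and `frames` are hypothetical helpers; the demangler only handles the `_ZN<len><name>...` prefix of Itanium-mangled names, so use `c++filt` for full demangling.

```python
import json
from typing import List, Optional, Tuple


def demangle_nested_name(symbol: str) -> Optional[str]:
    """Toy Itanium demangler: reads only the leading <len><identifier> run
    after "_ZN" and stops at the first non-digit (template args, 'E', ...)."""
    if not symbol.startswith("_ZN"):
        return None
    i, parts = 3, []
    while i < len(symbol) and symbol[i].isdigit():
        j = i
        while j < len(symbol) and symbol[j].isdigit():
            j += 1
        n = int(symbol[i:j])           # length prefix of the next component
        parts.append(symbol[j:j + n])  # the component itself
        i = j + n
    return "::".join(parts) if parts else None


def frames(backtrace_json: str) -> List[Tuple[int, str]]:
    """Return (absolute address, readable name) for each frame; the absolute
    address is base "b" plus offset "o". Frames without "s" become "??"."""
    doc = json.loads(backtrace_json)
    out = []
    for frame in doc["backtrace"]:
        addr = int(frame["b"], 16) + int(frame["o"], 16)
        sym = frame.get("s")
        name = (demangle_nested_name(sym) or sym) if sym else "??"
        out.append((addr, name))
    return out


# Four frames copied verbatim from the backtrace in this report.
EXCERPT = (
    '{"backtrace":['
    '{"b":"55B35359F000","o":"21F34F1","s":"_ZN5mongo15printStackTraceERSo"},'
    '{"b":"7F641D0EF000","o":"9D44","s":"pthread_mutex_lock"},'
    '{"b":"55B35359F000","o":"19C8A06",'
    '"s":"_ZN5mongo12CatalogCache27invalidateShardedCollectionERKNS_15NamespaceStringE"},'
    '{"b":"55B35359F000","o":"1043032"}'
    ']}'
)

for addr, name in frames(EXCERPT):
    print(f"{addr:#x} {name}")
# 0x55b3557924f1 mongo::printStackTrace
# 0x7f641d0f8d44 pthread_mutex_lock
# 0x55b354f67a06 mongo::CatalogCache::invalidateShardedCollection
# 0x55b3545e2032 ??
```

The recomputed addresses match the symbolized lines that mongod printed after the JSON, which is a quick way to check that a pasted or re-wrapped trace has not been corrupted.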
      

      We've been using MongoDB to set up the config server cluster without problems on the same AWS account and region; the difference is that those are micro instances using 64GB GP2 SSD volumes.

      The crash above is consistently reproducible with two identically configured nodes.


              People

              Assignee:
              kelsey.schubert Kelsey T Schubert
              Reporter:
              bcogative Ben
              Votes:
              0
              Watchers:
              5
