Core Server / SERVER-52918

Mongos router repeatedly crashes

Details

    • Bug
    • Status: Closed
    • Blocker - P1
    • Resolution: Duplicate
    • 4.2.6
    • None
    • None
    • None
    • ALL
    • Connect to mongos, terminate the connection - mongos crashes

    Description

      Today the mongos router process in our staging environment started crashing repeatedly. As long as no client attempts to connect through it, the process stays up. When a client does connect, the connection hangs (e.g. the mongo CLI never reaches the prompt), and if the connection is then terminated (e.g. by pressing Ctrl-C in the mongo CLI), mongos immediately crashes and logs the following messages:

       

      2020-11-17T12:52:37.449-0700 I NETWORK [listener] connection accepted from 127.0.0.1:53468 #6 (1 connection now open)
      2020-11-17T12:52:37.452-0700 I NETWORK [conn6] received client metadata from 127.0.0.1:53468 conn6: { application: { name: "MongoDB Shell" }, driver: { name: "MongoDB Internal Client", version: "4.2.6" }, os: { type: "Linux", name: "CentOS Linux release 7.6.1810 (Core) ", architecture: "x86_64", version: "Kernel 3.10.0-957.21.3.el7.x86_64" } }
      2020-11-17T12:52:46.512-0700 I - [conn6] operation was interrupted because a client disconnected
      2020-11-17T12:52:46.524-0700 F - [conn6] terminate() called. No exception is active 0x55cf31d2fbb1 0x55cf31d2f536 0x55cf31e3a156 0x55cf31e3a191 0x55cf30f4c85d 0x55cf311e1425 0x55cf311e1992 0x55cf310f84c0 0x55cf311301ac 0x55cf3112b8df 0x55cf3112d56c 0x55cf314b47f2 0x55cf311287ad 0x55cf3112a023 0x55cf3112abd6 0x55cf3112b83b 0x55cf3112d56c 0x55cf314b4c5b 0x55cf31bc7a45 0x55cf31bc7aa4 0x7f20c8347dd5 0x7f20c8070ead
      ----- BEGIN BACKTRACE -----
      {"backtrace":[{"b":"55CF309B2000","o":"137DBB1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55CF309B2000","o":"137D536"},{"b":"55CF309B2000","o":"1488156","s":"_ZN10__cxxabiv111__terminateEPFvvE"},{"b":"55CF309B2000","o":"1488191"},{"b":"55CF309B2000","o":"59A85D"},{"b":"55CF309B2000","o":"82F425"},{"b":"55CF309B2000","o":"82F992","s":"_ZN5mongo8Strategy13clientCommandEPNS_16OperationContextERKNS_7MessageE"},{"b":"55CF309B2000","o":"7464C0","s":"_ZN5mongo23ServiceEntryPointMongos13handleRequestEPNS_16OperationContextERKNS_7MessageE"},{"b":"55CF309B2000","o":"77E1AC","s":"_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE"},{"b":"55CF309B2000","o":"7798DF","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE"},{"b":"55CF309B2000","o":"77B56C"},{"b":"55CF309B2000","o":"B027F2","s":"_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE"},{"b":"55CF309B2000","o":"7767AD","s":"_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE"},{"b":"55CF309B2000","o":"778023","s":"_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE"},{"b":"55CF309B2000","o":"778BD6","s":"_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE"},{"b":"55CF309B2000","o":"77983B","s":"_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE"},{"b":"55CF309B2000","o":"77B56C"},{"b":"55CF309B2000","o":"B02C5B"},{"b":"55CF309B2000","o":"1215A45"},{"b":"55CF309B2000","o":"1215AA4"},{"b":"7F20C8340000","o":"7DD5"},{"b":"7F20C7F73000","o":"FDEAD","s":"clone"}],"processInfo":{ "mongodbVersion" : "4.2.6", "gitVersion" : "20364840b8f1af16917e4c23c1b5f5efd8b352f8", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-957.21.3.el7.x86_64", "version" : "#1 SMP Tue Jun 18 16:35:19 UTC 2019", "machine" : "x86_64" }, "somap" : [ { "b" : "55CF309B2000", "elfType" : 3, "buildId" : "E432730CDAA0AC3E3913E1A4B4160334D52CB1C9" }, { "b" : "7FFD563D6000", "elfType" : 3, "buildId" : "BC4D21950F4B2ADFB515DFBB0E082E2281689A0B" }, { "b" : "7F20C976D000", "path" : "/lib64/libcurl.so.4", "elfType" : 3, "buildId" : "9114859D3C4BEC47A03CA321EE367DCA799638CD" }, { "b" : "7F20C9554000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "C444AE61E7CBB716FD9C18A0B46A7FE8F4FCF3E5" }, { "b" : "7F20C90F2000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "3593FA778645A59EA272DBBB59D318C60940E792" }, { "b" : "7F20C8E80000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "AEF5E6F2240B55F90E9DF76CFBB8B9D9F5286583" }, { "b" : "7F20C8C7C000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "357693C8F1F49D93010C4E31529C07CDD2BD3D08" }, { "b" : "7F20C8A74000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "EFDE2029C9A4A20BE5B8D8AE7E6551FF9B5755D2" }, { "b" : "7F20C8772000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "5B14BE4D749631673523A61074C10959D50F5455" }, { "b" : "7F20C855C000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "179F202998E429AA1215907F6D4C5C1BB9C90136" }, { "b" : "7F20C8340000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "96900CB0FF25B26F2BBDF247DE1408242E4773D8" }, { "b" : "7F20C7F73000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "EB9F22A3891E5FD3494DFD9ED199E20AE71BB08D" }, { "b" : "7F20C99D6000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "A527FE72908703C5972AE384E78D1850D1881EE7" }, { "b" : "7F20C7D40000", "path" : "/lib64/libidn.so.11", "elfType" : 3, "buildId" : "2B77BBEFFF65E94F3E0B71A4E89BEB68C4B476C5" }, { "b" : "7F20C7B16000", "path" : "/lib64/libssh2.so.1", "elfType" : 3, "buildId" : "689404B6B895EACFAD65BA16E07E5BF0004F5E0C" }, { "b" : "7F20C78C4000", "path" : "/lib64/libssl3.so", "elfType" : 3, "buildId" : "2E28F6A705F2ECEA8460D4716D5D1C24B5DDA5E4" }, { "b" : "7F20C769D000", "path" : "/lib64/libsmime3.so", "elfType" : 3, "buildId" : "8D0B4010959C321022DF9CE239277A9D7B34A76A" }, { "b" : "7F20C7370000", "path" : "/lib64/libnss3.so", "elfType" : 3, "buildId" : "F5A64BB37FA3972E545EF459A51310F0AB56FA56" }, { "b" : "7F20C7140000", "path" : "/lib64/libnssutil3.so", "elfType" : 3, "buildId" : "E0705772325A52C3372FFFB8BDE5F786E2E200D6" }, { "b" : "7F20C6F3C000", "path" : "/lib64/libplds4.so", "elfType" : 3, "buildId" : "084D2194302908913F68B9DCD27DE46FA5B50522" }, { "b" : "7F20C6D37000", "path" : "/lib64/libplc4.so", "elfType" : 3, "buildId" : "799B28AD9A5460D78376E2C11260F2E858B95DE3" }, { "b" : "7F20C6AF9000", "path" : "/lib64/libnspr4.so", "elfType" : 3, "buildId" : "DE762A28174110911B273E175D54F222B313CFE0" }, { "b" : "7F20C68AC000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "BCC30853830CD911E58700591830DF51ABCBD7BA" }, { "b" : "7F20C65C3000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "45BAB0BB455BDFA960FDA22E4124CF17B67CC930" }, { "b" : "7F20C6390000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "A9B3906192687CC45D483AE3C58C8AF745A6726A" }, { "b" : "7F20C618C000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "B4BE1023D9606A88169DF411BF94AF417D7BA1A0" }, { "b" : "7F20C5F7D000", "path" : "/lib64/liblber-2.4.so.2", "elfType" : 3, "buildId" : "3192C56CD451E18EB9F29CB045432BA9C738DD29" }, { "b" : "7F20C5D28000", "path" : "/lib64/libldap-2.4.so.2", "elfType" : 3, "buildId" : "F1FADDDE0D21D5F4E2DCADEDD3B85B6E7AAC9883" }, { "b" : "7F20C5B12000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "B9D5F73428BD6AD68C96986B57BEA3B7CEDB9745" }, { "b" : "7F20C5902000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "94B3BCB669126166B77CDCE6092679A6AA2004C8" }, { "b" : "7F20C56FE000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F20C54E1000", "path" : "/lib64/libsasl2.so.3", "elfType" : 3, "buildId" : "E2F2017F821DD1B9D307DA1A9B8014F2941AEB7B" }, { "b" : "7F20C52BA000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "D2DD4DA3FDE1477D25BFFF80F3A25FDB541A8179" }, { "b" : "7F20C5083000", "path" : "/lib64/libcrypt.so.1", "elfType" : 3, "buildId" : "740CAD898E29E1F3B73A323CCEC4A7C88911647F" }, { "b" : "7F20C4E21000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "9CA3D11F018BEEB719CDB34BE800BF1641350D0A" }, { "b" : "7F20C4C1E000", "path" : "/lib64/libfreebl3.so", "elfType" : 3, "buildId" : "B758881F4B6AF6C28C07A1A57713CBD2144628D4" }, { "b" : "7F20C4A0B000", "path" : "/lib64/libnss_files.so.2", "elfType" : 3, "buildId" : "EB4032E5BEEFD1751F164AE026A99F3FEA8F7454" }, { "b" : "7F20C4804000", "path" : "/lib64/libnss_dns.so.2", "elfType" : 3, "buildId" : "1CBCAFE76C83D1C6B0B69B361D723B629F26141A" } ] }}
       mongos(_ZN5mongo15printStackTraceERSo+0x41) [0x55cf31d2fbb1]
       mongos(+0x137D536) [0x55cf31d2f536]
       mongos(_ZN10__cxxabiv111__terminateEPFvvE+0x6) [0x55cf31e3a156]
       mongos(+0x1488191) [0x55cf31e3a191]
       mongos(+0x59A85D) [0x55cf30f4c85d]
       mongos(+0x82F425) [0x55cf311e1425]
       mongos(_ZN5mongo8Strategy13clientCommandEPNS_16OperationContextERKNS_7MessageE+0x1C2) [0x55cf311e1992]
       mongos(_ZN5mongo23ServiceEntryPointMongos13handleRequestEPNS_16OperationContextERKNS_7MessageE+0x3D0) [0x55cf310f84c0]
       mongos(_ZN5mongo19ServiceStateMachine15_processMessageENS0_11ThreadGuardE+0xEC) [0x55cf311301ac]
       mongos(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0x17F) [0x55cf3112b8df]
       mongos(+0x77B56C) [0x55cf3112d56c]
       mongos(_ZN5mongo9transport26ServiceExecutorSynchronous8scheduleESt8functionIFvvEENS0_15ServiceExecutor13ScheduleFlagsENS0_23ServiceExecutorTaskNameE+0x182) [0x55cf314b47f2]
       mongos(_ZN5mongo19ServiceStateMachine22_scheduleNextWithGuardENS0_11ThreadGuardENS_9transport15ServiceExecutor13ScheduleFlagsENS2_23ServiceExecutorTaskNameENS0_9OwnershipE+0x10D) [0x55cf311287ad]
       mongos(_ZN5mongo19ServiceStateMachine15_sourceCallbackENS_6StatusE+0x753) [0x55cf3112a023]
       mongos(_ZN5mongo19ServiceStateMachine14_sourceMessageENS0_11ThreadGuardE+0x316) [0x55cf3112abd6]
       mongos(_ZN5mongo19ServiceStateMachine15_runNextInGuardENS0_11ThreadGuardE+0xDB) [0x55cf3112b83b]
       mongos(+0x77B56C) [0x55cf3112d56c]
       mongos(+0xB02C5B) [0x55cf314b4c5b]
       mongos(+0x1215A45) [0x55cf31bc7a45]
       mongos(+0x1215AA4) [0x55cf31bc7aa4]
       libpthread.so.0(+0x7DD5) [0x7f20c8347dd5]
       libc.so.6(clone+0x6D) [0x7f20c8070ead]
      ----- END BACKTRACE -----
      2020-11-17T12:52:46.814-0700 I CONTROL [main] ***** SERVER RESTARTED *****
      2020-11-17T12:52:46.817-0700 I CONTROL [main] Automatically disabling TLS 1.0, to force-enable TLS 1.0 specify --sslDisabledProtocols 'none'
      2020-11-17T12:52:46.942-0700 I SHARDING [mongosMain] mongos version v4.2.6
      2020-11-17T12:52:46.942-0700 I CONTROL [mongosMain] db version v4.2.6
      2020-11-17T12:52:46.942-0700 I CONTROL [mongosMain] git version: 20364840b8f1af16917e4c23c1b5f5efd8b352f8
      2020-11-17T12:52:46.942-0700 I CONTROL [mongosMain] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
      2020-11-17T12:52:46.942-0700 I CONTROL [mongosMain] allocator: tcmalloc
      2020-11-17T12:52:46.942-0700 I CONTROL [mongosMain] modules: none
      2020-11-17T12:52:46.943-0700 I CONTROL [mongosMain] build environment:
      2020-11-17T12:52:46.943-0700 I CONTROL [mongosMain] distmod: rhel70
      2020-11-17T12:52:46.943-0700 I CONTROL [mongosMain] distarch: x86_64
      2020-11-17T12:52:46.943-0700 I CONTROL [mongosMain] target_arch: x86_64
      2020-11-17T12:52:46.943-0700 I CONTROL [mongosMain] options: { config: "/etc/mongos.conf", net: { bindIp: "localhost,172.27.8.10", port: 27017 }, processManagement: { pidFilePath: "/var/run/mongodb/mongos.pid" }, security: { keyFile: "/var/run/mongodb/keyfile" }, sharding: { configDB: "stage-metric-config/slc-stage-mongoc11:27017" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongos.log" } }
      2020-11-17T12:52:46.944-0700 I NETWORK [mongosMain] Starting new replica set monitor for stage-metric-config/slc-stage-mongoc11:27017
      2020-11-17T12:52:46.945-0700 I CONNPOOL [ReplicaSetMonitor-TaskExecutor] Connecting to slc-stage-mongoc11:27017
      2020-11-17T12:52:46.945-0700 I SHARDING [thread1] creating distributed lock ping thread for process slc-stage-mongos11:27017:1605642766:-1335266176871135546 (sleeping for 30000ms)
      2020-11-17T12:52:47.068-0700 I NETWORK [ReplicaSetMonitor-TaskExecutor] Confirmed replica set for stage-metric-config is stage-metric-config/slc-stage-mongoc11:27017
      2020-11-17T12:52:47.068-0700 I SHARDING [Sharding-Fixed-0] Updating sharding state with confirmed set stage-metric-config/slc-stage-mongoc11:27017
      2020-11-17T12:52:47.279-0700 I SHARDING [ShardRegistry] Received reply from config server node (unknown) indicating config server optime term has increased, previous optime { ts: Timestamp(0, 0), t: -1 }, now { ts: Timestamp(1605642766, 2), t: 13 }
      2020-11-17T12:52:47.282-0700 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document
      2020-11-17T12:52:49.282-0700 I FTDC [mongosMain] Initializing full-time diagnostic data capture with directory '/var/log/mongodb/mongos.diagnostic.data'
      2020-11-17T12:52:49.284-0700 I NETWORK [listener] Listening on /tmp/mongodb-27017.sock
      2020-11-17T12:52:49.284-0700 I NETWORK [listener] Listening on 127.0.0.1
      2020-11-17T12:52:49.284-0700 I NETWORK [listener] Listening on 172.27.8.10
      2020-11-17T12:52:49.285-0700 I NETWORK [listener] waiting for connections on port 27017
      2020-11-17T12:52:49.284-0700 I SH_REFR [ConfigServerCatalogCacheLoader-0] Refresh for database config from version {} to version { uuid: UUID("66a93506-4b5d-4657-a506-46cb1b074cb4"), lastMod: 0 } took 0 ms
      2020-11-17T12:52:49.287-0700 I SH_REFR [ConfigServerCatalogCacheLoader-0] Refresh for collection config.system.sessions to version 1|0||5ec6cb0452abc60aba42eb80 took 1 ms
      2020-11-17T12:52:50.003-0700 I FTDC [ftdc] Unclean full-time diagnostic data capture shutdown detected, found interim file, some metrics may have been lost. OK
      2020-11-17T12:53:47.281-0700 I CONNPOOL [ShardRegistry] Ending idle connection to host slc-stage-mongoc11:27017 because the pool meets constraints; 3 connections to that host remain open
      2020-11-17T12:53:47.489-0700 I CONNPOOL [ShardRegistry] Ending idle connection to host slc-stage-mongoc11:27017 because the pool meets constraints; 2 connections to that host remain open
      2020-11-17T12:54:17.281-0700 I CONNPOOL [ShardRegistry] Ending idle connection to host slc-stage-mongoc11:27017 because the pool meets constraints; 1 connections to that host remain open
      2020-11-17T12:57:49.284-0700 I CONNPOOL [ShardRegistry] Connecting to slc-stage-mongoc11:27017
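
The frame addresses in the backtrace above can be cross-checked against the JSON "backtrace" array. The sketch below assumes that "b" is the load base of the module and "o" is the offset of the frame within it, both as hex strings; that reading is an assumption, although it agrees with the bracketed addresses in the log. The helper names absolute_address and describe are hypothetical and exist only for this illustration. Three frames are copied from the log.

```python
import json

# Three frames copied from the "backtrace" array in the log above.
FRAMES_JSON = """[
  {"b": "55CF309B2000", "o": "137DBB1", "s": "_ZN5mongo15printStackTraceERSo"},
  {"b": "55CF309B2000", "o": "82F992",
   "s": "_ZN5mongo8Strategy13clientCommandEPNS_16OperationContextERKNS_7MessageE"},
  {"b": "7F20C7F73000", "o": "FDEAD", "s": "clone"}
]"""


def absolute_address(frame):
    """Assumed layout: module load base plus offset, both hex strings."""
    return int(frame["b"], 16) + int(frame["o"], 16)


def describe(frames):
    """Pair each frame's absolute address with its symbol, if one was logged."""
    return [(hex(absolute_address(f)), f.get("s", "<no symbol>")) for f in frames]


frames = json.loads(FRAMES_JSON)
for address, symbol in describe(frames):
    print(address, symbol)
```

Under that assumption the computed values match the addresses printed next to the same symbols in the text backtrace (0x55cf31d2fbb1, 0x55cf311e1992 and 0x7f20c8070ead). The mangled names can then be passed to a demangler such as c++filt to make the frames readable.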

            People

              edwin.zhou@mongodb.com Edwin Zhou
              gavin.aiken@netcuras.com Gavin Aiken