[SERVER-22391] Mongos 3.2.1 crashes after this error, failure _checkedOutPool.empty() src/mongo/executor/connection_pool.cpp 570 Created: 01/Feb/16  Updated: 08/Jan/24  Resolved: 24/Mar/16

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: 3.2.1
Fix Version/s: 3.2.5, 3.3.4

Type: Bug Priority: Major - P3
Reporter: Edik Mkoyan Assignee: Mira Carey
Resolution: Done Votes: 0
Labels: code-and-test
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-23674 Invariant failure _checkedOutPool.emp... Closed
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Sprint: Platforms 10 (02/19/16), Platforms 12 (04/01/16)
Participants:

 Description   

2016-01-29T20:38:15.122+0400 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-3-0] Successfully connected to dj-mrs-master.tumo.lab:27017
2016-01-29T20:38:18.724+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:38:26.725+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:38:34.725+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:38:42.724+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:38:50.727+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:38:58.725+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:39:06.724+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:39:14.732+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:39:16.084+0400 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-1-0] Successfully connected to gm-central-mrs-slave:27017
2016-01-29T20:39:16.198+0400 I ASIO     [NetworkInterfaceASIO-TaskExecutorPool-1-0] Successfully connected to sk-central-mrs-slave:27017
2016-01-29T20:39:22.725+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:39:30.725+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:39:32.827+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:39:38.729+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:39:46.724+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:39:54.724+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:40:02.725+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:40:10.726+0400 I ACCESS   [conn79] Successfully authenticated as principal root on admin
2016-01-29T20:40:15.132+0400 I -        [NetworkInterfaceASIO-TaskExecutorPool-2-0] Invariant failure _checkedOutPool.empty() src/mongo/executor/connection_pool.cpp 570
2016-01-29T20:40:15.132+0400 I -        [NetworkInterfaceASIO-TaskExecutorPool-2-0] 
 
***aborting after invariant() failure
 
 
2016-01-29T20:40:15.182+0400 F -        [NetworkInterfaceASIO-TaskExecutorPool-2-0] Got signal: 6 (Aborted).
 
 0xc31fc2 0xc31129 0xc31932 0x7efda9d56340 0x7efda99b7cc9 0x7efda99bb0d8 0xbbb1db 0x9d5783 0x9d7f9c 0x9d8b17 0x9d9281 0xc4dc61 0xc4de81 0xc5201f 0x9e2445 0x7efdaa531a40 0x7efda9d4e182 0x7efda9a7b47d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"831FC2"},{"b":"400000","o":"831129"},{"b":"400000","o":"831932"},{"b":"7EFDA9D46000","o":"10340"},{"b":"7EFDA9981000","o":"36CC9"},{"b":"7EFDA9981000","o":"3A0D8"},{"b":"400000","o":"7BB1DB"},{"b":"400000","o":"5D5783"},{"b":"400000","o":"5D7F9C"},{"b":"400000","o":"5D8B17"},{"b":"400000","o":"5D9281"},{"b":"400000","o":"84DC61"},{"b":"400000","o":"84DE81"},{"b":"400000","o":"85201F"},{"b":"400000","o":"5E2445"},{"b":"7EFDAA480000","o":"B1A40"},{"b":"7EFDA9D46000","o":"8182"},{"b":"7EFDA9981000","o":"FA47D"}],"processInfo":{ "mongodbVersion" : "3.2.1", "gitVersion" : "a14d55980c2cdc565d4704a7e3ad37e4e535c1b2", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.19.0-47-generic", "version" : "#53~14.04.1-Ubuntu SMP Mon Jan 18 16:09:14 UTC 2016", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "13702C8A170E2A8D9CB29118516D8D3C99083030" }, { "b" : "7FFC8C955000", "elfType" : 3, "buildId" : "A7B0B18F330161D11A1B8E085900D9E8244369FF" }, { "b" : "7EFDAAF6B000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "D08DD65F97859C71BB2CBBF1043BD968EFE18AAD" }, { "b" : "7EFDAAB90000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "F86FA9FB4ECEB4E06B40DBDF761A4172B70A4229" }, { "b" : "7EFDAA988000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7EFDAA784000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7EFDAA480000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "4BF6F7ADD8244AD86008E6BF40D90F8873892197" }, { "b" : "7EFDAA17A000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7EFDA9F64000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7EFDA9D46000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7EFDA9981000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7EFDAB1CA000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongos(_ZN5mongo15printStackTraceERSo+0x32) [0xc31fc2]
 mongos(+0x831129) [0xc31129]
 mongos(+0x831932) [0xc31932]
 libpthread.so.0(+0x10340) [0x7efda9d56340]
 libc.so.6(gsignal+0x39) [0x7efda99b7cc9]
 libc.so.6(abort+0x148) [0x7efda99bb0d8]
 mongos(_ZN5mongo15invariantFailedEPKcS1_j+0xCB) [0xbbb1db]
 mongos(_ZN5mongo8executor14ConnectionPool12SpecificPool8shutdownEv+0x2A3) [0x9d5783]
 mongos(+0x5D7F9C) [0x9d7f9c]
 mongos(+0x5D8B17) [0x9d8b17]
 mongos(+0x5D9281) [0x9d9281]
 mongos(_ZN4asio6detail9scheduler10do_run_oneERNS0_11scoped_lockINS0_11posix_mutexEEERNS0_21scheduler_thread_infoERKSt10error_code+0x2F1) [0xc4dc61]
 mongos(_ZN4asio6detail9scheduler3runERSt10error_code+0xC1) [0xc4de81]
 mongos(_ZN4asio10io_service3runEv+0x2F) [0xc5201f]
 mongos(+0x5E2445) [0x9e2445]
 libstdc++.so.6(+0xB1A40) [0x7efdaa531a40]
 libpthread.so.0(+0x8182) [0x7efda9d4e182]
 libc.so.6(clone+0x6D) [0x7efda9a7b47d]
-----  END BACKTRACE  -----
2016-02-01T10:28:59.408+0400 I CONTROL  [main] ***** SERVER RESTARTED *****
2016-02-01T10:28:59.420+0400 I CONTROL  [main] ** WARNING: You are running this process as the root user, which is not recommended.
2016-02-01T10:28:59.420+0400 I CONTROL  [main] 
2016-02-01T10:28:59.451+0400 I SHARDING [mongosMain] MongoS version 3.2.1 starting: pid=21899 port=27017 64-bit host=dj-web02.tumo.lab (--help for usage)
2016-02-01T10:28:59.451+0400 I CONTROL  [mongosMain] db version v3.2.1
2016-02-01T10:28:59.451+0400 I CONTROL  [mongosMain] git version: a14d55980c2cdc565d4704a7e3ad37e4e535c1b2
2016-02-01T10:28:59.451+0400 I CONTROL  [mongosMain] OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
2016-02-01T10:28:59.451+0400 I CONTROL  [mongosMain] allocator: tcmalloc
2016-02-01T10:28:59.451+0400 I CONTROL  [mongosMain] modules: none
2016-02-01T10:28:59.451+0400 I CONTROL  [mongosMain] build environment:
2016-02-01T10:28:59.451+0400 I CONTROL  [mongosMain]     distmod: ubuntu1404
2016-02-01T10:28:59.452+0400 I CONTROL  [mongosMain]     distarch: x86_64
2016-02-01T10:28:59.452+0400 I CONTROL  [mongosMain]     target_arch: x86_64
2016-02-01T10:28:59.452+0400 I CONTROL  [mongosMain] options: { config: "/etc/mongos.conf", net: { port: 27017 }, processManagement: { fork: true }, security: { keyFile: "/etc/mongodb_keyfile" }, sharding: { configDB: "mcfg/dj-cfg.tumo.lab:27019,mcfg01.tumo.lab:27019,mcfg02.tumo.lab:27019,mcfg03.tumo.lab:27019" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongos.log" } }
2016-02-01T10:28:59.452+0400 I SHARDING [mongosMain] Updating config server connection string to: mcfg/dj-cfg.tumo.lab:27019,mcfg01.tumo.lab:27019,mcfg02.tumo.lab:27019,mcfg03.tumo.lab:27019
2016-02-01T10:28:59.452+0400 I NETWORK  [mongosMain] Starting new replica set monitor for mcfg/dj-cfg.tumo.lab:27019,mcfg01.tumo.lab:27019,mcfg02.tumo.lab:27019,mcfg03.tumo.lab:27019
2016-02-01T10:28:59.452+0400 I NETWORK  [ReplicaSetMonitorWatcher] starting
2016-02-01T10:28:59.455+0400 I SHARDING [thread1] creating distributed lock ping thread for process dj-web02.tumo.lab:27017:1454308139:1283860503 (sleeping for 30000ms)
2016-02-01T10:28:59.512+0400 I NETWORK  [replSetDistLockPinger] changing hosts to mcfg/dj-cfg:27019,gm-cfg:27019,mcfg01:27019,mcfg02:27019,mcfg03:27019,sk-cfg:27019 from mcfg/dj-cfg.tumo.lab:27019,mcfg01.tumo.lab:27019,mcfg02.tumo.lab:27019,mcfg03.tumo.lab:27019
2016-02-01T10:28:59.512+0400 I SHARDING [replSetDistLockPinger] Updating config server connection string to: mcfg/dj-cfg:27019,gm-cfg:27019,mcfg01:27019,mcfg02:27019,mcfg03:27019,sk-cfg:27019
2016-02-01T10:28:59.626+0400 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to mcfg01:27019
2016-02-01T10:28:59.629+0400 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to mcfg01:27019
2016-02-01T10:28:59.650+0400 I NETWORK  [mongosMain] Starting new replica set monitor for central/central-mrs-master:27017,central-mrs-slave:27017,dj-central-mrs-slave:27017,gm-central-mrs-slave:27017,sk-central-mrs-slave:27017
2016-02-01T10:28:59.650+0400 I NETWORK  [mongosMain] Starting new replica set monitor for dilijan/dj-mrs-master.tumo.lab:27017,evn-dj-mrs-slave:27017
2016-02-01T10:28:59.651+0400 I NETWORK  [mongosMain] Starting new replica set monitor for gyumri/gm-mrs-master:27017,gm-mrs-slave:27017
2016-02-01T10:28:59.651+0400 I NETWORK  [mongosMain] Starting new replica set monitor for stepanakert/evn-sk-mrs-slave:27017,sk-mrs-master:27017
2016-02-01T10:28:59.651+0400 I NETWORK  [mongosMain] Starting new replica set monitor for yerevan/evn-mrs-master:27017,evn-mrs-slave:27017
2016-02-01T10:28:59.712+0400 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed findAndModify query predicate didn't match any lock document
2016-02-01T10:28:59.859+0400 I NETWORK  [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2016-02-01T10:28:59.859+0400 I SHARDING [Balancer] about to contact config servers and shards
2016-02-01T10:28:59.915+0400 I NETWORK  [mongosMain] waiting for connections on port 27017
2016-02-01T10:28:59.968+0400 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to central-mrs-master:27017
2016-02-01T10:29:00.047+0400 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to dj-mrs-master.tumo.lab:27017
2016-02-01T10:29:00.192+0400 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to gm-mrs-master:27017
2016-02-01T10:29:00.276+0400 I NETWORK  [mongosMain] connection accepted from 10.44.20.221:33625 #1 (1 connection now open)
2016-02-01T10:29:00.437+0400 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to sk-mrs-master:27017
2016-02-01T10:29:00.551+0400 I ASIO     [NetworkInterfaceASIO-ShardRegistry-0] Successfully connected to evn-mrs-master:27017
2016-02-01T10:29:00.555+0400 I SHARDING [Balancer] config servers and shards contacted successfully



 Comments   
Comment by Githook User [ 25/Mar/16 ]

Author:

{u'username': u'hanumantmk', u'name': u'Jason Carey', u'email': u'jcarey@argv.me'}

Message: SERVER-22391 Handle race in connpool hostTimeout

If a new request comes in at the same moment a hostTimeout triggers we
can race in shutdown (and hit an invariant). An additional check fixes
the misbehavior.

(cherry picked from commit c2b1e28fd15a4400cbb311057ef33a2120f1e6e2)
Branch: v3.2
https://github.com/mongodb/mongo/commit/9df2d7019d88632d98cdfb05c807881691114bff

Comment by Githook User [ 24/Mar/16 ]

Author:

{u'username': u'hanumantmk', u'name': u'Jason Carey', u'email': u'jcarey@argv.me'}

Message: SERVER-22391 Handle race in connpool hostTimeout

If a new request comes in at the same moment a hostTimeout triggers we
can race in shutdown (and hit an invariant). An additional check fixes
the misbehavior.
Branch: master
https://github.com/mongodb/mongo/commit/c2b1e28fd15a4400cbb311057ef33a2120f1e6e2

Comment by Mira Carey [ 23/Feb/16 ]

edikmkoyan, Sorry for the delayed response.

We're prioritizing this into our next development iteration, so look for updates and a fix in the next week or two.

Regards,
Jason

Comment by Edik Mkoyan [ 09/Feb/16 ]

it was just once

Comment by Mira Carey [ 02/Feb/16 ]

Edik,

That invariant failure lives on a codepath that gets run if a host hasn't seen a request in 5 minutes. It does look like the code permits a race, where we can be just about to drop the pool, enter the function to drop, then get a new request. And the invariant verifies that we don't have any requests.

Is this behavior something you've seen repeatedly, or just once? I suspect that the surrounding context would be a shard that gets a request exactly every 5 minutes or so.

Regards,
Jason

Generated at Thu Feb 08 04:00:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.