-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
Server Programmability
This ticket requests an improvement to the response mongos returns to clients when it cannot reach a shard required for an operation (for example, when the mongod's maxIncomingConnections limit has been reached).
Specifically, could the failure response identify which server could not be reached, as well as which server failed to reach it? This would help users avoid diagnosing connectivity between clients and routers when the problem actually lies between routers and shard members (or, in rare cases, between two mongods).
The mongos logs for such a failure look like:
{"t":{"$date":"2021-07-12T13:30:10.300-07:00"},"s":"I", "c":"NETWORK", "id":51800, "ctx":"conn526","msg":"client metadata","attr":{"remote":"127.0.0.1:62159","client":"conn526","doc":{"driver":{"name":"PyMongo","version":"3.12.0b0"},"os":{"type":"Darwin","name":"Darwin","architecture":"x86_64","version":"10.16"},"platform":"CPython 3.7.2.final.0","mongos":{"host":"nodachi:27017","client":"127.0.0.1:62159","version":"4.4.7-rc1"}}}}
{"t":{"$date":"2021-07-12T13:30:10.322-07:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"conn526","msg":"Host failed in replica set","attr":{"replicaSet":"shard01","host":"localhost:27018","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"},"action":{"dropConnections":true,"requestImmediateCheck":false,"outcome":{"host":"localhost:27018","success":false,"errorMessage":"HostUnreachable: Connection closed by peer"}}}}
{"t":{"$date":"2021-07-12T13:30:10.329-07:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"conn526","msg":"Host failed in replica set","attr":{"replicaSet":"shard01","host":"localhost:27018","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"},"action":{"dropConnections":true,"requestImmediateCheck":false,"outcome":{"host":"localhost:27018","success":false,"errorMessage":"HostUnreachable: Connection closed by peer"}}}}
{"t":{"$date":"2021-07-12T13:30:14.383-07:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"conn526","msg":"Host failed in replica set","attr":{"replicaSet":"shard01","host":"localhost:27018","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"},"action":{"dropConnections":true,"requestImmediateCheck":false,"outcome":{"host":"localhost:27018","success":false,"errorMessage":"HostUnreachable: Connection closed by peer"}}}}
{"t":{"$date":"2021-07-12T13:30:14.397-07:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"conn526","msg":"Host failed in replica set","attr":{"replicaSet":"shard01","host":"localhost:27018","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"},"action":{"dropConnections":true,"requestImmediateCheck":false,"outcome":{"host":"localhost:27018","success":false,"errorMessage":"HostUnreachable: Connection closed by peer"}}}}
{"t":{"$date":"2021-07-12T13:30:19.557-07:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"conn526","msg":"Host failed in replica set","attr":{"replicaSet":"shard01","host":"localhost:27018","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"},"action":{"dropConnections":true,"requestImmediateCheck":false,"outcome":{"host":"localhost:27018","success":false,"errorMessage":"HostUnreachable: Connection closed by peer"}}}}
{"t":{"$date":"2021-07-12T13:30:19.574-07:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"conn526","msg":"Host failed in replica set","attr":{"replicaSet":"shard01","host":"localhost:27018","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"},"action":{"dropConnections":true,"requestImmediateCheck":false,"outcome":{"host":"localhost:27018","success":false,"errorMessage":"HostUnreachable: Connection closed by peer"}}}}
{"t":{"$date":"2021-07-12T13:30:24.391-07:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"conn526","msg":"Host failed in replica set","attr":{"replicaSet":"shard01","host":"localhost:27018","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"},"action":{"dropConnections":true,"requestImmediateCheck":false,"outcome":{"host":"localhost:27018","success":false,"errorMessage":"HostUnreachable: Connection closed by peer"}}}}
{"t":{"$date":"2021-07-12T13:30:24.402-07:00"},"s":"I", "c":"NETWORK", "id":4712102, "ctx":"conn526","msg":"Host failed in replica set","attr":{"replicaSet":"shard01","host":"localhost:27018","error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"},"action":{"dropConnections":true,"requestImmediateCheck":false,"outcome":{"host":"localhost:27018","success":false,"errorMessage":"HostUnreachable: Connection closed by peer"}}}}
{"t":{"$date":"2021-07-12T13:30:24.414-07:00"},"s":"I", "c":"QUERY", "id":4625501, "ctx":"conn526","msg":"Unable to establish remote cursors","attr":{"error":{"code":6,"codeName":"HostUnreachable","errmsg":"Connection closed by peer"},"nRemotes":3}}
{"t":{"$date":"2021-07-12T13:30:24.422-07:00"},"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn526","msg":"Slow query","attr":{"type":"command","ns":"test.test","command":{"aggregate":"test","pipeline":[{"$merge":{"into":"test2"}}],"cursor":{},"lsid":{"id":{"$uuid":"8e633a66-15fe-4c5e-b2b2-34c148a76b41"}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1626121804,"i":1}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"$db":"test","$readPreference":{"mode":"primary"}},"numYields":0,"ok":0,"errMsg":"Connection closed by peer","errName":"HostUnreachable","errCode":6,"reslen":241,"protocol":"op_msg","durationMillis":14113}}
{"t":{"$date":"2021-07-12T13:31:42.821-07:00"},"s":"I", "c":"NETWORK", "id":22944, "ctx":"conn526","msg":"Connection ended","attr":{"remote":"127.0.0.1:62159","connectionId":526,"connectionCount":5}}
Although they are verbose, the mongos logs do make it possible to determine that mongos failed to reach localhost:27018.
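As a rough illustration of how the unreachable host can be recovered from these logs today, the following Python sketch tallies the "Host failed in replica set" entries by replica set, host, and error name. It assumes the log file contains one JSON entry per line; the log id 4712102 and the attr field names are taken from the entries above, while the names HOST_FAILED_ID and summarize_host_failures are hypothetical and used only for illustration.

```python
import json
from collections import Counter

# Log id of the "Host failed in replica set" entries shown in this ticket.
HOST_FAILED_ID = 4712102


def summarize_host_failures(lines):
    """Count host-failure entries per (replicaSet, host, codeName).

    `lines` is any iterable of strings, e.g. an open mongos log file.
    Assumes one structured (JSON) log entry per line.
    """
    failures = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a structured log line
        if not isinstance(entry, dict) or entry.get("id") != HOST_FAILED_ID:
            continue
        attr = entry.get("attr", {})
        error = attr.get("error", {})
        key = (attr.get("replicaSet"), attr.get("host"), error.get("codeName"))
        failures[key] += 1
    return failures
```

Passing an open log file to summarize_host_failures and printing the resulting counts gives one line per failing host. This only works for someone with access to the mongos logs, which is exactly the limitation this ticket is about.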
However, the response returned to the client gives no indication of which host was unreachable:
{"ok": 0.0, "errmsg": "Connection closed by peer", "code": 6, "codeName": "HostUnreachable", "operationTime": {"$timestamp": {"t": 1626121818, "i": 1}}, "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1626121838, "i": 1}}, "signature": {"hash": {"$binary": "AAAAAAAAAAAAAAAAAAAAAAAAAAA=", "$type": "00"}, "keyId": 0}}}
and the shell's response is:
mongos> db.test.aggregate([{$merge:{into:"test2"}}])
uncaught exception: Error: command failed: { "ok" : 0, "errmsg" : "Connection closed by peer", "code" : 6, "codeName" : "HostUnreachable", "operationTime" : Timestamp(1626114452, 1), "$clusterTime" : { "clusterTime" : Timestamp(1626114488, 1), "signature" : { "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="), "keyId" : NumberLong(0) } } } : aggregate failed :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
doassert@src/mongo/shell/assert.js:18:14
_assertCommandWorked@src/mongo/shell/assert.js:665:17
assert.commandWorked@src/mongo/shell/assert.js:755:16
DB.prototype._runAggregate@src/mongo/shell/db.js:266:5
DBCollection.prototype.aggregate@src/mongo/shell/collection.js:1058:12
@(shell):1:1
- related to
-
SERVER-60063 Log server discovery times
- Closed