[SERVER-26373] Operation timed out between config servers and mongoS
Created: 28/Sep/16  Updated: 11/Dec/17  Resolved: 19/Nov/16

Status: Closed
Project: Core Server
Component/s: Networking, Sharding
Affects Version/s: 3.2.8
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Thiago Leite
Assignee: Kelsey Schubert
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

MongoDB shell version: 3.2.8
3 Config servers -> port 27020
3 Shards each with 3 replicas
21 mongoS + 1 mongoS local in each shard

Note: I have the same setup with a smaller number of mongoS instances (3 mongoS), all running on the same server, and it is working well.

Note 2: There is no firewall between the shards, config servers, and mongoS instances.

Sprint: Sharding 2016-10-10, Sharding 2016-10-31
Participants:

 Description   

Hi All!

I was in the process of changing my config servers from mmap/3.0 mode to wiredTiger/3.2 mode (replica set). After I finished the process, I started
redirecting all mongoS instances to the new servers, and now none of them can communicate with the new config servers (30-second timeout). The cluster is still working, but
I can't run any commands against the config database from mongoS.

TIMEOUT REPORTED IN MONGOS:

Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [NetworkInterfaceASIO-ShardRegistry-0] Operation 1089 timed out.
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [NetworkInterfaceASIO-ShardRegistry-0] Failed to execute command: RemoteCommand 1089 -- target:config03.xx.xxxx.com:27020 db:config expDate:2016-09-28T17:36:30.066+0000 cmd:{ find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1473441131000|1, t: 3 } }, maxTimeMS: 30000 } reason: ExceededTimeLimit: Operation timed out
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [NetworkInterfaceASIO-ShardRegistry-0] Received remote response: ExceededTimeLimit: Operation timed out
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [Balancer] Marking host config03.xx.xxxx.com:27020 as failed
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [Balancer] User Assertion: 50:could not get updated shard list from config server due to Operation timed out
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [Balancer] DBException thrown :: caused by :: 50 could not get updated shard list from config server due to Operation timed out
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [Balancer]   0xc71ed2 0xc71dd3 0xbf7827 0x69daf6 0xbf7a6c 0xbf7b0c 0xb2d413 0xa6070b 0xbfbed0 0xea8da0 0x361ea07851 0x361e6e894d ----- BEGIN BACKTRACE ----- {"backtrace":[{"b":"400000","o":"871ED2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"871DD3","s":"_ZN5mongo15printStackTraceEv"},{"b":"400000","o":"7F7827","s":"_ZN5mongo11DBException13traceIfNeededERKS0_"},{"b":"400000","o":"29DAF6","s":"_ZN5mongo11DBExceptionC1ERKSsi"},{"b":"400000","o":"7F7A6C","s":"_ZN5mongo9uassertedEiPKc"},{"b":"400000","o":"7F7B0C"},{"b":"400000","o":"72D413","s":"_ZN5mongo13ShardRegistry6reloadEPNS_16OperationContextE"},{"b":"400000","o":"66070B","s":"_ZN5mongo8Balancer3runEv"},{"b":"400000","o":"7FBED0","s":"_ZN5mongo13BackgroundJob7jobBodyEv"},{"b":"400000","o":"AA8DA0","s":"execute_native_thread_routine"},{"b":"361EA00000","o":"7851"},{"b":"361E600000","o":"E894D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.8", "gitVersion" : "ed70e33130c977bda0024c125b56d159573dbaf0", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "2.6.32-358.23.2.el6.x86_64", "version" : "#1 SMP Wed Oct 16 18:37:12 UTC 2013", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "F21AE95924560F6FA79AF3D2316609D992A6E7FB" }, { "b" : "7FFF077A7000", "elfType" : 3, "buildId" : "4D392D7A6140FA0AFF2F9098276ED6E94D137826" }, { "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "B06F7B61A75BD941A6D9E36B2DC1CDCB4183D706" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "0E46D8ED406D53C9A553C20859CD4679928AE7C0" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "756DBE5D1255F42B13E0659E3DD791D34A91465A" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "A38407EE35545AEA5CF08FE4CAA8B66E5909B6F3" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "1686CCDAE5F8CED5A251E40074F55EFDF1688B75" }, { "path" : "/lib64/libgcc_s.so.1", 
"elfType" : 3, "buildId" : "A2E6E550A824EBC44AE5487B290A00923DB37761" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "3FF31EFC5E0E5CFC4BFDAE19F3DE3AD55DA766CD" }, { "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "66B744D7D3B8201145C2C40E7A201F61B73E77D0" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "DE7D207393D303AF233E6AD4D1E8A8314843422A" }, { "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "EFF68B7DE77D081BC4A0CB38FE9DCBC60541BF92" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "95EBB74C2C0A1E1714344036145A0239FFA4892D" }, { "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "6ADE12F76961F73B33D160AC4D342222E7FC7A65" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "D02E7D3149950118009A81997434E28B7D9EC9B2" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "5AFCBEA0D62EE0335714CCBAB7BA808E2A16028C" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8A8734DC37305D8CC2EF8F8C3E5EA03171DB07EC" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "6309B69A475D35D4E93D31DB3A8DDAF5100075C8" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "BAD5C71361DADF259B6E306A49E6F47F24AEA3DC" } ] }}  mongos(_ZN5mongo15printStackTraceERSo+0x32) [0xc71ed2]  mongos(_ZN5mongo15printStackTraceEv+0x63) [0xc71dd3]  mongos(_ZN5mongo11DBException13traceIfNeededERKS0_+0x117) [0xbf7827]  mongos(_ZN5mongo11DBExceptionC1ERKSsi+0x46) [0x69daf6]  mongos(_ZN5mongo9uassertedEiPKc+0x10C) [0xbf7a6c]  mongos(+0x7F7B0C) [0xbf7b0c]  mongos(_ZN5mongo13ShardRegistry6reloadEPNS_16OperationContextE+0x1F3) [0xb2d413]  mongos(_ZN5mongo8Balancer3runEv+0x1FB) [0xa6070b]  mongos(_ZN5mongo13BackgroundJob7jobBodyEv+0x160) [0xbfbed0]  mongos(execute_native_thread_routine+0x20) [0xea8da0]  libpthread.so.0(+0x7851) 
[0x361ea07851]  libc.so.6(clone+0x6D) [0x361e6e894d] -----  END BACKTRACE  -----
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [Balancer] caught exception while doing balance: could not get updated shard list from config server due to Operation timed out
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [Balancer] *** End of balancing round
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [Balancer] about to log metadata event into actionlog: { _id: "shard01.xx.xxxx.com-2016-09-28T17:36:30.070+0000-57ebff9ec41485f9c148337f", server: "shard01.xx.xxxx.com", clientAddr: "", time: new Date(1475084190070), what: "balancer.round", ns: "", details: { executionTimeMillis: 30007, errorOccured: true, errmsg: "could not get updated shard list from config server due to Operation timed out" } }
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [Balancer] Scheduling remote command request: RemoteCommand 1100 -- target:config01.xx.xxxx.com:27020 db:config expDate:2016-09-28T17:37:00.070+0000 cmd:{ insert: "actionlog", documents: [ { _id: "shard01.xx.xxxx.com-2016-09-28T17:36:30.070+0000-57ebff9ec41485f9c148337f", server: "shard01.xx.xxxx.com", clientAddr: "", time: new Date(1475084190070), what: "balancer.round", ns: "", details: { executionTimeMillis: 30007, errorOccured: true, errmsg: "could not get updated shard list from config server due to Operation timed out" } } ], writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 }
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [Balancer] startCommand: RemoteCommand 1100 -- target:config01.xx.xxxx.com:27020 db:config expDate:2016-09-28T17:37:00.070+0000 cmd:{ insert: "actionlog", documents: [ { _id: "shard01.xx.xxxx.com-2016-09-28T17:36:30.070+0000-57ebff9ec41485f9c148337f", server: "shard01.xx.xxxx.com", clientAddr: "", time: new Date(1475084190070), what: "balancer.round", ns: "", details: { executionTimeMillis: 30007, errorOccured: true, errmsg: "could not get updated shard list from config server due to Operation timed out" } } ], writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 }
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [NetworkInterfaceASIO-ShardRegistry-0] Initiating asynchronous command: RemoteCommand 1100 -- target:config01.xx.xxxx.com:27020 db:config expDate:2016-09-28T17:37:00.070+0000 cmd:{ insert: "actionlog", documents: [ { _id: "shard01.xx.xxxx.com-2016-09-28T17:36:30.070+0000-57ebff9ec41485f9c148337f", server: "shard01.xx.xxxx.com", clientAddr: "", time: new Date(1475084190070), what: "balancer.round", ns: "", details: { executionTimeMillis: 30007, errorOccured: true, errmsg: "could not get updated shard list from config server due to Operation timed out" } } ], writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 }
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [NetworkInterfaceASIO-ShardRegistry-0] Starting asynchronous command 1100 on host config01.xx.xxxx.com:27020
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [NetworkInterfaceASIO-ShardRegistry-0] Received remote response: RemoteResponse --  cmd:{ ok: 1, n: 1, opTime: { ts: Timestamp 1475084190000|5, t: 2 }, electionId: ObjectId('7fffffff0000000000000002') }
Sep 28 17:36:30 shard01.xx.xxxx.com mongos.27017[19179]: [NetworkInterfaceASIO-ShardRegistry-0] Failed to time operation 1100 out: Operation aborted.

CONFIG SERVERS TIMEOUT:

Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [NetworkInterfaceASIO-BGSync-0] Initiating asynchronous command: RemoteCommand 4821 -- target:config02.xx.xxxx.com:27020 db:local expDate:2016-09-28T17:26:33.226+0000 cmd:{ getMore: 25514723601, collection: "oplog.rs", maxTimeMS: 5000, term: 2, lastKnownCommittedOpTime: { ts: Timestamp 1475083583000|1, t: 2 } }
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [NetworkInterfaceASIO-BGSync-0] Starting asynchronous command 4821 on host config02.xx.xxxx.com:27020
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn165] Command on database config timed out waiting for read concern to be satisfied. Command: { find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1473441131000|1, t: 3 } }, maxTimeMS: 30000 }
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn167] Command on database config timed out waiting for read concern to be satisfied. Command: { find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1473441131000|1, t: 3 } }, maxTimeMS: 30000 }
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn165] command config.$cmd command: find { find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1473441131000|1, t: 3 } }, maxTimeMS: 30000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:92 locks:{} protocol:op_command 30367ms
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn167] command config.$cmd command: find { find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1473441131000|1, t: 3 } }, maxTimeMS: 30000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:92 locks:{} protocol:op_command 30360ms
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn165] Socket recv() conn closed? 10.235.138.170:38223
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn167] Socket recv() conn closed? 10.235.137.51:27374
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn165] DBException thrown :: caused by :: 9001 socket exception [CLOSED] for 10.235.138.170:38223
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn167] DBException thrown :: caused by :: 9001 socket exception [CLOSED] for 10.235.137.51:27374
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn165]   0x133aba2 0x133aaa3 0x12c1cb7 0x12eea1f 0x12ef7ab 0x12ef7c1 0x12ef81d 0x12e3681 0x12e57d7 0x35fc6079d1 0x35fbee88fd ----- BEGIN BACKTRACE ----- {"backtrace":[{"b":"400000","o":"F3ABA2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F3AAA3","s":"_ZN5mongo15printStackTraceEv"},{"b":"400000","o":"EC1CB7","s":"_ZN5mongo11DBException13traceIfNeededERKS0_"},{"b":"400000","o":"EEEA1F","s":"_ZN5mongo6Socket15handleRecvErrorEii"},{"b":"400000","o":"EEF7AB","s":"_ZN5mongo6Socket5_recvEPci"},{"b":"400000","o":"EEF7C1","s":"_ZN5mongo6Socket11unsafe_recvEPci"},{"b":"400000","o":"EEF81D","s":"_ZN5mongo6Socket4recvEPci"},{"b":"400000","o":"EE3681","s":"_ZN5mongo13MessagingPort4recvERNS_7MessageE"},{"b":"400000","o":"EE57D7","s":"_ZN5mongo17PortMessageServer17handleIncomingMsgEPv"},{"b":"35FC600000","o":"79D1"},{"b":"35FBE00000","o":"E88FD","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.8", "gitVersion" : "ed70e33130c977bda0024c125b56d159573dbaf0", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "2.6.32-358.el6.x86_64", "version" : "#1 SMP Fri Feb 22 00:31:26 UTC 2013", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "4D5F33E10977D0D4B27FC98AA0326829E78A19A4" }, { "b" : "7FFFE18FF000", "elfType" : 3, "buildId" : "B5D86FBCF0CCB03331E6C7C73897B96845E0A4EB" }, { "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "934508308DAF0D5C61E9997463F0D8B0A3F096BA" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "A4329A30669C783FA8DEEB7D1EA83749A8FA14E1" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "583411D8786F86A1D6B8741C502831E6122445A7" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "454F8FC6CC6502C6401E5F9E221564D80665D277" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "C9A87F6A29ED1D3CB18F539845A45FE3A9877FF1" }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, 
"buildId" : "A2E6E550A824EBC44AE5487B290A00923DB37761" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "B8DFF8E53D9F2B80C3C382E83EC17C828B536A39" }, { "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "6EFE254F4564519BBB80889534FAC3D61C18C387" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F8E59B70E469F3A924A268911FF8FD0C37E7460" }, { "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "76A3DEEB6876CBED69A57D3EBC1E2AFBCA84EC76" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "605701A8AE551604303523B4F0D3A7E98CF9E153" }, { "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "F1A67CF54F08AFBEE05E316BAFD9EF168F258800" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "190D45F6743DEF9DF8169D65801D4575B01825BD" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "DAE2A7E4E8B37D43EF6839FF5D8E012AFCF21A69" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8A8734DC37305D8CC2EF8F8C3E5EA03171DB07EC" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "58B696478044E028A5970D48A4ED50E164B43B36" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "BAD5C71361DADF259B6E306A49E6F47F24AEA3DC" } ] }}  mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x133aba2]  mongod(_ZN5mongo15printStackTraceEv+0x63) [0x133aaa3]  mongod(_ZN5mongo11DBException13traceIfNeededERKS0_+0x117) [0x12c1cb7]  mongod(_ZN5mongo6Socket15handleRecvErrorEii+0xA0F) [0x12eea1f]  mongod(_ZN5mongo6Socket5_recvEPci+0x6B) [0x12ef7ab]  mongod(_ZN5mongo6Socket11unsafe_recvEPci+0x11) [0x12ef7c1]  mongod(_ZN5mongo6Socket4recvEPci+0x3D) [0x12ef81d]  mongod(_ZN5mongo13MessagingPort4recvERNS_7MessageE+0x51) [0x12e3681]  mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x307) [0x12e57d7]  libpthread.so.0(+0x79D1) [0x35fc6079d1]  
libc.so.6(clone+0x6D) [0x35fbee88fd] -----  END BACKTRACE  -----
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn165] SocketException: remote: 10.235.138.170:38223 error: 9001 socket exception [CLOSED] server [10.235.138.170:38223]
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn165] end connection 10.235.138.170:38223 (70 connections now open)
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn166] Command on database config timed out waiting for read concern to be satisfied. Command: { find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1473441131000|1, t: 3 } }, maxTimeMS: 30000 }
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn166] command config.$cmd command: find { find: "shards", readConcern: { level: "majority", afterOpTime: { ts: Timestamp 1473441131000|1, t: 3 } }, maxTimeMS: 30000 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:92 locks:{} protocol:op_command 30370ms
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn166] Socket recv() conn closed? 10.235.137.54:21067
Sep 28 17:26:23 config03.xx.xxxx.com mongod.27020[21051]: [conn166] DBException thrown :: caused by :: 9001 socket exception [CLOSED] for 10.235.137.54:21067
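
One detail in the two log excerpts above may be worth checking. The timed-out find asks for afterOpTime { ts: Timestamp 1473441131000|1, t: 3 }, while the successful actionlog insert reports opTime { ts: Timestamp 1475084190000|5, t: 2 }, so the requested term (3) is higher than the term the config servers report (2). The sketch below compares the two values. It assumes that opTimes are ordered by term first and then by timestamp; that ordering and the helper names parseOpTime and compareOpTime are assumptions made for illustration, not something confirmed in this ticket.

```javascript
// Hypothetical helper: pull "ts: Timestamp <ms>|<inc>, t: <term>" out of a log fragment.
function parseOpTime(text) {
  const m = /ts: Timestamp (\d+)\|(\d+), t: (-?\d+)/.exec(text);
  if (!m) return null;
  return { ms: Number(m[1]), inc: Number(m[2]), term: Number(m[3]) };
}

// Assumption: opTimes compare by term first, then timestamp, then increment.
function compareOpTime(a, b) {
  if (a.term !== b.term) return a.term < b.term ? -1 : 1;
  if (a.ms !== b.ms) return a.ms < b.ms ? -1 : 1;
  if (a.inc !== b.inc) return a.inc < b.inc ? -1 : 1;
  return 0;
}

// Values copied from the log lines above.
const requested = parseOpTime("afterOpTime: { ts: Timestamp 1473441131000|1, t: 3 }");
const current = parseOpTime("opTime: { ts: Timestamp 1475084190000|5, t: 2 }");

// The read can proceed only once the server's opTime is at or past the requested one.
const satisfied = compareOpTime(current, requested) >= 0;
console.log(satisfied);
```

Under that assumption the script prints false: the config servers have not reached the opTime that mongoS is waiting for, even though their timestamp is newer. That would be consistent with the "timed out waiting for read concern to be satisfied" messages, but it does not by itself confirm the cause.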

MONGO SHELL -> MONGOS -> TIMEOUT:

mongos> sh.getBalancerState();
2016-09-28T17:47:24.331+0000 E QUERY    [thread1] Error: error: { "code" : 50, "ok" : 0, "errmsg" : "Operation timed out" } :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DBCommandCursor@src/mongo/shell/query.js:689:1
DBQuery.prototype._exec@src/mongo/shell/query.js:118:28
DBQuery.prototype.hasNext@src/mongo/shell/query.js:276:5
DBCollection.prototype.findOne@src/mongo/shell/collection.js:289:10
sh.getBalancerState@src/mongo/shell/utils_sh.js:128:13
@(shell):1:1
 
mongos> sh.status()
2016-09-28T17:48:00.801+0000 E QUERY    [thread1] Error: error: { "code" : 50, "ok" : 0, "errmsg" : "Operation timed out" } :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DBCommandCursor@src/mongo/shell/query.js:689:1
DBQuery.prototype._exec@src/mongo/shell/query.js:118:28
DBQuery.prototype.hasNext@src/mongo/shell/query.js:276:5
DBCollection.prototype.findOne@src/mongo/shell/collection.js:289:10
printShardingStatus@src/mongo/shell/utils_sh.js:540:19
sh.status@src/mongo/shell/utils_sh.js:78:5
@(shell):1:1
 
mongos> db.shards.find({}).readConcern('majority');
Error: error: { "code" : 50, "ok" : 0, "errmsg" : "Operation timed out" }
 
mongos> show collections
2016-09-28T18:12:22.827+0000 E QUERY    [thread1] Error: listCollections failed: { "code" : 50, "ok" : 0, "errmsg" : "Operation timed out" } :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DB.prototype._getCollectionInfosCommand@src/mongo/shell/db.js:773:1
DB.prototype.getCollectionInfos@src/mongo/shell/db.js:785:19
DB.prototype.getCollectionNames@src/mongo/shell/db.js:796:16
shellHelper.show@src/mongo/shell/utils.js:754:9
shellHelper@src/mongo/shell/utils.js:651:15
@(shellhelp2):1:1

MONGO SHELL -> CONFIG SERVERS - OK!:

csReplSet:SECONDARY> db.shards.find({}).readConcern('majority');
{ "_id" : "portalshard_rs1", "host" : "portalshard_rs1/shard01.xx.xxxx.com:27018,shard04.xx.xxxx.com:27018,shard07.xx.xxxx.com:27018" }
{ "_id" : "portalshard_rs3", "host" : "portalshard_rs3/shard03.xx.xxxx.com:27018,shard06.xx.xxxx.com:27018,shard09.xx.xxxx.com:27018" }
{ "_id" : "portalshard_rs2", "host" : "portalshard_rs2/shard02.xx.xxxx.com:27018,shard05.xx.xxxx.com:27018,shard08.xx.xxxx.com:27018" }

CHECK CONNECTION BETWEEN MONGOS AND CONFIG SERVERS:

[root@shard01.xx.xxxx.com ~]# nc -v config01.xx.xxxx.com 27020
Connection to config01.xx.xxxx.com 27020 port [tcp/*] succeeded!
^C
[root@shard01.xx.xxxx.com ~]# nc -v config02.xx.xxxx.com 27020
Connection to config02.xx.xxxx.com 27020 port [tcp/*] succeeded!
^C
[root@shard01.xx.xxxx.com ~]# nc -v config03.xx.xxxx.com 27020
Connection to config03.xx.xxxx.com 27020 port [tcp/*] succeeded!
 
[root@config03.xx.xxxx.com ~]# nc -v shard01.xx.xxxx.com 27017
Connection to shard01.xx.xxxx.com 27017 port [tcp/*] succeeded!
^C

CONFIG SERVER CONFIGURATION:

[root@config03.xx.xxxx.com ~]# cat /etc/mongod/mongod_configsvr_new.conf
net:
  http:
    RESTInterfaceEnabled: true
    enabled: true
  port: 27020
processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongod_configdb_new.pid
security:
  clusterAuthMode: keyFile
  keyFile: /etc/mongod/keyfile
sharding:
  clusterRole: configsvr
storage:
  dbPath: /data/configdb_new
  journal:
    enabled: false
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1
      journalCompressor: snappy
      directoryForIndexes: true
    collectionConfig:
      blockCompressor: snappy
    indexConfig:
      prefixCompression: true
systemLog:
  destination: syslog
  logAppend: true
  logRotate: rename
replication:
  replSetName: csReplSet

MONGO VERSION (all servers run the same version):

[root@config03.xx.xxxx.com ~]# mongo --version
MongoDB shell version: 3.2.8

MONGOS SERVER CONFIGURATION:

[root@shard01.xx.xxxx.com ~]# cat /etc/mongos/mongos.conf
 
net:
  port: 27017
processManagement:
  fork: true
  pidFilePath: /var/run/mongodb/mongos.pid
security:
  clusterAuthMode: keyFile
  keyFile: /etc/mongod/keyfile
sharding:
  configDB: "csReplSet/config01.xx.xxxx.com:27020,config02.xx.xxxx.com:27020,config03.xx.xxxx.com:27020"
systemLog:
  destination: syslog
  logAppend: true
  logRotate: rename

Any suggestions?

Regards,

Thiago Leite



 Comments   
Comment by Abhishek [ 10/Dec/17 ]

FYI, this happened to me, and after hours of debugging I found that my config server replica set had been initiated without the configsvr: true option in rs.initiate. So mongos was requesting data from my config server, but the config server didn't know how to respond. FWIW, I had

sharding: 
    clusterRole: configsvr

in my conf file, but it looks like that wasn't picked up.
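
For reference, the replica set configuration document described in the comment above can be sketched as follows. The set name and host names are taken from the mongos.conf in this ticket; the member _id values and the helper isConfigServerReplSet are assumptions for illustration. The rs.initiate and rs.conf calls are left as comments because they need a live mongo shell session.

```javascript
// Replica set config for a config server replica set (CSRS).
// Set name and hosts come from this ticket; member _id values are assumed.
const csrsConfig = {
  _id: "csReplSet",
  configsvr: true, // the option the comment above found missing
  members: [
    { _id: 0, host: "config01.xx.xxxx.com:27020" },
    { _id: 1, host: "config02.xx.xxxx.com:27020" },
    { _id: 2, host: "config03.xx.xxxx.com:27020" }
  ]
};

// Hypothetical helper: true only if a replica set config document
// (for example the result of rs.conf()) is flagged as a config server set.
function isConfigServerReplSet(conf) {
  return Boolean(conf) && conf.configsvr === true;
}

// In a mongo shell connected to one config server:
//   rs.initiate(csrsConfig)             // first-time initiation only
//   isConfigServerReplSet(rs.conf())    // check an already-initiated set

console.log(isConfigServerReplSet(csrsConfig));
```

Checking rs.conf() on an existing set this way is a quick test of whether the condition described in this comment applies to a given cluster.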

Comment by Kelsey Schubert [ 19/Nov/16 ]

Hi thiagosantosleite@yahoo.com.br,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Regards,
Thomas

Comment by Kelsey Schubert [ 26/Oct/16 ]

Hi thiagosantosleite@yahoo.com.br,

We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the logs I requested?

Thank you,
Thomas

Comment by Kelsey Schubert [ 30/Sep/16 ]

Hi thiagosantosleite@yahoo.com.br,

Thank you for the detailed report. We've begun investigating this issue, and we need additional details to better understand what is going on here.

If you have the logs starting before the upgrade process, would you please upload the complete logs of the following?

  • All three config servers
  • The primary of each shard
  • The log of one affected mongos

Additionally, if it isn't too much trouble, the logs of the secondaries may also contain helpful information.

I've created a secure upload portal for you to use to provide the logs. Files uploaded to this portal are only visible to MongoDB employees investigating this issue and are routinely deleted after some time.

Additionally, would you please list the steps in your upgrade process to switch your config servers from Sync Cluster Connection Config (SCCC) to Config Servers as a Replica Set (CSRS)?

Thanks again,
Thomas
