Type: Bug
Resolution: Community Answered
Priority: Major - P3
Affects Version/s: None
Component/s: None
CentOS Linux release 7.8.2003 (Core)
mongod version 3.6.16
mongos version 3.6.16
After a shard is restored from snapshots, mongos cannot read data reliably.
Shard information can still be retrieved with sh.status() through mongos.
Simple commands such as "show collections" hang and then fail after the 30s timeout with "NetworkInterfaceExceededTimeLimit".
All ports are reachable from the shards to the configdb and from the configdb to the shards.
Messages of this type appear in one of the shard logs:
2020-09-02T15:11:28.624+0000 I NETWORK [shard registry reload] Marking host CONFIGDB_HOSTNAME:27020 as failed :: caused by :: NetworkInterfaceExceededTimeLimit: Operation timed out
2020-09-02T15:11:28.624+0000 I SHARDING [shard registry reload] Operation timed out :: caused by :: NetworkInterfaceExceededTimeLimit: Operation timed out
2020-09-02T15:11:28.625+0000 I SHARDING [shard registry reload] Periodic reload of shard registry failed :: caused by :: NetworkInterfaceExceededTimeLimit: could not get updated shard list from config server due to Operation timed out; will retry after 30
Messages of this type appear in the configdb log:
2020-09-02T15:12:51.600+0000 I COMMAND [conn20665] Command on database config timed out waiting for read concern to be satisfied. Command: { find: "databases", filter: { _id: "REDACTED" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp(1598990710, 1), t: 6 } }, maxTimeMS: 30000, $readPreference: { mode: "nearest" }, $replData: 1, $clusterTime: { clusterTime: Timestamp(1599059535, 1), signature: { hash: BinData(0, CBA1C41E88C09DB4E41C843D8F384811DF5ACA90), keyId: 6846969707174559784 } }, $configServerState: { opTime: { ts: Timestamp(1598990710, 1), t: 6 } }, $db: "config" }. Info: ExceededTimeLimit: Error waiting for snapshot not less than { ts: Timestamp(1598990710, 1), t: 6 }, current relevant optime is { ts: Timestamp(1599059563, 1), t: 5 }. :: caused by :: operation exceeded time limit
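The configdb message above can be read by comparing the two optimes it reports. The following is a minimal Python sketch, not a MongoDB tool: the `OpTime` type and the `parse_optimes` and `is_satisfied` helpers are hypothetical names introduced here, and the ordering rule (term first, then timestamp) is an assumption that matches what the log shows but is not confirmed in this ticket.

```python
import re
from typing import List, NamedTuple


class OpTime(NamedTuple):
    # Field order matters: tuple comparison checks term first, then timestamp.
    # ASSUMPTION: the server orders optimes the same way.
    term: int  # "t" in the log
    ts: int    # seconds part of Timestamp(seconds, increment)
    inc: int   # increment part of Timestamp(seconds, increment)


# Matches optimes as printed in the log, e.g. "{ ts: Timestamp(1598990710, 1), t: 6 }".
OPTIME_RE = re.compile(r"\{ ts: Timestamp\((\d+), (\d+)\), t: (-?\d+) \}")


def parse_optimes(text: str) -> List[OpTime]:
    """Return every optime found in a log fragment, in order of appearance."""
    return [OpTime(term=int(t), ts=int(s), inc=int(i))
            for s, i, t in OPTIME_RE.findall(text)]


def is_satisfied(current: OpTime, requested: OpTime) -> bool:
    """True if the current optime is not less than the requested one."""
    return current >= requested


# The "Info:" part of the configdb log line, copied from the ticket.
info = ("ExceededTimeLimit: Error waiting for snapshot not less than "
        "{ ts: Timestamp(1598990710, 1), t: 6 }, current relevant optime is "
        "{ ts: Timestamp(1599059563, 1), t: 5 }.")

requested, current = parse_optimes(info)
print("requested:", requested)
print("current:  ", current)
print("timestamp ahead:", current.ts > requested.ts)
print("wait satisfied: ", is_satisfied(current, requested))
```

Under that assumption, the configdb's timestamp is already ahead of the requested one, but its term (5) is lower than the requested term (6), so the wait would not be satisfied however far the timestamp advances.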
Restarting the configdb once or twice sometimes fixes the issue, and mongos starts returning data again. However, this workaround is unreliable: a restart only works some of the time.