Shard restore procedure leaves cluster in a hung or inconsistent state.


    • Type: Bug
    • Resolution: Community Answered
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None

      shard config

      #where to log
      logpath = /var/log/mongo/mongod.log
      logappend = true
      dbpath = /dbdata/db
      profile = 1
      # location of pidfile
      pidfilepath = /var/run/mongodb/mongod.pid
      storageEngine = wiredTiger
       
      directoryperdb = False
      port = 27018
      bind_ip_all = True
      shardsvr = true
      replSet = analyticsshard1ReplSet-stage
      wiredTigerEngineConfigString=session_max=40000
      
      # MongoDB Security Options
      transitionToAuth = true
      keyFile = /var/lib/mongo/secrets/keyfile

      configdb config

      #where to log
      logpath = /var/log/mongo/mongod.log
      logappend = true
      dbpath = /dbdata/db
      profile = 1
      # location of pidfile
      pidfilepath = /var/run/mongodb/mongod.pid
      storageEngine = wiredTiger
      directoryperdb = False
      port = 27020
      bind_ip_all = True
      configsvr = true
      replSet = analyticsConfigdbReplSet-stage
      wiredTigerEngineConfigString=session_max=40000
      
      # MongoDB Security Options
      transitionToAuth = true
      keyFile = /var/lib/mongo/secrets/keyfile
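
      To confirm what each restored node actually came up with (role, replica set name, member state), a quick check from a direct mongo shell connection is possible; a minimal sketch using only standard admin commands:

      // Connect the mongo shell directly to a restored shard member (27018) or config member (27020).
      var opts = db.adminCommand({ getCmdLineOpts: 1 });   // how the process was actually started
      printjson(opts.parsed);                              // confirm shardsvr/configsvr role and replSet name
      var st = db.adminCommand({ replSetGetStatus: 1 });
      print(st.set + ": " + st.members.map(function (m) { return m.name + "=" + m.stateStr; }).join(", "));
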
      1. Stop mongoD on all shards and the configdb.
      2. Snapshot the data directory on the original cluster (this was done with a GCP disk snapshot).
      3. Restore the snapshotted data directories to the mountpoint.
      4. Run through the restore procedure in the MongoDB docs (v3.6).

      Note: The original cluster uses replica sets (3 members per shard and 3 config servers).

      The restored cluster has one member per replica set, each in the PRIMARY state (a sketch of a post-restore metadata check follows).
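
      A sketch of the post-restore metadata check mentioned above, run against the restored config server (port 27020). The shard registry that mongoS reads lives in config.shards; if I recall the v3.6 restore docs correctly, these entries must point at the restored shard hosts whenever the hostnames changed. The shard _id and hostname in the commented update are placeholders, not values from this report:

      var cfg = db.getSiblingDB("config");
      cfg.shards.find().forEach(printjson);   // each doc: { _id: <shardName>, host: "<replSetName>/<host:port>", ... }

      // Only if the restored shard members run on different hostnames than the snapshot source:
      // cfg.shards.update(
      //     { _id: "analyticsshard1" },   // placeholder shard name
      //     { $set: { host: "analyticsshard1ReplSet-stage/NEW_SHARD_HOST:27018" } }
      // );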


      CentOS Linux release 7.8.2003 (Core)

      MongoD version 3.6.16

      mongoS version 3.6.16

      After restoring the shards from snapshots, mongoS cannot pull data consistently.

      Shard info can be pulled via the sh.status() command through mongoS.
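
      Since sh.status() works, the same config metadata can be read through mongoS with an explicit time limit to see which collections respond and which hit the 30 s ceiling; a small sketch (the find/maxTimeMS form is standard, and the 30000 ms value simply mirrors the timeout reported below):

      // Run via mongoS.
      printjson(db.getSiblingDB("config").runCommand({ find: "shards", maxTimeMS: 30000 }));
      printjson(db.getSiblingDB("config").runCommand({ find: "databases", maxTimeMS: 30000 }));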

      Running simple commands like "show collections" hangs and then fails after the 30 s timeout with "NetworkInterfaceExceededTimeLimit".
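
      "show collections" in the shell is backed by the listCollections command, so the failure can be reproduced as a plain command to capture the full error document; "REDACTED_DB" below is a placeholder for the database name redacted in the logs:

      // Run via mongoS; per the report this fails after ~30 s with NetworkInterfaceExceededTimeLimit.
      var res = db.getSiblingDB("REDACTED_DB").runCommand({ listCollections: 1, maxTimeMS: 30000 });
      printjson(res);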

      All ports are reachable from the shards to the configdb and from the configdb to the shards.
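
      Reachability can also be checked at the MongoDB protocol level rather than just the TCP port, e.g. from the shell on a shard host (CONFIGDB_HOSTNAME mirrors the placeholder used in the shard log below; with transitionToAuth these commands should not require credentials, though that is an assumption):

      var conn = new Mongo("CONFIGDB_HOSTNAME:27020");
      printjson(conn.getDB("admin").runCommand({ ping: 1 }));      // { ok: 1 } if the node answers
      printjson(conn.getDB("admin").runCommand({ isMaster: 1 }));  // setName plus whether it sees itself as primary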

      Messages of this type are seen in one of the shard logs:

      2020-09-02T15:11:28.624+0000 I NETWORK  [shard registry reload] Marking host CONFIGDB_HOSTNAME:27020 as failed :: caused by :: NetworkInterfaceExceededTimeLimit: Operation timed out
      2020-09-02T15:11:28.624+0000 I SHARDING [shard registry reload] Operation timed out  :: caused by :: NetworkInterfaceExceededTimeLimit: Operation timed out
      2020-09-02T15:11:28.625+0000 I SHARDING [shard registry reload] Periodic reload of shard registry failed  :: caused by :: NetworkInterfaceExceededTimeLimit: could not get updated shard list from config server due to Operation timed out; will retry after 30
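
      The shard registry reload that is timing out here is essentially a read of config.shards from the config server with majority read concern (the configdb entry below shows the analogous read of config.databases). A sketch of issuing the same kind of read directly on the restored config server, to separate "majority reads stall" from "the network path stalls":

      // Run directly against the restored config server (port 27020).
      printjson(db.getSiblingDB("config").runCommand({
          find: "shards",
          readConcern: { level: "majority" },
          maxTimeMS: 30000
      }));
      // For comparison, the same read with the default ("local") read concern:
      printjson(db.getSiblingDB("config").runCommand({ find: "shards", maxTimeMS: 30000 }));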

      Messages of this type are seen in the configdb log:

       2020-09-02T15:12:51.600+0000 I COMMAND  [conn20665] Command on database config timed out waiting for read concern to be satisfied. Command: { find: "databases", filter: { _id: "REDACTED" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp(1598990710, 1), t: 6 } }, maxTimeMS: 30000, $readPreference: { mode: "nearest" }, $replData: 1, $clusterTime: { clusterTime: Timestamp(1599059535, 1), signature: { hash: BinData(0, CBA1C41E88C09DB4E41C843D8F384811DF5ACA90), keyId: 6846969707174559784 } }, $configServerState: { opTime: { ts: Timestamp(1598990710, 1), t: 6 } }, $db: "config" }. Info: ExceededTimeLimit: Error waiting for snapshot not less than { ts: Timestamp(1598990710, 1), t: 6 }, current relevant optime is { ts: Timestamp(1599059563, 1), t: 5 }. :: caused by :: operation exceeded time limit
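
      The notable detail in this entry is that the requested afterOpTime carries term 6 ({ ts: Timestamp(1598990710, 1), t: 6 }) while the config server reports its current optime with term 5, which suggests the majority-read wait cannot complete until the restored single-member set reaches a comparable optime. A small sketch for reading the config server's current applied optime and term (this is a reading of the log, not a confirmed root cause):

      // Run on the restored config server primary.
      var st = db.adminCommand({ replSetGetStatus: 1 });
      printjson(st.optimes.appliedOpTime);   // { ts: Timestamp(...), t: <current term> }
      printjson(st.optimes.durableOpTime);
      print("current term: " + st.term);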

       

      Sometimes restarting the configdb once or twice fixes the issue and mongoS starts pulling data again, but this is very inconsistent: restarting the configdb only works some of the time.
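
      As a less disruptive comparison point than bouncing the configdb, the routing-table cache on each mongoS can be dropped explicitly; whether this helps in this scenario is untested, so treat it purely as a diagnostic step (on 3.6, flushRouterConfig is run against mongoS, not the shard members):

      // Run against each mongoS.
      db.adminCommand({ flushRouterConfig: 1 });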

       

            Assignee:
            Dmitry Agranat
            Reporter:
            Todd Vernick
            Votes:
            0
            Watchers:
            7

              Created:
              Updated:
              Resolved: