[SERVER-50696] Shard restore procedure leaving cluster in hung or inconsistent state. Created: 02/Sep/20  Updated: 27/Oct/23  Resolved: 21/Sep/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Todd Vernick Assignee: Dmitry Agranat
Resolution: Community Answered Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

Shard config:

#where to log
logpath = /var/log/mongo/mongod.log
logappend = true
dbpath = /dbdata/db
profile = 1
# location of pidfile
pidfilepath = /var/run/mongodb/mongod.pid
storageEngine = wiredTiger
 
directoryperdb = false
port = 27018
bind_ip_all = true
shardsvr = true
replSet = analyticsshard1ReplSet-stage
wiredTigerEngineConfigString=session_max=40000
 
# MongoDB Security Options
transitionToAuth = true
keyFile = /var/lib/mongo/secrets/keyfile

Configdb config:

#where to log
logpath = /var/log/mongo/mongod.log
logappend = true
dbpath = /dbdata/db
profile = 1
# location of pidfile
pidfilepath = /var/run/mongodb/mongod.pid
storageEngine = wiredTiger
directoryperdb = false
port = 27020
bind_ip_all = true
configsvr = true
replSet = analyticsConfigdbReplSet-stage
wiredTigerEngineConfigString=session_max=40000
 
# MongoDB Security Options
transitionToAuth = true
keyFile = /var/lib/mongo/secrets/keyfile

  1. Stop mongoD on all shards and the configdb
  2. Snapshot the data directory on the original cluster (this was done with a GCP disk snapshot)
  3. Restore the snapshotted data directory to the mountpoint
  4. Run through the restore procedure in the MongoDB docs (v3.6); a shell sketch of steps 1-3 follows the notes below

Note: The original cluster has replica sets (3 members per shard and 3 configdb members)

The restored cluster has 1 member per replica set, each in the PRIMARY state.
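
A minimal shell sketch of steps 1-3, assuming GCP persistent disks and a systemd-managed mongoD; the disk, snapshot, and zone names here are hypothetical:

    # 1. Stop mongod on every shard member and config server
    sudo systemctl stop mongod

    # 2. Snapshot the data disk of each stopped member (GCP disk snapshot)
    gcloud compute disks snapshot mongo-data-disk \
        --snapshot-names=analytics-stage-backup --zone=us-east1-b

    # 3. Create a disk from the snapshot and mount it at the dbpath mountpoint
    gcloud compute disks create restored-data-disk \
        --source-snapshot=analytics-stage-backup --zone=us-east1-b
    sudo mount /dev/sdb /dbdata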


Description

CentOS Linux release 7.8.2003 (Core)

MongoD version 3.6.16

mongoS version 3.6.16

When restoring a shard from snapshots, mongoS cannot pull data consistently.

Shard info can be pulled via the sh.status() command through mongoS.

Running simple commands like "show collections" will hang and then fail after the 30s timeout with "NetworkInterfaceExceededTimeLimit".
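
For illustration, the failure mode as seen through the mongo shell (hostname and database name are placeholders):

    mongo --host MONGOS_HOSTNAME --port 27017
    mongos> sh.status()        // shard metadata returns fine
    mongos> use analytics
    mongos> show collections   // hangs, then fails after ~30s with NetworkInterfaceExceededTimeLimit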

All ports are reachable from the shards to the configdb and from the configdb to the shards.

Types of messages seen in one of the shard logs:

2020-09-02T15:11:28.624+0000 I NETWORK  [shard registry reload] Marking host CONFIGDB_HOSTNAME:27020 as failed :: caused by :: NetworkInterfaceExceededTimeLimit: Operation timed out
2020-09-02T15:11:28.624+0000 I SHARDING [shard registry reload] Operation timed out  :: caused by :: NetworkInterfaceExceededTimeLimit: Operation timed out
2020-09-02T15:11:28.625+0000 I SHARDING [shard registry reload] Periodic reload of shard registry failed  :: caused by :: NetworkInterfaceExceededTimeLimit: could not get updated shard list from config server due to Operation timed out; will retry after 30

Types of messages seen in the configdb logs:

 2020-09-02T15:12:51.600+0000 I COMMAND  [conn20665] Command on database config timed out waiting for read concern to be satisfied. Command: { find: "databases", filter: { _id: "REDACTED" }, readConcern: { level: "majority", afterOpTime: { ts: Timestamp(1598990710, 1), t: 6 } }, maxTimeMS: 30000, $readPreference: { mode: "nearest" }, $replData: 1, $clusterTime: { clusterTime: Timestamp(1599059535, 1), signature: { hash: BinData(0, CBA1C41E88C09DB4E41C843D8F384811DF5ACA90), keyId: 6846969707174559784 } }, $configServerState: { opTime: { ts: Timestamp(1598990710, 1), t: 6 } }, $db: "config" }. Info: ExceededTimeLimit: Error waiting for snapshot not less than { ts: Timestamp(1598990710, 1), t: 6 }, current relevant optime is { ts: Timestamp(1599059563, 1), t: 5 }. :: caused by :: operation exceeded time limit
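
Note the terms in that error: mongoS is waiting for an optime with term t: 6, while the restored config server's current optime is at term t: 5, which suggests the restored node is behind the state mongoS cached from the original cluster. A quick way to inspect the config server's current term and optimes (a sketch; run directly against the restored config server):

    mongo --host CONFIGDB_HOSTNAME --port 27020
    > rs.status().term        // compare with the t value mongoS is waiting for
    > rs.status().optimes     // lastCommittedOpTime / appliedOpTime of this member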

 

Sometimes restarting the configdb once or twice will fix the issue and mongoS will start pulling data again. The problem is that this is very inconsistent; restarting the configdb only works some of the time.

Comments
Comment by Todd Vernick [ 23/Sep/20 ]

Hi @dagranat - Can we reopen this ticket? It's definitely not a network issue so I'd like to see if there is anything else we can explore.

Comment by Todd Vernick [ 21/Sep/20 ]

Also note that I've gone through the community boards and nothing there has been helpful pertaining to this issue thus far.

Comment by Todd Vernick [ 21/Sep/20 ]

The network errors occur during the restarts, when the restored data is mounted to each shard. After everything restarts, the network errors are not seen, other than the queries timing out as mentioned in my original issue. I've already ruled out network issues: all hosts can connect to their respective ports, and I've even run tcpdumps to verify that nothing was being blocked.

Comment by Dmitry Agranat [ 21/Sep/20 ]

Hi tvernick@squarespace.com, thank you for uploading all the requested information.

I've noticed multiple errors which might indicate an issue with your network connectivity.

Cannot reach any nodes for set analyticsshard1ReplSet-stage. Please check network connectivity and the status of the set. This has happened for 1341 checks in a row

In addition, this error (beyond the network-connectivity explanation above) might also indicate that the balancer was not fully stopped during the backup process:

failed to refresh mongos settings :: caused by :: NetworkInterfaceExceededTimeLimit: Failed to refresh the balancer settings due to NetworkInterfaceExceededTimeLimit: Operation timed out
Failed to connect to <IP>, in(checking socket for error after poll), reason: Connection refused
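
If the balancer was indeed left running, the v3.6 backup procedure requires fully stopping it before any snapshots are taken. A quick check through mongoS (a minimal sketch):

    // run through mongoS before taking any snapshots
    sh.stopBalancer()         // disables the balancer and waits for any in-progress round
    sh.getBalancerState()     // should return false
    sh.isBalancerRunning()    // should return false before snapshotting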

If you need further assistance troubleshooting, I encourage you to ask our community by posting on the MongoDB Developer Community Forums or on Stack Overflow with the mongodb tag.

Regards,
Dima

Comment by Todd Vernick [ 14/Sep/20 ]

Hi Dmitry - I have uploaded the files you requested. Please let me know if you need anything else.

Comment by Dmitry Agranat [ 14/Sep/20 ]

Hi tvernick@squarespace.com, we'll need to gather some data in order to understand what's going on.

  • All the steps and commands performed to backup the cluster according to this procedure
  • All the steps and commands performed to restore the cluster according to this procedure

You can combine all the executed commands and their output into one file each, for example backup_file.txt and restore_file.txt.

  • After the cluster is restored, upload the mongod logs from all config, mongoS and mongoD servers. In addition, upload the config dump:

    mongodump --host <configServerHost> --port <configServerPort>
    tar -czf ./config-dump-$(date +%s).tar.gz ./dump
    

Note: please change <configServerHost> and <configServerPort> to any config server.
Note: if you are using authentication then you will need to add authentication to the mongodump command line.
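
For example (username, password, and authentication database are placeholders):

    mongodump --host <configServerHost> --port <configServerPort> \
        --username <adminUser> --password <password> --authenticationDatabase admin
    tar -czf ./config-dump-$(date +%s).tar.gz ./dump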

I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.

Thanks,
Dima

Comment by Todd Vernick [ 13/Sep/20 ]

Yes, I have tried this command, but unfortunately it does not help.

Comment by Dmitry Agranat [ 13/Sep/20 ]

Hi tvernick@squarespace.com, before requesting all the needed data to investigate this, could you please clarify if you have tried flushRouterConfig from my last comment?

Comment by Todd Vernick [ 10/Sep/20 ]

Hi Dmitry - I have set up the restored cluster with the additional replicas, like the original cluster setup, but I'm still seeing the same issue with command timeouts.

Comment by Todd Vernick [ 09/Sep/20 ]

I'm going to try to make it more of a mirror of the original. Would there potentially be an issue if a sharded cluster has only a single replica per shard instead of 3 members?

Comment by Dmitry Agranat [ 06/Sep/20 ]

Hi tvernick@squarespace.com,

Thank you for providing all the requested information, it was very helpful.

Based on the above, it appears that you are trying to do a partial restore of your original cluster. Custom restore procedures are out of scope for the SERVER project. In case the restored cluster is identical to the original one (or has more members, as documented here) and you still experience the reported issues, we'd be happy to take a look.

One thing that might help with mongoS inconsistency (although this would entirely depend on the specific steps in this custom procedure) is to clear the cached routing table with flushRouterConfig.
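
For reference, a minimal example, run against each mongoS:

    // clears the mongoS cached routing table
    db.adminCommand({ flushRouterConfig: 1 })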

Thanks,
Dima

Comment by Todd Vernick [ 02/Sep/20 ]

Also, these are the config options running on mongoS:

mongos --bind_ip 0.0.0.0 --port 27017 --serviceExecutor adaptive --transitionToAuth --keyFile /etc/cluster-secrets/keyfile --configdb analyticsConfigdbReplSet-stage/HOSTNAME_REMOVED:27020 --setParameter ShardingTaskExecutorPoolMaxSize=8 --setParameter ShardingTaskExecutorPoolMinSize=0 --setParameter taskExecutorPoolSize=8 

Comment by Todd Vernick [ 02/Sep/20 ]

Hi Dmitry-

I followed the doc here https://docs.mongodb.com/v3.6/tutorial/restore-sharded-cluster/

Note this actually is happening on two different clusters - Similar data sets but two environments (production and staging).

Staging has just a fraction of the total data stored and 3 fewer shards. I'll just use staging as the example here.

Original cluster:

3 shards with 6 replica set members in each shard.

3 additional hidden secondaries are hosted in GCP, used primarily as the "snapshot" source data members. (Note: these hosts are in sync, so there is no lag here.)

1 configdb replica set with 6 members.

mongoS clients connecting to the 1 configdb member 

 

Restore cluster:

3 shards with 1 primary replica on each shard.

1 configdb replica set with 1 member.

mongoS clients connecting to the 1 configdb member 
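
A quick sanity check that each restored single-member replica set actually came up as PRIMARY (a sketch; hostname and port are placeholders):

    mongo --host SHARD1_HOSTNAME --port 27018 --eval 'rs.status().members[0].stateStr'
    # expected output: PRIMARY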

The original configdb server is not touched during any part of the process, besides stopping the mongoD process before snapshotting the data directory.

Comment by Dmitry Agranat [ 02/Sep/20 ]

Hi tvernick@squarespace.com, I'd like to clarify a couple of points:

  • Can you link the procedure you've used to restore a Sharded Cluster?
  • Can you provide a detailed topology of the cluster you back up and the cluster after restore?
  • Did you perform any direct manipulation on config servers during the restore?

Thanks,
Dima
