[SERVER-60916] CPS Restores failed with a snapshot with documents in reshardingOperation Created: 22/Oct/21  Updated: 29/Oct/23  Resolved: 28/Oct/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.2.0, 5.0.4, 5.1.0-rc3

Type: Bug Priority: Major - P3
Reporter: Chuck Zhang Assignee: Max Hirschhorn
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File Restore status.png     Text File atlas-4wnh1p-config-00-01.qzt8l.mmscloudteam.com_2021-10-22T00_00_00_2021-10-22T03_30_00_mongodb.log     Text File atlas-4wnh1p-config-00-02.qzt8l.mmscloudteam.com_2021-10-22T00_00_00_2021-10-22T03_30_00_mongodb.log     Text File automation-agent-verbose.log    
Issue Links:
Backports
Related
is related to SERVER-57280 ShardRegistry must be initialized bef... Closed
is related to SERVER-57519 Make ARS use causally consistent Shar... Closed
is related to SERVER-61003 ReadConcernMajorityNotAvailableYet er... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.1, v5.0
Sprint: Sharding 2021-11-01
Participants:
Story Points: 1

 Description   

Manually tested as below:

  1. Created a 5.0 sharded cluster.
  2. Created an index ({ _id: "hashed" }).
  3. Ran sh.reshardCollection("data.random", { "_id": "hashed" }) using the above index, and took a snapshot after 1 min (see the sketch after this list).
  4. Restored the above snapshot to a target 5.0 cluster @ 10/22/21 - 12:24 AM.
  5. The restore failed.
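
For reference, a minimal mongosh sketch of steps 2-3 (database and collection names match the transcript below; the config.reshardingOperations query is an assumption about where the in-progress operation's coordinator document can be observed):

// Step 2: create the hashed index that matches the new shard key.
db.getSiblingDB("data").random.createIndex({ _id: "hashed" })

// Step 3: kick off resharding; its output is shown in the transcript below.
sh.reshardCollection("data.random", { _id: "hashed" })

// Assumption: while resharding is in flight, its coordinator document
// should be visible on the config server, e.g.:
db.getSiblingDB("config").reshardingOperations.find()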

 

reshardCollection

 

Enterprise [mongos] data> sh.reshardCollection("data.random", {"_id": "hashed"} )
{
  ok: 1,
  '$clusterTime': {
    clusterTime: Timestamp({ t: 1634858341, i: 56 }),
    signature: {
      hash: Binary(Buffer.from("dab217339e8ecc2cd931ac36643be7522b9aa563", "hex"), 0),
      keyId: Long("7021608605052829705")
    }
  },
  operationTime: Timestamp({ t: 1634858341, i: 52 })
}
Enterprise [mongos] data> db.random.getShardDistribution()
Shard atlas-b7bb9t-shard-1 at atlas-b7bb9t-shard-1/atlas-b7bb9t-shard-01-00.qzt8l.mmscloudteam.com:27017,atlas-b7bb9t-shard-01-01.qzt8l.mmscloudteam.com:27017,atlas-b7bb9t-shard-01-02.qzt8l.mmscloudteam.com:27017
{
  data: '76.04MiB',
  docs: 1216,
  chunks: 3,
  'estimated data per chunk': '25.34MiB',
  'estimated docs per chunk': 405
}
---
Shard atlas-b7bb9t-shard-0 at atlas-b7bb9t-shard-0/atlas-b7bb9t-shard-00-00.qzt8l.mmscloudteam.com:27017,atlas-b7bb9t-shard-00-01.qzt8l.mmscloudteam.com:27017,atlas-b7bb9t-shard-00-02.qzt8l.mmscloudteam.com:27017
{
  data: '83.73MiB',
  docs: 1339,
  chunks: 3,
  'estimated data per chunk': '27.91MiB',
  'estimated docs per chunk': 446
}
---
Totals
{
  data: '159.77MiB',
  docs: 2555,
  chunks: 6,
  'Shard atlas-b7bb9t-shard-1': [
    '47.59 % data',
    '47.59 % docs in cluster',
    '64KiB avg obj size on shard'
  ],
  'Shard atlas-b7bb9t-shard-0': [
    '52.4 % data',
    '52.4 % docs in cluster',
    '64KiB avg obj size on shard'
  ]
}

 

 

Mongo config server fassert error

{"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21364,   "ctx":"ReplCoord-8","msg":"Caught up to the latest optime known via heartbeats after becoming primary","attr":{"targetOpTime":{"ts":{"$timestamp":{"t":1634862438,"i":1}},"t":3},"myLastApplied":{"ts":{"$timestamp":{"t":1634862438,"i":1}},"t":3}}}
{"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21363,   "ctx":"ReplCoord-8","msg":"Exited primary catch-up mode"}
{"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21107,   "ctx":"ReplCoord-8","msg":"Stopping replication producer"}
{"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21239,   "ctx":"ReplBatcher","msg":"Oplog buffer has been drained","attr":{"term":4}}
{"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21239,   "ctx":"ReplBatcher","msg":"Oplog buffer has been drained","attr":{"term":4}}
{"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21275,   "ctx":"ReplCoordExtern-0","msg":"Recreating cursor for oplog fetcher due to error","attr":{"lastOpTimeFetched":{"ts":{"$timestamp":{"t":1634862438,"i":1}},"t":3},"attemptsRemaining":1,"error":"SocketException: Error while getting the next batch in the oplog fetcher :: caused by :: recv failed while exhausting cursor :: caused by :: short read"}}
{"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21094,   "ctx":"BackgroundSync","msg":"Replication producer stopped after oplog fetcher finished returning a batch from our sync source. Abandoning this batch of oplog entries and re-evaluating our sync source"}
 
 
{"t":{"$date":"2021-10-22T00:33:39.876+00:00"},"s":"I",  "c":"CONTROL",  "id":20698,   "ctx":"-","msg":"***** SERVER RESTARTED *****"}
{"t":{"$date":"2021-10-22T00:33:39.889+00:00"},"s":"I",  "c":"NETWORK",  "id":4913010, "ctx":"-","msg":"Certificate information","attr":{"subject":"CN=*.qzt8l.mmscloudteam.com","issuer":"CN=R3,O=Let's Encrypt,C=US","thumbprint":"854A2FC27CFD4C5098A4E6DA13264DCD62F38C83","notValidBefore":{"$date":"2021-10-21T19:07:17.000Z"},"notValidAfter":{"$date":"2022-01-19T19:07:16.000Z"},"keyFile":"/etc/pki/tls/private/mongod.pem","type":"Server"}}
{"t":{"$date":"2021-10-22T00:33:39.890+00:00"},"s":"I",  "c":"NETWORK",  "id":4915701, "ctx":"-","msg":"Initialized wire specification","attr":{"spec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":13},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":13},"outgoing":{"minWireVersion":0,"maxWireVersion":13},"isInternalClient":true}}}
{"t":{"$date":"2021-10-22T00:33:39.891+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2021-10-22T00:33:39.891+00:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
{"t":{"$date":"2021-10-22T00:33:39.892+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2021-10-22T00:33:39.895+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
{"t":{"$date":"2021-10-22T00:33:39.896+00:00"},"s":"I",  "c":"REPL",     "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationDonorService","ns":"config.tenantMigrationDonors"}}
{"t":{"$date":"2021-10-22T00:33:39.896+00:00"},"s":"I",  "c":"REPL",     "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationRecipientService","ns":"config.tenantMigrationRecipients"}}
{"t":{"$date":"2021-10-22T00:33:39.897+00:00"},"s":"I",  "c":"CONTROL",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":29201,"port":33751,"dbPath":"/srv/mongodb/atlas-4wnh1p-config-0-node-2","architecture":"64-bit","host":"atlas-4wnh1p-config-00-02.qzt8l.mmscloudteam.com"}}
{"t":{"$date":"2021-10-22T00:33:39.897+00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"5.0.3","gitVersion":"657fea5a61a74d7a79df7aff8e4bcf0bc742b748","openSSLVersion":"OpenSSL 1.0.1e-fips 11 Feb 2013","modules":["enterprise"],"allocator":"tcmalloc","environment":{"distmod":"rhel70","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2021-10-22T00:33:39.897+00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"CentOS Linux release 7.9.2009 (Core)","version":"Kernel 3.10.0-1160.41.1.el7.x86_64"}}}
{"t":{"$date":"2021-10-22T00:33:39.897+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/srv/mongodb/atlas-4wnh1p-config-0-node-2/automation-mongod.conf","net":{"bindIp":"0.0.0.0","compression":{"compressors":"snappy,zstd,zlib"},"maxIncomingConnections":1500,"maxIncomingConnectionsOverride":["192.168.248.0/21"],"port":33751,"tls":{"CAFile":"/etc/pki/tls/certs/atlas-bundle.crt","allowConnectionsWithoutCertificates":true,"certificateKeyFile":"/etc/pki/tls/private/mongod.pem","clusterCAFile":"/var/lib/mongodb-mms-automation/atlas-cluster-managed.crt","disabledProtocols":"TLS1_0,TLS1_1","mode":"requireTLS"}},"processManagement":{"fork":true},"security":{"javascriptEnabled":true},"setParameter":{"allowRolesFromX509Certificates":"true","authenticationMechanisms":"SCRAM-SHA-1,MONGODB-AWS,MONGODB-X509","awsSTSUrl":"https://sts.us-west-1.amazonaws.com","disableLogicalSessionCacheRefresh":"true","honorSystemUmask":"false","mongotHost":"localhost:28000","recoverFromOplogAsStandalone":"true","reportOpWriteConcernCountersInServerStatus":"true","startupRecoveryForRestore":"true","suppressNoTLSPeerCertificateWarning":"true","takeUnstableCheckpointOnShutdown":"true","tlsCATrusts":"[{sha256: \"76D7D7A042F4AAC3BFCF548DD0BD9912B1370854AEEFC4C69D88532654F11633\", roles: [{ role: \"\", db: \"\"}]}]","tlsWithholdClientCertificate":"true","ttlMonitorEnabled":"false","watchdogPeriodSeconds":"60"},"storage":{"dbPath":"/srv/mongodb/atlas-4wnh1p-config-0-node-2","engine":"wiredTiger","wiredTiger":{"engineConfig":{"configString":"cache_size=512MB"}}},"systemLog":{"destination":"file","logAppend":true,"path":"/srv/mongodb/atlas-4wnh1p-config-0-node-2/mongodb.log"}}}}
{"t":{"$date":"2021-10-22T00:33:39.899+00:00"},"s":"I",  "c":"NETWORK",  "id":577164,  "ctx":"initandlisten","msg":"OCSP fetch/staple started"}
{"t":{"$date":"2021-10-22T00:33:39.901+00:00"},"s":"I",  "c":"CONNPOOL", "id":22576,   "ctx":"CurlConnPool-0","msg":"Connecting","attr":{"hostAndPort":"r3.o.lencr.org:80"}}
{"t":{"$date":"2021-10-22T00:33:39.903+00:00"},"s":"I",  "c":"STORAGE",  "id":22315,   "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=387M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),builtin_extension_config=(zstd=(compression_level=6)),file_manager=(close_idle_time=600,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],cache_size=512MB"}}
{"t":{"$date":"2021-10-22T00:33:39.938+00:00"},"s":"I",  "c":"NETWORK",  "id":577165,  "ctx":"OCSPManagerHTTP-0","msg":"OCSP fetch/staple completion"}
{"t":{"$date":"2021-10-22T00:33:39.938+00:00"},"s":"I",  "c":"NETWORK",  "id":577163,  "ctx":"OCSPManagerHTTP-0","msg":"OCSP response","attr":{"status":{"code":0,"codeName":"OK"}}}
{"t":{"$date":"2021-10-22T00:33:39.941+00:00"},"s":"I",  "c":"NETWORK",  "id":577164,  "ctx":"OCSP Fetch and Staple","msg":"OCSP fetch/staple started"}
{"t":{"$date":"2021-10-22T00:33:39.995+00:00"},"s":"I",  "c":"NETWORK",  "id":577165,  "ctx":"OCSPManagerHTTP-0","msg":"OCSP fetch/staple completion"}
{"t":{"$date":"2021-10-22T00:33:39.995+00:00"},"s":"I",  "c":"NETWORK",  "id":577163,  "ctx":"OCSPManagerHTTP-0","msg":"OCSP response","attr":{"status":{"code":0,"codeName":"OK"}}}
{"t":{"$date":"2021-10-22T00:33:40.565+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862820:565489][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1 through 2"}}
{"t":{"$date":"2021-10-22T00:33:40.697+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862820:697230][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 2 through 2"}}
{"t":{"$date":"2021-10-22T00:33:40.821+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862820:821687][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Main recovery loop: starting at 1/33920 to 2/256"}}
{"t":{"$date":"2021-10-22T00:33:40.961+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862820:961925][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1 through 2"}}
{"t":{"$date":"2021-10-22T00:33:41.045+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862821:45955][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 2 through 2"}}
{"t":{"$date":"2021-10-22T00:33:41.129+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862821:129209][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (0, 0)"}}
{"t":{"$date":"2021-10-22T00:33:41.129+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862821:129297][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global oldest timestamp: (0, 0)"}}
{"t":{"$date":"2021-10-22T00:33:41.131+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862821:131563][29201:0x7fa2d269ebc0], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 1, snapshot max: 1 snapshot count: 0, oldest timestamp: (0, 0) , meta checkpoint timestamp: (0, 0) base write gen: 7"}}
{"t":{"$date":"2021-10-22T00:33:41.140+00:00"},"s":"I",  "c":"STORAGE",  "id":4795906, "ctx":"initandlisten","msg":"WiredTiger opened","attr":{"durationMillis":1237}}
{"t":{"$date":"2021-10-22T00:33:41.140+00:00"},"s":"I",  "c":"RECOVERY", "id":23987,   "ctx":"initandlisten","msg":"WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}}
{"t":{"$date":"2021-10-22T00:33:41.144+00:00"},"s":"I",  "c":"STORAGE",  "id":4366406, "ctx":"initandlisten","msg":"Modifying the table logging settings for all existing WiredTiger tables","attr":{"loggingEnabled":false}}
{"t":{"$date":"2021-10-22T00:33:41.154+00:00"},"s":"I",  "c":"STORAGE",  "id":22262,   "ctx":"initandlisten","msg":"Timestamp monitor starting"}
{"t":{"$date":"2021-10-22T00:33:41.158+00:00"},"s":"W",  "c":"CONTROL",  "id":22120,   "ctx":"initandlisten","msg":"Access control is not enabled for the database. Read and write access to data and configuration is unrestricted","tags":["startupWarnings"]}
{"t":{"$date":"2021-10-22T00:33:41.158+00:00"},"s":"I",  "c":"CONTROL",  "id":23408,   "ctx":"initandlisten","msg":"Starting Watchdog Monitor"}
{"t":{"$date":"2021-10-22T00:33:41.166+00:00"},"s":"I",  "c":"NETWORK",  "id":4915702, "ctx":"initandlisten","msg":"Updated wire specification","attr":{"oldSpec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":13},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":13},"outgoing":{"minWireVersion":0,"maxWireVersion":13},"isInternalClient":true},"newSpec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":13},"incomingInternalClient":{"minWireVersion":13,"maxWireVersion":13},"outgoing":{"minWireVersion":13,"maxWireVersion":13},"isInternalClient":true}}}
{"t":{"$date":"2021-10-22T00:33:41.166+00:00"},"s":"I",  "c":"STORAGE",  "id":5071100, "ctx":"initandlisten","msg":"Clearing temp directory"}
{"t":{"$date":"2021-10-22T00:33:41.166+00:00"},"s":"I",  "c":"STORAGE",  "id":21005,   "ctx":"initandlisten","msg":"Not restarting unfinished index builds because we are in standalone mode"}
{"t":{"$date":"2021-10-22T00:33:41.170+00:00"},"s":"I",  "c":"CONTROL",  "id":20536,   "ctx":"initandlisten","msg":"Flow Control is enabled on this deployment"}
{"t":{"$date":"2021-10-22T00:33:41.172+00:00"},"s":"I",  "c":"FTDC",     "id":20625,   "ctx":"initandlisten","msg":"Initializing full-time diagnostic data capture","attr":{"dataDirectory":"/srv/mongodb/atlas-4wnh1p-config-0-node-2/diagnostic.data"}}
{"t":{"$date":"2021-10-22T00:33:41.175+00:00"},"s":"W",  "c":"REPL",     "id":5576601, "ctx":"initandlisten","msg":"Replication startup parameter 'startupRecoveryForRestore' is set and recovering from an unstable checkpoint.  Assuming this is a resume of an earlier attempt to recover for restore."}
{"t":{"$date":"2021-10-22T00:33:41.178+00:00"},"s":"I",  "c":"REPL",     "id":21557,   "ctx":"initandlisten","msg":"Removing unapplied oplog entries after oplogTruncateAfterPoint","attr":{"oplogTruncateAfterPoint":{"":{"$timestamp":{"t":1634858228,"i":1}}}}}
{"t":{"$date":"2021-10-22T00:33:41.178+00:00"},"s":"F",  "c":"-",        "id":23095,   "ctx":"initandlisten","msg":"Fatal assertion","attr":{"msgid":34418,"error":"NamespaceNotFound: Can't find local.oplog.rs","file":"src/mongo/db/repl/replication_recovery.cpp","line":764}}
{"t":{"$date":"2021-10-22T00:33:41.178+00:00"},"s":"F",  "c":"-",        "id":23096,   "ctx":"initandlisten","msg":"\n\n***aborting after fassert() failure\n\n"

 

 



 Comments   
Comment by Githook User [ 28/Oct/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-60916 Attempt to reload ShardRegistry in resharding coordinator.

The ARS (AsyncRequestSender) calls ShardRegistry::getShardNoReload() and
can therefore throw a ShardNotFound exception when the ShardRegistry is
still initializing.

(cherry picked from commit dd41a99e2c177d5bd3a38229d85458117480c5d9)
Branch: v5.0
https://github.com/mongodb/mongo/commit/062b1e0d3cd67c4a93a2a1db9aea1f952b517492

Comment by Githook User [ 28/Oct/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-60916 Attempt to reload ShardRegistry in resharding coordinator.

The ARS (AsyncRequestSender) calls ShardRegistry::getShardNoReload() and
can therefore throw a ShardNotFound exception when the ShardRegistry is
still initializing.

(cherry picked from commit dd41a99e2c177d5bd3a38229d85458117480c5d9)
Branch: v5.1
https://github.com/mongodb/mongo/commit/0036d54b93c6703864c92181c9694605726f108b

Comment by Githook User [ 27/Oct/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: SERVER-60916 Attempt to reload ShardRegistry in resharding coordinator.

The ARS (AsyncRequestSender) calls ShardRegistry::getShardNoReload() and
can therefore throw a ShardNotFound exception when the ShardRegistry is
still initializing.
Branch: master
https://github.com/mongodb/mongo/commit/dd41a99e2c177d5bd3a38229d85458117480c5d9

Comment by Max Hirschhorn [ 27/Oct/21 ]

pierlauro.sciarelli and I discussed this issue some more over Slack. The plan is to

  1. Have ReshardingCoordinator switch over to calling sharding_ddl_util::sendAuthenticatedCommandToShards() for sending the _shardsvrCommitReshardCollection commands, etc. to the shards, to get the benefit of the changes from SERVER-57280.
  2. Have the call to ShardRegistry::reload() in sharding_ddl_util::sendAuthenticatedCommandToShards() continue waiting even if the reload attempt fails with ReadConcernMajorityNotAvailableYet. This will be handled separately by SERVER-61003.

We expect the combination of these two changes to address the issue reported by the Automation team in SERVER-60916. chuck.zhang, would it be possible for us to have you test the restore procedure using a custom build (non-official release) to confirm this?

(We can separately evaluate undoing these workarounds after SERVER-57519 is implemented.)

Comment by Chuck Zhang [ 25/Oct/21 ]

max.hirschhorn

One theory is that the data from config.shards the config server primary read into its ShardRegistry is from before the writes to the config.shards documents when the mongod was restarted as a stand-alone mongod. I think getting the data files from the restore procedure before the fatal assertion would greatly help with the analysis needed in this ticket.

 

Per discussed offline, I am working on data files.

 

Comment by Pierlauro Sciarelli [ 25/Oct/21 ]

The problem could be that the ARS did not wait for the reload because it actually relies on getShardNoReload instead of getShard (SERVER-57519).

As SERVER-57519 doesn't seem straightforward, we may evaluate making the ARS check at least that the shard registry is up: it would not solve all ARS's causality problems, but at least protect against a clearly empty shard registry as it seems the case for this ticket.

Comment by Max Hirschhorn [ 25/Oct/21 ]

Kal and I discussed this issue over Zoom. He made a point that ShardRegistry::getShard() would have waited for the topology time in the ShardRegistry to have advanced. So even though ShardRegistry::_periodicReload() hit the ReadConcernMajorityNotAvailableYet error, the ShardRegistry::getShard() from the ARS must not have. Otherwise we would have seen the ReadConcernMajorityNotAvailableYet error instead of the ShardNotFound error.

One theory is that the data from config.shards the config server primary read into its ShardRegistry is from before the writes to the config.shards documents when the mongod was restarted as a stand-alone mongod. I think getting the data files from the restore procedure before the fatal assertion would greatly help with the analysis needed in this ticket. Edit: Pierlauro's point below that the ARS isn't waiting for the ShardRegistry to reload makes this theory far less likely. Calling ShardRegistry::reload() like sendAuthenticatedCommandToShards() does for the Sharding DDL Coordinators wouldn't be sufficient because that function can throw a ReadConcernMajorityNotAvailableYet exception. Pierlauro said he would discuss this case with the Sharding EMEA team tomorrow.

Comment by Pierlauro Sciarelli [ 25/Oct/21 ]

The Sharding DDL coordinators do not remember the list of shards if I am not mistaken, because we will either always broadcast (for drop) or we will reconstruct the list of shards from the routing information, for rename.

Confirming that the DDL coordinators are not saving the list of shards at any time. All DDL operations (maybe not resharding?) perform broadcasts to all shards present in the shard registry, including rename (just to clarify - that's not relevant at the moment).

So in that sense, I don't think we can encounter ShardNotFound error from the ShardRegistry, unless during recovery.

We are only handling (retrying) all kinds of failures that may happen within nodes belonging to shards, but - as pointed out in the scope's non-goals - we have not taken into consideration the scenario in which a shard may be added/removed during DDLs.

I am not sure what we use for the DDL coordinators, I believe that we persist the vector clock, which in turn will store the topologyOpTime, which ensures that ShardRegistry::getShard will refresh at least up to that topologyOpTime.

We don't currently checkpoint the vector clock; I believe we didn't take this solution into account for the same reason mentioned above. However, in SERVER-57280 we indirectly synchronized DDL coordinators with the shard registry initialization: requests to shards done by coordinators (maybe not the resharding one?) all pass through sharding_ddl_util::sendAuthenticatedCommandToShards, which will trigger the first shard registry reload if needed.

Comment by Kaloian Manassiev [ 25/Oct/21 ]

The Sharding DDL coordinators do not remember the list of shards if I am not mistaken, because we will either always broadcast (for drop) or we will reconstruct the list of shards from the routing information, for rename. So in that sense, I don't think we can encounter ShardNotFound error from the ShardRegistry, unless during recovery.

I am not sure what we use for the DDL coordinators, I believe that we persist the vector clock, which in turn will store the topologyOpTime, which ensures that ShardRegistry::getShard will refresh at least up to that topologyOpTime.

pierlauro.sciarelli/jordi.serra-torrens: Can you guys confirm that my understanding above is correct?

Comment by Max Hirschhorn [ 25/Oct/21 ]

Assigned this to Sharding EMEA to reconcile the PrimaryOnlyService and ShardRegistry initialization order. Feel free to pass back to Sharding NYC if you don't feel like there's a general solution to take here.

Comment by Chuck Zhang [ 22/Oct/21 ]

max.hirschhorn

I see the cause of the original fassert in the new logs that were uploaded, thanks. The resharding coordinator on the config server primary is unexpectedly getting ShardNotFound when attempting to contact the atlas-4wnh1p-shard-0 shard. Chuck Zhang, could you please confirm/answer the following for me:

1. Is atlas-4wnh1p-shard-0 the destShardName of the restored cluster?
2. What are the contents of the config.shards collection on the config server? Is there an entry for the atlas-4wnh1p-shard-0 shard?

1. Yes. atlas-4wnh1p-shard-0 is one of the destShardName values of the restored cluster.
2. I can't connect to the config server after the restore since it's not up.

Connecting to:		mongodb+srv://cluster0.qzt8l.mmscloudteam.com/config
MongoNetworkError: connect ECONNREFUSED 13.56.37.245:27016

Comment by Max Hirschhorn [ 22/Oct/21 ]

Is it by any chance a problem of replication's step-up actions executing before that method has completed?

Yes, I think the issue could be framed that way. Would we also face a similar issue in the Sharding DDL Coordinators? Or how do those primary-only service Instances handle ShardNotFound?

Comment by Kaloian Manassiev [ 22/Oct/21 ]

It looks like we have the waitForShardRegistryReload() with which we block opening the connections before the ShardRegistry has performed a refresh. Is it by any chance a problem of replication's step-up actions executing before that method has completed?

Comment by Max Hirschhorn [ 22/Oct/21 ]

It looks like the ShardRegistry may be in an unqueryable state because it uses read concern level "majority" internally. Assuming that is expected, kaloian.manassiev, tommaso.tocci, how should other sharding components be waiting for the first reload of the ShardRegistry to have succeeded?

atlas-4wnh1p-config-00-02.qzt8l.mmscloudteam.com_2021-10-22T00_00_00_2021-10-22T03_30_00_mongodb.log

{"t":{"$date":"2021-10-22T00:32:44.246+00:00"},"s":"I",  "c":"SHARDING", "id":22727,   "ctx":"shard-registry-reload","msg":"Error running periodic reload of shard registry","attr":{"error":"ReadConcernMajorityNotAvailableYet: could not get updated shard list from config server :: caused by :: Read concern majority reads are currently not possible.","shardRegistryReloadIntervalSeconds":1}}

Comment by Max Hirschhorn [ 22/Oct/21 ]

I see the cause of the original fassert in the new logs that were uploaded, thanks. The resharding coordinator on the config server primary is unexpectedly getting ShardNotFound when attempting to contact the atlas-4wnh1p-shard-0 shard. chuck.zhang, could you please confirm/answer the following for me:

  1. Is atlas-4wnh1p-shard-0 the destShardName of the restored cluster?
  2. What are the contents of the config.shards collection on the config server? Is there an entry for the atlas-4wnh1p-shard-0 shard?
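
A mongosh sketch for answering question 2 once the config server is reachable (a plain query of the sharding catalog; the shard name below is taken from the log excerpt):

// List the registered shards and look specifically for the one the
// resharding coordinator could not find:
db.getSiblingDB("config").shards.find()
db.getSiblingDB("config").shards.findOne({ _id: "atlas-4wnh1p-shard-0" })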

atlas-4wnh1p-config-00-02.qzt8l.mmscloudteam.com_2021-10-22T00_00_00_2021-10-22T03_30_00_mongodb.log

{"t":{"$date":"2021-10-22T00:32:44.683+00:00"},"s":"F",  "c":"RESHARD",  "id":5277000, "ctx":"ReshardingCoordinatorService-2","msg":"Unrecoverable error past the point resharding was guaranteed to succeed","attr":{"error":"ShardNotFound: Failed command { _shardsvrCommitReshardCollection: \"data.random\", reshardingUUID: UUID(\"04044ed9-81d2-4ea9-83cd-66a746fa780e\"), writeConcern: { w: \"majority\" } } for database 'admin' on shard 'atlas-4wnh1p-shard-0' :: caused by :: Could not find shard atlas-4wnh1p-shard-0"}}
{"t":{"$date":"2021-10-22T00:32:44.683+00:00"},"s":"F",  "c":"-",        "id":23089,   "ctx":"ReshardingCoordinatorService-2","msg":"Fatal assertion","attr":{"msgid":5277000,"file":"src/mongo/db/s/resharding/resharding_coordinator_service.cpp","line":1286}}
{"t":{"$date":"2021-10-22T00:32:44.683+00:00"},"s":"F",  "c":"-",        "id":23090,   "ctx":"ReshardingCoordinatorService-2","msg":"\n\n***aborting after fassert() failure\n\n"}
{"t":{"$date":"2021-10-22T00:32:44.683+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"ReshardingCoordinatorService-2","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}

Comment by Chuck Zhang [ 22/Oct/21 ]

max.hirschhorn,

Sorry for not being clear on the ask. I wasn't aware the mongod logs were missing. I've reattached the mongod logs, which cover the time range of the restore.

File atlas-4wnh1p-config-00-02.qzt8l.mmscloudteam.com_2021-10-22T00_00_00_2021-10-22T03_30_00_mongodb seems quite related to the root cause.

Comment by Max Hirschhorn [ 22/Oct/21 ]

chuck.zhang, I didn't realize you were asking specifically about the restart attempt at 2021-10-22T00:33:24.672. Is there a reason mongodb.log doesn't cover that time range when logAppend: true is set in the automation configuration?

It looks like there was an fassert() at that time too. Unfortunately the logs from mongod in the automation log are truncated so I couldn't say whether the original fassert was location code 34418 or not. I suspect it probably was though.

Would it be possible to rerun the test/procedure with the maxLogSizeKB server parameter set to a larger value to avoid this log truncation? --setParameter maxLogSizeKB=20
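
A hedged mongosh sketch for checking the current truncation limit and, assuming the parameter is runtime-settable on this build, raising it without a restart (otherwise the --setParameter form above applies at startup):

// Check the current value of the log-line truncation limit:
db.adminCommand({ getParameter: 1, maxLogSizeKB: 1 })

// Raise it at runtime (assumption: maxLogSizeKB accepts setParameter here);
// otherwise restart mongod with --setParameter maxLogSizeKB=20.
db.adminCommand({ setParameter: 1, maxLogSizeKB: 20 })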

automation-agent-verbose.log

{"t":{"$date":"2021-10-22T00:33:24.614+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862804:614497][30560:0x7f06b853abc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (0, 0)"}}
{"t":{"$date":"2021-10-22T00:33:24.614+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862804:614561][30560:0x7f06b853abc0], txn-recover: [WT_VERB_RECOVERY | WT_V...6,   "ctx":"initandlisten","msg":"\n\n***aborting after fassert() failure\n\n"}
 : exit status 14 (warning: log line attempted (13.9k) over max size (10.0k), printing beginning and end)

Comment by Chuck Zhang [ 22/Oct/21 ]

max.hirschhorn Thanks for looking.

Pulling out the error from the attached log file for ease of viewing and searchability. Is there some part of the restore procedure which would explain why the oplog collection doesn't exist on the config server node? At first glance I'm not sure how this is connected to taking a backup while there's a resharding operation.

 

mongodb.log

{"t":{"$date":"2021-10-22T02:47:12.044+00:00"},"s":"W",  "c":"REPL",     "id":5576601, "ctx":"initandlisten","msg":"Replication startup parameter 'startupRecoveryForRestore' is set and recovering from an unstable checkpoint.  Assuming this is a resume of an earlier attempt to recover for restore."}
{"t":{"$date":"2021-10-22T02:47:12.047+00:00"},"s":"I",  "c":"REPL",     "id":21557,   "ctx":"initandlisten","msg":"Removing unapplied oplog entries after oplogTruncateAfterPoint","attr":{"oplogTruncateAfterPoint":{"":{"$timestamp":{"t":1634858228,"i":1}}}}}
{"t":{"$date":"2021-10-22T02:47:12.047+00:00"},"s":"F",  "c":"-",        "id":23095,   "ctx":"initandlisten","msg":"Fatal assertion","attr":{"msgid":34418,"error":"NamespaceNotFound: Can't find local.oplog.rs","file":"src/mongo/db/repl/replication_recovery.cpp","line":764}}
{"t":{"$date":"2021-10-22T02:47:12.047+00:00"},"s":"F",  "c":"-",        "id":23096,   "ctx":"initandlisten","msg":"\n\n***aborting after fassert() failure\n\n"}

The logs above are actually from about an hour and a half after the backup restore was initiated. It seems the mongod took about two hours to print its first logs, and it only ran for about a minute.
Do you see anything suspicious in the log below that is causing mongod to fail to restart?

[2021-10-22T00:33:24.672+0000] [.error] [src/action/start.go:func1:93] [103] <atlas-4wnh1p-config-0-node-1> [00:33:24.672] Error starting mongod : <atlas-4wnh1p-config-0-node-1> [00:33:24.672] Error running start command. cmd=[Args=[/var/lib/mongodb-mms-automation/mongodb-linux-x86_64-5.0.3-ent/bin/mongod -f /srv/mongodb/atlas-4wnh1p-config-0-node-1/automation-mongod.conf]], stip=[args={"net":{"bindIp":"0.0.0.0","compression":{"compressors":"snappy,zstd,zlib"},"maxIncomingConnections":1500,"maxIncomingConnectionsOverride":["192.168.248.0/21"],"port":41220,"tls":{"CAFile":"/etc/pki/tls/certs/atlas-bundle.crt","allowConnectionsWithoutCertificates":true,"certificateKeyFile":"/etc/pki/tls/private/mongod.pem","clusterCAFile":"/var/lib/mongodb-mms-automation/atlas-cluster-managed.crt","disabledProtocols":"TLS1_0,TLS1_1","mode":"requireTLS"}},"security":{"javascriptEnabled":true},"setParameter":{"allowRolesFromX509Certificates":"true","authenticationMechanisms":"SCRAM-SHA-1,MONGODB-AWS,MONGODB-X509","awsSTSUrl":"https://sts.us-west-1.amazonaws.com","disableLogicalSessionCacheRefresh":"true","honorSystemUmask":"false","mongotHost":"localhost:28000","recoverFromOplogAsStandalone":"true","reportOpWriteConcernCountersInServerStatus":"true","startupRecoveryForRestore":"true","suppressNoTLSPeerCertificateWarning":"true","takeUnstableCheckpointOnShutdown":"true","tlsCATrusts":"[{sha256: \"76D7D7A042F4AAC3BFCF548DD0BD9912B1370854AEEFC4C69D88532654F11633\", roles: [{ role: \"\", db: \"\"}]}]","tlsWithholdClientCertificate":"true","ttlMonitorEnabled":"false","watchdogPeriodSeconds":"60"},"storage":{"dbPath":"/srv/mongodb/atlas-4wnh1p-config-0-node-1","wiredTiger":{"engineConfig":{"configString":"cache_size=512MB"}}},"systemLog":{"destination":"file","logAppend":true,"path":"/srv/mongodb/atlas-4wnh1p-config-0-node-1/mongodb.log"}}[],confPath=/srv/mongodb/atlas-4wnh1p-config-0-node-1/automation-mongod.conf,version=5.0.3-ent-657fea5a61a74d7a79df7aff8e4bcf0bc742b748(enterprise),isKmipRotateMasterKey=false,useOldConfFile=false]
 
 - Mongo Logs (first line truncated): .qzt8l.mmscloudteam.com"}}
{"t":{"$date":"2021-10-22T00:33:23.470+00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"5.0.3","gitVersion":"657fea5a61a74d7a79df7aff8e4bcf0bc742b748","openSSLVersion":"OpenSSL 1.0.1e-fips 11 Feb 2013","modules":["enterprise"],"allocator":"tcmalloc","environment":{"distmod":"rhel70","distarch":"x86_64","target_arch":"x86_64"}}}}
{"t":{"$date":"2021-10-22T00:33:23.470+00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"CentOS Linux release 7.9.2009 (Core)","version":"Kernel 3.10.0-1160.41.1.el7.x86_64"}}}
{"t":{"$date":"2021-10-22T00:33:23.470+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/srv/mongodb/atlas-4wnh1p-config-0-node-1/automation-mongod.conf","net":{"bindIp":"0.0.0.0","compression":{"compressors":"snappy,zstd,zlib"},"maxIncomingConnections":1500,"maxIncomingConnectionsOverride":["192.168.248.0/21"],"port":41220,"tls":{"CAFile":"/etc/pki/tls/certs/atlas-bundle.crt","allowConnectionsWithoutCertificates":true,"certificateKeyFile":"/etc/pki/tls/private/mongod.pem","clusterCAFile":"/var/lib/mongodb-mms-automation/atlas-cluster-managed.crt","disabledProtocols":"TLS1_0,TLS1_1","mode":"requireTLS"}},"processManagement":{"fork":true},"security":{"javascriptEnabled":true},"setParameter":{"allowRolesFromX509Certificates":"true","authenticationMechanisms":"SCRAM-SHA-1,MONGODB-AWS,MONGODB-X509","awsSTSUrl":"https://sts.us-west-1.amazonaws.com","disableLogicalSessionCacheRefresh":"true","honorSystemUmask":"false","mongotHost":"localhost:28000","recoverFromOplogAsStandalone":"true","reportOpWriteConcernCountersInServerStatus":"true","startupRecoveryForRestore":"true","suppressNoTLSPeerCertificateWarning":"true","takeUnstableCheckpointOnShutdown":"true","tlsCATrusts":"[{sha256: \"76D7D7A042F4AAC3BFCF548DD0BD9912B1370854AEEFC4C69D88532654F11633\", roles: [{ role: \"\", db: \"\"}]}]","tlsWithholdClientCertificate":"true","ttlMonitorEnabled":"false","watchdogPeriodSeconds":"60"},"storage":{"dbPath":"/srv/mongodb/atlas-4wnh1p-config-0-node-1","engine":"wiredTiger","wiredTiger":{"engineConfig":{"configString":"cache_size=512MB"}}},"systemLog":{"destination":"file","logAppend":true,"path":"/srv/mongodb/atlas-4wnh1p-config-0-node-1/mongodb.log"}}}}

 It looks like we've seen this issue on non-5.0 MongoDB versions as well.

Notes from server team: "Seems like it ignored the catalog file you restored and made a new one. Are you sure all the files were restored properly before starting mongod?"

https://jira.mongodb.org/browse/CLOUDP-78834?focusedCommentId=3909062&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-3909062

As commented in the above link, we haven't seen any issues caused by this so far, since it's an existing behavior from before reshardCollection was introduced.

 

Comment by Max Hirschhorn [ 22/Oct/21 ]

Pulling out the error from the attached log file for ease of viewing and searchability. Is there some part of the restore procedure which would explain why the oplog collection doesn't exist on the config server node? At first glance I'm not sure how this is connected to taking a backup while there's a resharding operation.

mongodb.log

{"t":{"$date":"2021-10-22T02:47:12.044+00:00"},"s":"W",  "c":"REPL",     "id":5576601, "ctx":"initandlisten","msg":"Replication startup parameter 'startupRecoveryForRestore' is set and recovering from an unstable checkpoint.  Assuming this is a resume of an earlier attempt to recover for restore."}
{"t":{"$date":"2021-10-22T02:47:12.047+00:00"},"s":"I",  "c":"REPL",     "id":21557,   "ctx":"initandlisten","msg":"Removing unapplied oplog entries after oplogTruncateAfterPoint","attr":{"oplogTruncateAfterPoint":{"":{"$timestamp":{"t":1634858228,"i":1}}}}}
{"t":{"$date":"2021-10-22T02:47:12.047+00:00"},"s":"F",  "c":"-",        "id":23095,   "ctx":"initandlisten","msg":"Fatal assertion","attr":{"msgid":34418,"error":"NamespaceNotFound: Can't find local.oplog.rs","file":"src/mongo/db/repl/replication_recovery.cpp","line":764}}
{"t":{"$date":"2021-10-22T02:47:12.047+00:00"},"s":"F",  "c":"-",        "id":23096,   "ctx":"initandlisten","msg":"\n\n***aborting after fassert() failure\n\n"}

Generated at Thu Feb 08 05:51:04 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.