Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-60916

CPS Restores failed with a snapshot with documents in reshardingOperation

    • Type: Icon: Bug Bug
    • Resolution: Fixed
    • Priority: Icon: Major - P3 Major - P3
    • 5.2.0, 5.0.4, 5.1.0-rc3
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Fully Compatible
    • ALL
    • v5.1, v5.0
    • Sharding 2021-11-01
    • 1

      Manual tested below:

      1. Created a 5.0 sharded cluster 
      2. Create an index (_id: "hashed")
      3. Run sh.reshardCollection("data.random", {"_id": "hashed"} ) with the above index, and take a snapshot after 1 min.
      4. Restore the above snapshot to a target 5.0 cluster @ 10/22/21 - 12:24 AM 
      5. Restore failed 

       

      reshardCollection

       

      Enterprise [mongos] data> sh.reshardCollection("data.random", {"_id": "hashed"} )
      {
        ok: 1,
        '$clusterTime': {
          clusterTime: Timestamp({ t: 1634858341, i: 56 }),
          signature: {
            hash: Binary(Buffer.from("dab217339e8ecc2cd931ac36643be7522b9aa563", "hex"), 0),
            keyId: Long("7021608605052829705")
          }
        },
        operationTime: Timestamp({ t: 1634858341, i: 52 })
      }
      Enterprise [mongos] data> db.random.getShardDistribution()
      Shard atlas-b7bb9t-shard-1 at atlas-b7bb9t-shard-1/atlas-b7bb9t-shard-01-00.qzt8l.mmscloudteam.com:27017,atlas-b7bb9t-shard-01-01.qzt8l.mmscloudteam.com:27017,atlas-b7bb9t-shard-01-02.qzt8l.mmscloudteam.com:27017
      {
        data: '76.04MiB',
        docs: 1216,
        chunks: 3,
        'estimated data per chunk': '25.34MiB',
        'estimated docs per chunk': 405
      }
      ---
      Shard atlas-b7bb9t-shard-0 at atlas-b7bb9t-shard-0/atlas-b7bb9t-shard-00-00.qzt8l.mmscloudteam.com:27017,atlas-b7bb9t-shard-00-01.qzt8l.mmscloudteam.com:27017,atlas-b7bb9t-shard-00-02.qzt8l.mmscloudteam.com:27017
      {
        data: '83.73MiB',
        docs: 1339,
        chunks: 3,
        'estimated data per chunk': '27.91MiB',
        'estimated docs per chunk': 446
      }
      ---
      Totals
      {
        data: '159.77MiB',
        docs: 2555,
        chunks: 6,
        'Shard atlas-b7bb9t-shard-1': [
          '47.59 % data',
          '47.59 % docs in cluster',
          '64KiB avg obj size on shard'
        ],
        'Shard atlas-b7bb9t-shard-0': [
          '52.4 % data',
          '52.4 % docs in cluster',
          '64KiB avg obj size on shard'
        ]
      }

       

       

      Mongo config server fassert error

      {"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21364,   "ctx":"ReplCoord-8","msg":"Caught up to the latest optime known via heartbeats after becoming primary","attr":{"targetOpTime":{"ts":{"$timestamp":{"t":1634862438,"i":1}},"t":3},"myLastApplied":{"ts":{"$timestamp":{"t":1634862438,"i":1}},"t":3}}}
      {"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21363,   "ctx":"ReplCoord-8","msg":"Exited primary catch-up mode"}
      {"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21107,   "ctx":"ReplCoord-8","msg":"Stopping replication producer"}
      {"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21239,   "ctx":"ReplBatcher","msg":"Oplog buffer has been drained","attr":{"term":4}}
      {"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21239,   "ctx":"ReplBatcher","msg":"Oplog buffer has been drained","attr":{"term":4}}
      {"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21275,   "ctx":"ReplCoordExtern-0","msg":"Recreating cursor for oplog fetcher due to error","attr":{"lastOpTimeFetched":{"ts":{"$timestamp":{"t":1634862438,"i":1}},"t":3},"attemptsRemaining":1,"error":"SocketException: Error while getting the next batch in the oplog fetcher :: caused by :: recv failed while exhausting cursor :: caused by :: short read"}}
      {"t":{"$date":"2021-10-22T00:27:19.458+00:00"},"s":"I",  "c":"REPL",     "id":21094,   "ctx":"BackgroundSync","msg":"Replication producer stopped after oplog fetcher finished returning a batch from our sync source. Abandoning this batch of oplog entries and re-evaluating our sync source"}
      
      
      {"t":{"$date":"2021-10-22T00:33:39.876+00:00"},"s":"I",  "c":"CONTROL",  "id":20698,   "ctx":"-","msg":"***** SERVER RESTARTED *****"}
      {"t":{"$date":"2021-10-22T00:33:39.889+00:00"},"s":"I",  "c":"NETWORK",  "id":4913010, "ctx":"-","msg":"Certificate information","attr":{"subject":"CN=*.qzt8l.mmscloudteam.com","issuer":"CN=R3,O=Let's Encrypt,C=US","thumbprint":"854A2FC27CFD4C5098A4E6DA13264DCD62F38C83","notValidBefore":{"$date":"2021-10-21T19:07:17.000Z"},"notValidAfter":{"$date":"2022-01-19T19:07:16.000Z"},"keyFile":"/etc/pki/tls/private/mongod.pem","type":"Server"}}
      {"t":{"$date":"2021-10-22T00:33:39.890+00:00"},"s":"I",  "c":"NETWORK",  "id":4915701, "ctx":"-","msg":"Initialized wire specification","attr":{"spec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":13},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":13},"outgoing":{"minWireVersion":0,"maxWireVersion":13},"isInternalClient":true}}}
      {"t":{"$date":"2021-10-22T00:33:39.891+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
      {"t":{"$date":"2021-10-22T00:33:39.891+00:00"},"s":"I",  "c":"NETWORK",  "id":4648601, "ctx":"main","msg":"Implicit TCP FastOpen unavailable. If TCP FastOpen is required, set tcpFastOpenServer, tcpFastOpenClient, and tcpFastOpenQueueSize."}
      {"t":{"$date":"2021-10-22T00:33:39.892+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
      {"t":{"$date":"2021-10-22T00:33:39.895+00:00"},"s":"W",  "c":"ASIO",     "id":22601,   "ctx":"main","msg":"No TransportLayer configured during NetworkInterface startup"}
      {"t":{"$date":"2021-10-22T00:33:39.896+00:00"},"s":"I",  "c":"REPL",     "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationDonorService","ns":"config.tenantMigrationDonors"}}
      {"t":{"$date":"2021-10-22T00:33:39.896+00:00"},"s":"I",  "c":"REPL",     "id":5123008, "ctx":"main","msg":"Successfully registered PrimaryOnlyService","attr":{"service":"TenantMigrationRecipientService","ns":"config.tenantMigrationRecipients"}}
      {"t":{"$date":"2021-10-22T00:33:39.897+00:00"},"s":"I",  "c":"CONTROL",  "id":4615611, "ctx":"initandlisten","msg":"MongoDB starting","attr":{"pid":29201,"port":33751,"dbPath":"/srv/mongodb/atlas-4wnh1p-config-0-node-2","architecture":"64-bit","host":"atlas-4wnh1p-config-00-02.qzt8l.mmscloudteam.com"}}
      {"t":{"$date":"2021-10-22T00:33:39.897+00:00"},"s":"I",  "c":"CONTROL",  "id":23403,   "ctx":"initandlisten","msg":"Build Info","attr":{"buildInfo":{"version":"5.0.3","gitVersion":"657fea5a61a74d7a79df7aff8e4bcf0bc742b748","openSSLVersion":"OpenSSL 1.0.1e-fips 11 Feb 2013","modules":["enterprise"],"allocator":"tcmalloc","environment":{"distmod":"rhel70","distarch":"x86_64","target_arch":"x86_64"}}}}
      {"t":{"$date":"2021-10-22T00:33:39.897+00:00"},"s":"I",  "c":"CONTROL",  "id":51765,   "ctx":"initandlisten","msg":"Operating System","attr":{"os":{"name":"CentOS Linux release 7.9.2009 (Core)","version":"Kernel 3.10.0-1160.41.1.el7.x86_64"}}}
      {"t":{"$date":"2021-10-22T00:33:39.897+00:00"},"s":"I",  "c":"CONTROL",  "id":21951,   "ctx":"initandlisten","msg":"Options set by command line","attr":{"options":{"config":"/srv/mongodb/atlas-4wnh1p-config-0-node-2/automation-mongod.conf","net":{"bindIp":"0.0.0.0","compression":{"compressors":"snappy,zstd,zlib"},"maxIncomingConnections":1500,"maxIncomingConnectionsOverride":["192.168.248.0/21"],"port":33751,"tls":{"CAFile":"/etc/pki/tls/certs/atlas-bundle.crt","allowConnectionsWithoutCertificates":true,"certificateKeyFile":"/etc/pki/tls/private/mongod.pem","clusterCAFile":"/var/lib/mongodb-mms-automation/atlas-cluster-managed.crt","disabledProtocols":"TLS1_0,TLS1_1","mode":"requireTLS"}},"processManagement":{"fork":true},"security":{"javascriptEnabled":true},"setParameter":{"allowRolesFromX509Certificates":"true","authenticationMechanisms":"SCRAM-SHA-1,MONGODB-AWS,MONGODB-X509","awsSTSUrl":"https://sts.us-west-1.amazonaws.com","disableLogicalSessionCacheRefresh":"true","honorSystemUmask":"false","mongotHost":"localhost:28000","recoverFromOplogAsStandalone":"true","reportOpWriteConcernCountersInServerStatus":"true","startupRecoveryForRestore":"true","suppressNoTLSPeerCertificateWarning":"true","takeUnstableCheckpointOnShutdown":"true","tlsCATrusts":"[{sha256: \"76D7D7A042F4AAC3BFCF548DD0BD9912B1370854AEEFC4C69D88532654F11633\", roles: [{ role: \"\", db: \"\"}]}]","tlsWithholdClientCertificate":"true","ttlMonitorEnabled":"false","watchdogPeriodSeconds":"60"},"storage":{"dbPath":"/srv/mongodb/atlas-4wnh1p-config-0-node-2","engine":"wiredTiger","wiredTiger":{"engineConfig":{"configString":"cache_size=512MB"}}},"systemLog":{"destination":"file","logAppend":true,"path":"/srv/mongodb/atlas-4wnh1p-config-0-node-2/mongodb.log"}}}}
      {"t":{"$date":"2021-10-22T00:33:39.899+00:00"},"s":"I",  "c":"NETWORK",  "id":577164,  "ctx":"initandlisten","msg":"OCSP fetch/staple started"}
      {"t":{"$date":"2021-10-22T00:33:39.901+00:00"},"s":"I",  "c":"CONNPOOL", "id":22576,   "ctx":"CurlConnPool-0","msg":"Connecting","attr":{"hostAndPort":"r3.o.lencr.org:80"}}
      {"t":{"$date":"2021-10-22T00:33:39.903+00:00"},"s":"I",  "c":"STORAGE",  "id":22315,   "ctx":"initandlisten","msg":"Opening WiredTiger","attr":{"config":"create,cache_size=387M,session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),builtin_extension_config=(zstd=(compression_level=6)),file_manager=(close_idle_time=600,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress,compact_progress],cache_size=512MB"}}
      {"t":{"$date":"2021-10-22T00:33:39.938+00:00"},"s":"I",  "c":"NETWORK",  "id":577165,  "ctx":"OCSPManagerHTTP-0","msg":"OCSP fetch/staple completion"}
      {"t":{"$date":"2021-10-22T00:33:39.938+00:00"},"s":"I",  "c":"NETWORK",  "id":577163,  "ctx":"OCSPManagerHTTP-0","msg":"OCSP response","attr":{"status":{"code":0,"codeName":"OK"}}}
      {"t":{"$date":"2021-10-22T00:33:39.941+00:00"},"s":"I",  "c":"NETWORK",  "id":577164,  "ctx":"OCSP Fetch and Staple","msg":"OCSP fetch/staple started"}
      {"t":{"$date":"2021-10-22T00:33:39.995+00:00"},"s":"I",  "c":"NETWORK",  "id":577165,  "ctx":"OCSPManagerHTTP-0","msg":"OCSP fetch/staple completion"}
      {"t":{"$date":"2021-10-22T00:33:39.995+00:00"},"s":"I",  "c":"NETWORK",  "id":577163,  "ctx":"OCSPManagerHTTP-0","msg":"OCSP response","attr":{"status":{"code":0,"codeName":"OK"}}}
      {"t":{"$date":"2021-10-22T00:33:40.565+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862820:565489][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1 through 2"}}
      {"t":{"$date":"2021-10-22T00:33:40.697+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862820:697230][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 2 through 2"}}
      {"t":{"$date":"2021-10-22T00:33:40.821+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862820:821687][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Main recovery loop: starting at 1/33920 to 2/256"}}
      {"t":{"$date":"2021-10-22T00:33:40.961+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862820:961925][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 1 through 2"}}
      {"t":{"$date":"2021-10-22T00:33:41.045+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862821:45955][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY_PROGRESS] Recovering log 2 through 2"}}
      {"t":{"$date":"2021-10-22T00:33:41.129+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862821:129209][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global recovery timestamp: (0, 0)"}}
      {"t":{"$date":"2021-10-22T00:33:41.129+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862821:129297][29201:0x7fa2d269ebc0], txn-recover: [WT_VERB_RECOVERY | WT_VERB_RECOVERY_PROGRESS] Set global oldest timestamp: (0, 0)"}}
      {"t":{"$date":"2021-10-22T00:33:41.131+00:00"},"s":"I",  "c":"STORAGE",  "id":22430,   "ctx":"initandlisten","msg":"WiredTiger message","attr":{"message":"[1634862821:131563][29201:0x7fa2d269ebc0], WT_SESSION.checkpoint: [WT_VERB_CHECKPOINT_PROGRESS] saving checkpoint snapshot min: 1, snapshot max: 1 snapshot count: 0, oldest timestamp: (0, 0) , meta checkpoint timestamp: (0, 0) base write gen: 7"}}
      {"t":{"$date":"2021-10-22T00:33:41.140+00:00"},"s":"I",  "c":"STORAGE",  "id":4795906, "ctx":"initandlisten","msg":"WiredTiger opened","attr":{"durationMillis":1237}}
      {"t":{"$date":"2021-10-22T00:33:41.140+00:00"},"s":"I",  "c":"RECOVERY", "id":23987,   "ctx":"initandlisten","msg":"WiredTiger recoveryTimestamp","attr":{"recoveryTimestamp":{"$timestamp":{"t":0,"i":0}}}}
      {"t":{"$date":"2021-10-22T00:33:41.144+00:00"},"s":"I",  "c":"STORAGE",  "id":4366406, "ctx":"initandlisten","msg":"Modifying the table logging settings for all existing WiredTiger tables","attr":{"loggingEnabled":false}}
      {"t":{"$date":"2021-10-22T00:33:41.154+00:00"},"s":"I",  "c":"STORAGE",  "id":22262,   "ctx":"initandlisten","msg":"Timestamp monitor starting"}
      {"t":{"$date":"2021-10-22T00:33:41.158+00:00"},"s":"W",  "c":"CONTROL",  "id":22120,   "ctx":"initandlisten","msg":"Access control is not enabled for the database. Read and write access to data and configuration is unrestricted","tags":["startupWarnings"]}
      {"t":{"$date":"2021-10-22T00:33:41.158+00:00"},"s":"I",  "c":"CONTROL",  "id":23408,   "ctx":"initandlisten","msg":"Starting Watchdog Monitor"}
      {"t":{"$date":"2021-10-22T00:33:41.166+00:00"},"s":"I",  "c":"NETWORK",  "id":4915702, "ctx":"initandlisten","msg":"Updated wire specification","attr":{"oldSpec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":13},"incomingInternalClient":{"minWireVersion":0,"maxWireVersion":13},"outgoing":{"minWireVersion":0,"maxWireVersion":13},"isInternalClient":true},"newSpec":{"incomingExternalClient":{"minWireVersion":0,"maxWireVersion":13},"incomingInternalClient":{"minWireVersion":13,"maxWireVersion":13},"outgoing":{"minWireVersion":13,"maxWireVersion":13},"isInternalClient":true}}}
      {"t":{"$date":"2021-10-22T00:33:41.166+00:00"},"s":"I",  "c":"STORAGE",  "id":5071100, "ctx":"initandlisten","msg":"Clearing temp directory"}
      {"t":{"$date":"2021-10-22T00:33:41.166+00:00"},"s":"I",  "c":"STORAGE",  "id":21005,   "ctx":"initandlisten","msg":"Not restarting unfinished index builds because we are in standalone mode"}
      {"t":{"$date":"2021-10-22T00:33:41.170+00:00"},"s":"I",  "c":"CONTROL",  "id":20536,   "ctx":"initandlisten","msg":"Flow Control is enabled on this deployment"}
      {"t":{"$date":"2021-10-22T00:33:41.172+00:00"},"s":"I",  "c":"FTDC",     "id":20625,   "ctx":"initandlisten","msg":"Initializing full-time diagnostic data capture","attr":{"dataDirectory":"/srv/mongodb/atlas-4wnh1p-config-0-node-2/diagnostic.data"}}
      {"t":{"$date":"2021-10-22T00:33:41.175+00:00"},"s":"W",  "c":"REPL",     "id":5576601, "ctx":"initandlisten","msg":"Replication startup parameter 'startupRecoveryForRestore' is set and recovering from an unstable checkpoint.  Assuming this is a resume of an earlier attempt to recover for restore."}
      {"t":{"$date":"2021-10-22T00:33:41.178+00:00"},"s":"I",  "c":"REPL",     "id":21557,   "ctx":"initandlisten","msg":"Removing unapplied oplog entries after oplogTruncateAfterPoint","attr":{"oplogTruncateAfterPoint":{"":{"$timestamp":{"t":1634858228,"i":1}}}}}
      {"t":{"$date":"2021-10-22T00:33:41.178+00:00"},"s":"F",  "c":"-",        "id":23095,   "ctx":"initandlisten","msg":"Fatal assertion","attr":{"msgid":34418,"error":"NamespaceNotFound: Can't find local.oplog.rs","file":"src/mongo/db/repl/replication_recovery.cpp","line":764}}
      {"t":{"$date":"2021-10-22T00:33:41.178+00:00"},"s":"F",  "c":"-",        "id":23096,   "ctx":"initandlisten","msg":"\n\n***aborting after fassert() failure\n\n"
      
      

       

       

            Assignee:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Reporter:
            chuck.zhang@mongodb.com Chuck Zhang
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

              Created:
              Updated:
              Resolved: