Make implicitly_retry_resharding.js also retry on snapshot or oplog missing errors that got converted to ReshardCollectionTruncatedError


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.3.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Cluster Scalability
    • Fully Compatible
    • ALL
    • ClusterScalability 19Jan-2Feb
    • 200

      SERVER-112211 added a JS override that retries resharding commands when they fail with snapshot-missing or oplog-truncated errors. It turns out that in some cases these errors are converted to ReshardCollectionTruncatedError, and the original error appears only inside the "errmsg" string, for example:

       

      {
      "ok" : 0,
      "errmsg" : "Recipient shard shard-rs0 reached an unrecoverable error :: caused by :: OplogQueryMinTsMissing: Executor error during aggregate command on namespace: local.oplog.rs :: caused by :: Specified timestamp has already fallen off the oplog for the input timestamp: Timestamp(1766637357, 138), first oplog entry: { lsid: { id: UUID(\"c75cb0f8-5049-4fe6-9f2a-6c2c1875705c\"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 4, op: \"c\", ns: \"admin.$cmd\", o: { applyOps: [ { op: \"i\", ns: \"test.system.resharding.4eb881b2-260c-4d31-8c14-15ad30e49b33\", ui: UUID(\"8b4728b1-d1bf-4177-9988-4a14ead4368e\"), o: { _id: ObjectId('694cbf2d0323f618751382ab'), oldKey: 200.0, newKey: 300.0 }, o2: { newKey: 300.0, _id: ObjectId('694cbf2d0323f618751382ab') } }, { op: \"i\", ns: \"test.system.resharding.4eb881b2-260c-4d31-8c14-15ad30e49b33\", ui: UUID(\"8b4728b1-d1bf-4177-9988-4a14ead4368e\"), o: { _id: ObjectId('694cbf2d0323f618751382ac'), oldKey: 201.0, newKey: 299.0 }, o2: { newKey: 299.0, _id: ObjectId('694cbf2d0323f618751382ac') } }, { op: \"i\", ns: \"test.system.resharding.4eb881b2-260c-4d31-8c14-15ad30e49b33\", ui: UUID(\"8b4728b1-d1bf-4177-9988-4a14ead4368e\"), o: { _id: ObjectId('694cbf2d0323f618751382ad'), oldKey: 202.0, newKey: 298.0 }, o2: { newKey: 298.0, _id: ObjectId('694cbf2d0323f618751382ad') } }, { op: \"i\", ns: \"test.system.resharding.4eb881b2-260c-4d31-8c14-15ad30e49b33\", ui: UUID(\"8b4728b1-d1bf-4177-9988-4a14ead4368e\"), o: { _id: ObjectId('694cbf2d0323f618751382ae'), oldKey: 203.0, newKey: 297.0 }, o2: { newKey: 297.0, _id: ObjectId('694cbf2d0323f618751382ae') } }, { op: \"i\", ns: \"test.system.resharding.4eb881b2-260c-4d31-8c14-15ad30e49b33\", ui: UUID(\"8b4728b1-d1bf-4177-9988-4a14ead4368e\"), o: { _id: ObjectId('694cbf2d0323f618751382af'), oldKey: 204.0, newKey: 296.0 }, o2: { newKey: 296.0, _id: ObjectId('694cbf2d0323f618751382af') } }, { op: \"i\", ns: 
\"test.system.resharding.4eb881b2-260c-4d31-8c14-15ad30e49b33\", ui: UUID(\"8b4728b1-d1bf-4177-9988-4a14ead4368e\"), o: { _id: ObjectId('694cbf2d0323f618751382b0'), oldKey:",
      "code" : 350,
      "codeName" : "ReshardCollectionTruncatedError",
      "$clusterTime" : {
      "clusterTime" : Timestamp(1766637358, 50),
      "signature" : {
      "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      "keyId" : NumberLong(0)
      }
      }
      }

       

      Given this, the override should also retry on ReshardCollectionTruncatedError when its "errmsg" contains one of the original errors (such as OplogQueryMinTsMissing in the response above).
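
      The check described above could look roughly like the sketch below. It is not the code in implicitly_retry_resharding.js: the function name, the constant names, and the list of retryable error names are hypothetical. Only OplogQueryMinTsMissing and code 350 come from the sample response; the snapshot error names are assumed for illustration.

```javascript
// Hypothetical list of error names the override already retries on.
// Only "OplogQueryMinTsMissing" appears in this ticket; the snapshot
// names are assumptions made for this sketch.
const kRetryableCodeNames = [
    "OplogQueryMinTsMissing",
    "SnapshotTooOld",
    "SnapshotUnavailable",
];

// Code taken from the sample response in this ticket.
const kReshardCollectionTruncatedErrorCode = 350;

// Returns true if a failed resharding command response should be retried.
function isRetryableReshardingError(res) {
    if (!res || res.ok === 1) {
        return false;
    }
    // Case 1: the original error is reported directly.
    if (kRetryableCodeNames.includes(res.codeName)) {
        return true;
    }
    // Case 2: the error was converted to ReshardCollectionTruncatedError
    // and the original error name survives only inside "errmsg".
    if (res.code === kReshardCollectionTruncatedErrorCode &&
        typeof res.errmsg === "string") {
        return kRetryableCodeNames.some((name) => res.errmsg.includes(name));
    }
    return false;
}
```

      Matching on a substring of "errmsg" is brittle in general, but here the message can be truncated and no structured field carries the original error, so the sketch falls back to it only for code 350.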

            Assignee:
            Cheahuychou Mao
            Reporter:
            Cheahuychou Mao
            Votes:
            0
            Watchers:
            2
