Core Server / SERVER-47233

WriteOp can be left in pending state, leading to erroneous NoProgressMade write error from mongos



      Description

      The changes from d591387, made as part of SERVER-45100, mean that writes are no longer re-issued to shards from which mongos has already received a successful response.

      For updateOne by _id operations, which target all shards that may own data for the collection, this can lead to a WriteOp being put in the WriteOpState_Pending state without any more work being scheduled on the ARS (AsyncRequestsSender). BatchWriteOp::noteBatchResponse() must be called to transition the WriteOp to the WriteOpState_Completed or WriteOpState_Error state, and that only happens when a response arrives from the ARS. The WriteOp is therefore left stranded in the WriteOpState_Pending state until it exhausts the numRoundsWithoutProgress counter.
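The stranded state described above can be illustrated with a toy model. This is a minimal sketch in Python, not the server's C++ code: `ToyWriteOp`, `target`, and `note_response` are hypothetical names, and the model only assumes what the description states, namely that shards which already acknowledged the write are skipped and that only a response can move the op out of the pending state.

```python
from enum import Enum, auto


class WriteOpState(Enum):
    READY = auto()
    PENDING = auto()
    COMPLETED = auto()
    ERROR = auto()  # listed for completeness; not reached in this sketch


class ToyWriteOp:
    """Hypothetical stand-in for a WriteOp; not the server's actual class."""

    def __init__(self):
        self.state = WriteOpState.READY
        self.successful_shards = set()
        self.outstanding = set()  # shards we still expect a response from
        self.needs_retry = False

    def target(self, shards):
        """Mark the op pending and return the shards that still need a request."""
        to_send = [s for s in shards if s not in self.successful_shards]
        self.state = WriteOpState.PENDING
        self.outstanding = set(to_send)
        return to_send

    def note_response(self, shard, ok):
        """Only a shard response can move the op out of PENDING."""
        self.outstanding.discard(shard)
        if ok:
            self.successful_shards.add(shard)
        else:
            self.needs_retry = True
        if not self.outstanding:
            # A stale-version failure sends the op back for re-targeting.
            if self.needs_retry:
                self.state = WriteOpState.READY
            else:
                self.state = WriteOpState.COMPLETED
            self.needs_retry = False


op = ToyWriteOp()
op.target(["shard0", "shard1"])       # first round: both shards are targeted
op.note_response("shard0", ok=True)
op.note_response("shard1", ok=False)  # stale version: op returns to READY
sent = op.target(["shard0"])          # re-targeting yields only shard0
print(sent, op.state.name)            # prints: [] PENDING
```

In the second call to `target`, nothing is sent, so `note_response` is never invoked again and the op has no way to leave the pending state.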

      Sequence of events leading to the error:

      1. The collection is sharded with all chunks on shard0 (e.g. when using range-based sharding).
      2. A chunk migration from shard0 to shard1 begins, but its commit does not complete for the remainder of these steps.
      3. An updateOne by _id operation targets both shard0 and shard1.
      4. MongoS receives acknowledgement of a successful update from shard0.
      5. MongoS receives a StaleShardVersion error response from shard1.
      6. MongoS re-targets and only considers resending the updateOne by _id operation to shard0, because shard1 doesn't yet own any chunks.
      7. MongoS doesn't resend the updateOne by _id to shard0 because it has already received a successful response from shard0. The WriteOp is put into the WriteOpState_Pending state without there being any more requests to send.
      8. MongoS reports a write result with a NoProgressMade error:

      {
        ok: 1,
        nModified: 1,
        n: 1,
        writeErrors: [
          {
            index: 0,
            code: 82,
            codeName: "NoProgressMade",
            errmsg: "no progress was made executing batch write op in test2_fsmdb0.fsmcoll0 after 5 rounds (0 ops completed in 6 rounds total)"
          }
        ]
      }
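The steps above can be replayed with a small simulation of the retry loop. This is a hedged sketch, not the mongos implementation: `run_update_by_id` and `targets` are hypothetical, the threshold of 5 is read off the "after 5 rounds" text in the errmsg, and the loop structure is an assumption chosen to be consistent with "6 rounds total".

```python
MAX_ROUNDS_WITHOUT_PROGRESS = 5  # assumption: taken from "after 5 rounds" in the errmsg


def run_update_by_id(targets_for_round, responses):
    """Toy dispatch loop for one write op.

    targets_for_round(n) -> shards the router targets in round n
    responses[shard]     -> True for success, False for a stale-version error
    """
    successful = set()
    rounds = 0
    rounds_without_progress = 0
    while True:
        rounds += 1
        # Skip shards that already acknowledged the write.
        to_send = [s for s in targets_for_round(rounds) if s not in successful]
        results = {s: responses[s] for s in to_send}
        successful.update(s for s, ok in results.items() if ok)
        # The op completes only if responses arrived and all of them succeeded.
        if results and all(results.values()):
            return {"ok": 1, "rounds": rounds, "acked_shards": sorted(successful)}
        rounds_without_progress += 1
        if rounds_without_progress > MAX_ROUNDS_WITHOUT_PROGRESS:
            return {
                "ok": 1,
                "rounds": rounds,
                "acked_shards": sorted(successful),
                "writeErrors": [
                    {"index": 0, "code": 82, "codeName": "NoProgressMade"}
                ],
            }


def targets(round_number):
    # Round 1 targets both shards; afterwards only shard0 owns chunks,
    # because the migration to shard1 has not committed.
    return ["shard0", "shard1"] if round_number == 1 else ["shard0"]


result = run_update_by_id(targets, {"shard0": True, "shard1": False})
print(result["rounds"], result["writeErrors"][0]["codeName"])  # prints: 6 NoProgressMade
```

Under these assumptions the write is applied on shard0 in the first round, yet the op never completes, which matches a result that reports both n: 1 and a NoProgressMade error.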
      

              People

              Assignee:
              marcos.grillo Marcos José Grillo Ramirez
              Reporter:
              max.hirschhorn Max Hirschhorn