Core Server / SERVER-47210

The StaleShardVersion error response for {ordered:false} writes contains a lot of repeated information

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 3.6.17, 4.2.5, 4.0.17
    • Component/s: Sharding
    • Query Execution
    • ALL

      If an unordered (ordered:false) batch encounters a routing error (specifically StaleShardVersion), the error response returned to the router contains, for each operation in the batch that was not executed, a BSON object of at least this size:

      { index: 0,
        code: 63,
        codeName: "StaleShardVersion",
        errmsg: "epoch mismatch detected for foo.bar",
        errInfo: { ns: "foo.bar",
                   vReceived: Timestamp(1, 0), vReceivedEpoch: ObjectId('5e8378bff739365807792086'),
                   vWanted: Timestamp(2, 0), vWantedEpoch: ObjectId('5e8378bff739365807792086'),
                   shardId: "Shard0001" } }
      

      In effect, if a large bulk insert (for example) is sent to a shard after a chunk migration, the entire write fails with a BSONObjectTooLarge error, which is propagated to the client. This is also a problem for the $out stage, which uses batch sizes of 100,000 and is therefore susceptible to the same failure.
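
      To make the scale concrete, here is a rough back-of-the-envelope sketch in Python. It is not server code: it hand-counts the BSON encoding of the entry shown above, assuming index and code are int32 values, that the per-operation entries are returned as elements of one BSON array, and that the reply is bounded by MongoDB's documented 16 MiB maximum BSON document size (the exact limit the server applies to this reply is an assumption). The Timestamp and ObjectId classes and all function names are hypothetical stand-ins introduced for this illustration.

```python
from dataclasses import dataclass

# Documented maximum BSON document size; assumed here to bound the reply.
MAX_BSON_BYTES = 16 * 1024 * 1024


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical stand-in for a BSON Timestamp (8 bytes on the wire)."""
    t: int
    i: int


@dataclass(frozen=True)
class ObjectId:
    """Hypothetical stand-in for a BSON ObjectId (12 bytes on the wire)."""
    hex: str


def value_size(value):
    """Encoded size in bytes of a single BSON value (no type byte, no key)."""
    if isinstance(value, int):
        return 4  # assumption: small integers are encoded as int32
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # length prefix + bytes + NUL
    if isinstance(value, Timestamp):
        return 8
    if isinstance(value, ObjectId):
        return 12
    if isinstance(value, dict):
        return document_size(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def document_size(doc):
    """Encoded size of a BSON document: int32 length + elements + terminator."""
    size = 4 + 1
    for key, value in doc.items():
        # type byte + key as a NUL-terminated string + encoded value
        size += 1 + len(key.encode("utf-8")) + 1 + value_size(value)
    return size


def error_array_size(entry, count):
    """Size of a BSON array holding `count` entries shaped like `entry`.

    Array keys are the decimal indexes "0", "1", ... as NUL-terminated strings.
    """
    per_entry = document_size(entry)
    keys = sum(len(str(i)) + 1 for i in range(count))
    return 4 + 1 + count * (1 + per_entry) + keys


# The entry from the ticket description.
entry = {
    "index": 0,
    "code": 63,
    "codeName": "StaleShardVersion",
    "errmsg": "epoch mismatch detected for foo.bar",
    "errInfo": {
        "ns": "foo.bar",
        "vReceived": Timestamp(1, 0),
        "vReceivedEpoch": ObjectId("5e8378bff739365807792086"),
        "vWanted": Timestamp(2, 0),
        "vWantedEpoch": ObjectId("5e8378bff739365807792086"),
        "shardId": "Shard0001",
    },
}

entry_bytes = document_size(entry)
total_bytes = error_array_size(entry, 100_000)  # the $out batch size
print(f"per-entry size: {entry_bytes} bytes")
print(f"100,000 entries: {total_bytes} bytes (limit {MAX_BSON_BYTES})")
print("exceeds limit:", total_bytes > MAX_BSON_BYTES)
```

      Under these assumptions, the error entries alone for a 100,000-operation batch come out well above 16 MiB, even with a namespace and shard name as short as the ones in the example; longer names only make each entry larger.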

      This issue will be worked around under SERVER-46981, so it is not urgent. This ticket is about changing the unordered write error responses so that their size is not proportional to the size of the input batch.

            Assignee: backlog-query-execution ([DO NOT USE] Backlog - Query Execution)
            Reporter: Kaloian Manassiev (kaloian.manassiev@mongodb.com)
            Votes: 0
            Watchers: 3
