[SERVER-47210] The StaleShardVersion error response for {ordered:false} writes contains a lot of repeated information Created: 31/Mar/20  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.17, 4.2.5, 4.0.17
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-46981 The MongoS write commands scheduler d... Closed
Assigned Teams:
Query Execution
Operating System: ALL
Participants:

 Description   

If an unordered (ordered:false) batch encounters a routing error (specifically StaleShardVersion), the error response returned to the router contains, for each operation in the batch that was not executed, a BSON object of at least this size:

{ index: 0,
  code: 63,
  codeName: "StaleShardVersion",
  errmsg: "epoch mismatch detected for foo.bar",
  errInfo: { ns: "foo.bar",
             vReceived: Timestamp(1, 0), vReceivedEpoch: ObjectId('5e8378bff739365807792086'),
             vWanted: Timestamp(2, 0), vWantedEpoch: ObjectId('5e8378bff739365807792086'),
             shardId: "Shard0001" } }

This effectively means that if, for example, a large bulk insert is sent to a shard after a chunk migration, the entire write will fail with a BSONObjTooLarge error, which is then propagated to the client. This is also problematic for the $out stage, which uses batch sizes of 100,000 and is therefore susceptible to this problem.
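To give a rough sense of scale, the following Python sketch estimates the encoded size of one such error entry using the BSON encoding rules, then multiplies it by a 100,000-entry batch. The 16 MiB limit is an assumption based on the standard maximum BSON document size (the limit applied to internal responses may be slightly larger), and the `Timestamp`/`ObjectId` classes and `bson_size` helper are hypothetical stand-ins written for this illustration, not driver or server APIs. The per-element array key overhead is ignored, so this is a lower bound.

```python
from dataclasses import dataclass

# Assumption: the standard 16 MiB maximum BSON document size.
MAX_BSON_BYTES = 16 * 1024 * 1024


@dataclass(frozen=True)
class Timestamp:  # hypothetical stand-in; BSON timestamps are 8 bytes
    t: int
    i: int


@dataclass(frozen=True)
class ObjectId:  # hypothetical stand-in; BSON ObjectIds are 12 bytes
    hex: str


def bson_size(doc: dict) -> int:
    """Approximate the encoded BSON size of a document, in bytes."""
    size = 4 + 1  # int32 length prefix + trailing NUL
    for name, value in doc.items():
        size += 1 + len(name.encode("utf-8")) + 1  # type byte + NUL-terminated name
        if isinstance(value, dict):
            size += bson_size(value)
        elif isinstance(value, str):
            size += 4 + len(value.encode("utf-8")) + 1  # length + bytes + NUL
        elif isinstance(value, int):
            size += 4  # treated as int32 for this sketch
        elif isinstance(value, Timestamp):
            size += 8
        elif isinstance(value, ObjectId):
            size += 12
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    return size


# The error entry from the description above.
entry = {
    "index": 0,
    "code": 63,
    "codeName": "StaleShardVersion",
    "errmsg": "epoch mismatch detected for foo.bar",
    "errInfo": {
        "ns": "foo.bar",
        "vReceived": Timestamp(1, 0),
        "vReceivedEpoch": ObjectId("5e8378bff739365807792086"),
        "vWanted": Timestamp(2, 0),
        "vWantedEpoch": ObjectId("5e8378bff739365807792086"),
        "shardId": "Shard0001",
    },
}

per_entry = bson_size(entry)
batch_total = per_entry * 100_000
max_entries = MAX_BSON_BYTES // per_entry

print(f"bytes per error entry: {per_entry}")
print(f"bytes for 100,000 entries: {batch_total}")
print(f"entries that fit under the assumed limit: {max_entries}")
```

Under these assumptions, a 100,000-entry batch of errors lands well above the limit even with the short namespace used here; longer namespaces and shard names make each entry larger still.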

This issue will be worked around under SERVER-46981, so it is not an urgent problem. This ticket is about improving the unordered write error responses to not be proportional to the size of the input batch.



 Comments   
Comment by Kaloian Manassiev [ 29/Jun/21 ]

Actually the infrastructure is all in there for retrying the batches or subsets of them (which BTW, we were considering that you guys should start owning at some point since it is essentially a scheduler like ClusterFind ).

This ticket is more around the fact that transporting back a state which indicates "you need to retarget 99,999 entries from this 100,000 entry batch" repeats a bunch of strings 99,999 times. Due to how unordered write commands are currently designed, there is no other way to achieve it and we cheated a bit by excluding the longer strings from all but the first error in SERVER-46981.
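As an illustration of the kind of trimming described in this comment, here is a minimal Python sketch that keeps the first error intact and strips the longer fields from every later one. Which fields SERVER-46981 actually drops is not stated here; keeping only `index` and `code` is an assumption made for this example, and `compact_write_errors` is a hypothetical helper, not server code.

```python
def compact_write_errors(errors):
    """Keep the first error whole; reduce the rest to index and code.

    Assumption: later errors share the same details as the first one,
    so the repeated strings carry no extra information.
    """
    compacted = []
    for position, err in enumerate(errors):
        if position == 0:
            compacted.append(dict(err))  # full details, copied
        else:
            compacted.append({"index": err["index"], "code": err["code"]})
    return compacted


# Three identical StaleShardVersion errors for batch entries 0..2.
errors = [
    {
        "index": i,
        "code": 63,
        "codeName": "StaleShardVersion",
        "errmsg": "epoch mismatch detected for foo.bar",
        "errInfo": {"ns": "foo.bar", "shardId": "Shard0001"},
    }
    for i in range(3)
]

compacted = compact_write_errors(errors)
for err in compacted:
    print(err)
```

Note that the response still contains one entry per unexecuted operation, so its size remains proportional to the batch; only the constant factor shrinks.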

Comment by Kyle Suarez [ 29/Jun/21 ]

kaloian.manassiev, if we get StaleShardVersion, is it safe to fail the entire batch? Is there a reason this behavior has become more of a problem recently?

Comment by Kaloian Manassiev [ 29/Jun/21 ]

This is fundamental to how write commands return errors to MongoS (the number of errors is equal to the number of batch entries). Will pass it on to the Query Execution team in case there are some plans to improve that API.
