[SERVER-61127] Multi-writes may exhaust the number of retry attempts in the presence of ongoing chunk migrations Created: 30/Oct/21  Updated: 29/Oct/23  Resolved: 09/Jun/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.1.0-rc0, 6.0.8

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Jordi Serra Torrens
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0
Sprint: Sharding EMEA 2022-01-24, Sharding EMEA 2022-02-07, Sharding EMEA 2022-02-21, Sharding EMEA 2022-03-07, Sharding EMEA 2022-03-21, Sharding EMEA 2022-05-02, Sharding EMEA 2022-05-16, Sharding EMEA 2022-05-30, Sharding EMEA 2022-06-13
Participants:
Linked BF Score: 1

 Description   

Multi-writes in a sharded cluster (multi:true for updates and justOne:false for deletes) do not perform version checking because they are broadcast to all shards in the cluster. Such operations attach the special value ChunkVersion::IGNORED to indicate that the operation is coming from a router (as opposed to a direct connection to a shard), but that the shard must not perform version checking, under the assumption that the caller knows what they are doing.

However, ChunkVersion::IGNORED still triggers a StaleShardVersion exception in the case where the shardVersion is UNKNOWN or if the shard is in a critical section.
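To make the two exceptional cases concrete, here is a minimal Python sketch of the decision described above. It is not the server's code: `ShardState`, `check_shard_version`, the integer versions, and the order of the checks are all assumptions made for illustration. Only the names ChunkVersion::IGNORED, UNKNOWN, and StaleShardVersion come from the ticket.

```python
from dataclasses import dataclass
from typing import Optional, Union


class StaleShardVersion(Exception):
    """Stand-in for the exception named in the ticket."""


# Stand-in for ChunkVersion::IGNORED (hypothetical representation).
IGNORED = "IGNORED"


@dataclass
class ShardState:
    # None models a shardVersion of UNKNOWN (hypothetical representation).
    known_version: Optional[int]
    in_critical_section: bool = False


def check_shard_version(received: Union[int, str], shard: ShardState) -> None:
    """Raise StaleShardVersion unless the operation may proceed."""
    # These two checks apply even to IGNORED, which is the problem described.
    if shard.in_critical_section:
        raise StaleShardVersion("shard is in a critical section")
    if shard.known_version is None:
        raise StaleShardVersion("shardVersion is UNKNOWN")
    # IGNORED skips only the actual version comparison.
    if received == IGNORED:
        return
    if received != shard.known_version:
        raise StaleShardVersion("version mismatch")
```

In this model, an IGNORED operation succeeds against `ShardState(known_version=5)` regardless of the version number, but still fails against `ShardState(known_version=None)` or `ShardState(known_version=5, in_critical_section=True)`.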

The former is not a big problem, since it happens only once in the lifetime of a shard's mongod process. The latter is problematic, since it may exhaust the 10 retry attempts that we allow on the router.

This ticket is to come up with a scheme so that multi-writes' StaleShardVersion exceptions are retried at the level of the shard and do not bubble all the way up to the router.
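The difference between the two retry locations can be shown with a toy Python model. This is a sketch under stated assumptions, not the fix that was committed: `ToyShard`, `router_send`, and the idea that each check "uses up" one unit of critical-section time are hypothetical. The only value taken from the ticket is the router's limit of 10 attempts.

```python
class StaleShardVersion(Exception):
    """Stand-in for the exception named in the ticket."""


MAX_ROUTER_ATTEMPTS = 10  # the router-side limit mentioned in the ticket


class ToyShard:
    def __init__(self, busy_checks: int):
        # Number of upcoming checks that will find the critical section held
        # (a hypothetical way to model ongoing chunk migrations).
        self.busy_checks = busy_checks
        self.writes_applied = 0

    def multi_write(self) -> None:
        """Current behavior: fail immediately if in a critical section."""
        if self.busy_checks > 0:
            self.busy_checks -= 1
            raise StaleShardVersion("critical section held")
        self.writes_applied += 1

    def wait_for_critical_section_release(self) -> None:
        # Models blocking on the shard until the critical section is released.
        self.busy_checks = 0

    def multi_write_retrying_locally(self) -> None:
        """Proposed behavior: wait and retry on the shard itself."""
        while True:
            try:
                return self.multi_write()
            except StaleShardVersion:
                self.wait_for_critical_section_release()


def router_send(shard_op) -> int:
    """Retry shard_op up to MAX_ROUTER_ATTEMPTS; return the attempts used."""
    for attempt in range(1, MAX_ROUTER_ATTEMPTS + 1):
        try:
            shard_op()
            return attempt
        except StaleShardVersion:
            continue
    raise RuntimeError("router exhausted its retry attempts")
```

With `ToyShard(busy_checks=15)`, `router_send(shard.multi_write)` raises after 10 attempts without applying the write, whereas `router_send(shard.multi_write_retrying_locally)` succeeds on the router's first attempt because the exception never leaves the shard.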



 Comments   
Comment by Githook User [ 29/Jun/23 ]

Author: Jordi Serra Torrens <jordi.serra-torrens@mongodb.com> (jordist)

Message: SERVER-61127 Retry multi-writes that hit StaleConfig due to critical section on the shard

(cherry picked from commit 824b9b7e608687ba0db7af2d5ccc5b6811a46720)
Branch: v6.0
https://github.com/mongodb/mongo/commit/3d84c0dd4e5d99be0d69003652313e7eaf4cdd74

Comment by Githook User [ 09/Jun/22 ]

Author: Jordi Serra Torrens <jordi.serra-torrens@mongodb.com> (jordist)

Message: SERVER-61127 Retry multi-writes that hit StaleConfig due to critical section on the shard
Branch: master
https://github.com/mongodb/mongo/commit/824b9b7e608687ba0db7af2d5ccc5b6811a46720

Comment by Cris Insignares Cuello [ 02/Mar/22 ]

na
