[SERVER-37053] Retry on network errors for commands run through the add shard task executor Created: 07/Sep/18  Updated: 17/Jun/19  Resolved: 17/Jun/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Jack Mulrow Assignee: Mira Carey
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Linked BF Score: 28

 Description   

When adding a new shard, the config server uses a separate task executor to send the shard several requests, such as isMaster, listDatabases, and a drop of the sessions collection. Unlike commands run through the Shard interface, these requests are not retried on network errors or on any other retryable errors.
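To make the gap concrete, here is a minimal Python sketch of the missing behavior. Each request sent during addShard (for example isMaster) is wrapped so that network errors and other retryable errors are retried a bounded number of times, while non-retryable errors fail immediately. `ErrorCategory`, `CommandError`, `run_with_retry`, the retryable set, and the attempt limit are all hypothetical illustrations. They are not MongoDB's actual C++ types, error codes, or retry policy.

```python
from enum import Enum, auto
from typing import Callable, TypeVar

T = TypeVar("T")


class ErrorCategory(Enum):
    # Hypothetical error categories, standing in for real server error codes.
    NETWORK = auto()
    NOT_PRIMARY = auto()
    SHUTDOWN = auto()
    FATAL = auto()


# Hypothetical set of categories treated as safe to retry.
RETRYABLE = {ErrorCategory.NETWORK, ErrorCategory.NOT_PRIMARY, ErrorCategory.SHUTDOWN}


class CommandError(Exception):
    """Hypothetical error raised when a request to the shard fails."""

    def __init__(self, category: ErrorCategory, message: str = ""):
        super().__init__(message or category.name)
        self.category = category


def run_with_retry(send: Callable[[], T], max_attempts: int = 3) -> T:
    """Call send(), retrying retryable errors up to max_attempts total tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return send()
        except CommandError as err:
            # Non-retryable errors, or a failure on the last allowed attempt,
            # propagate to the caller unchanged.
            if err.category not in RETRYABLE or attempt >= max_attempts:
                raise
            attempt += 1


def make_flaky_is_master(failures: int):
    """Hypothetical fake shard: fails with a network error `failures` times."""
    calls = {"n": 0}

    def send():
        calls["n"] += 1
        if calls["n"] <= failures:
            raise CommandError(ErrorCategory.NETWORK, "connection reset")
        return {"ismaster": True, "ok": 1}

    return send, calls
```

Bounding the number of attempts keeps a persistently unreachable shard from stalling addShard indefinitely. A real fix would also need to consider whether each request is safe to resend after a network error, since the first attempt may have reached the shard before the connection failed.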

It looks like, prior to 3.4, these requests went through a different code path that did retry on retryable errors.



 Comments   
Comment by Mira Carey [ 17/Jun/19 ]

Closing this out as not worth the amount of work that would be required to fix this.

We already have commands in sharding with no retry policies, and addShard is already an uncommon command. Actual users issuing addShard will retry failures themselves, so the only reason to find a fix here would be to squash a few more build failures (BFs).

If the number of BFs rises substantially, we could revisit this, but we're closing it as Won't Fix for now.

Generated at Thu Feb 08 04:44:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.