[SERVER-37053] Retry on network errors for commands run through the add shard task executor Created: 07/Sep/18 Updated: 17/Jun/19 Resolved: 17/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Jack Mulrow | Assignee: | Mira Carey |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Participants: | |||||
| Linked BF Score: | 28 | ||||
| Description |
|
When adding a new shard, the config server sends several requests to the shard using a separate task executor, like isMaster, listDatabases, and a drop of the sessions collection. Unlike commands run through the Shard interface, these requests are not retried on network errors or on any other retryable errors. It looks like prior to 3.4 these requests went through a different code path, which did retry on retryable errors. |
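To illustrate the behavior the description says is missing, here is a minimal Python sketch of retrying a request only when it fails with a retryable (network) error. `NetworkError`, `run_with_retry`, and `flaky_is_master` are hypothetical names for illustration. This is not the server's Shard retry policy or task executor API, only the general idea applied to a simulated isMaster.

```python
from typing import Callable, Tuple


class NetworkError(Exception):
    """Hypothetical stand-in for a retryable network failure (e.g. host unreachable)."""


def run_with_retry(command: Callable[[], dict], max_attempts: int = 3) -> Tuple[dict, int]:
    """Run `command`, retrying only on NetworkError, for at most `max_attempts` tries.

    Returns (response, attempts_made). Any other exception propagates immediately;
    the last NetworkError is re-raised once all attempts are used up.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return command(), attempt
        except NetworkError:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


# Simulated isMaster that fails twice with a network error, then succeeds.
calls = {"n": 0}


def flaky_is_master() -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise NetworkError("host unreachable")
    return {"ok": 1, "ismaster": True}


response, attempts = run_with_retry(flaky_is_master, max_attempts=3)
print(response, attempts)  # {'ok': 1, 'ismaster': True} 3
```

Blind retries like this are only safe for idempotent requests. isMaster and listDatabases are reads. Re-running a drop after a network error may instead fail because the collection is already gone, so a real fix would also have to handle that case.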
| Comments |
| Comment by Mira Carey [ 17/Jun/19 ] |
|
Closing this out as not worth the amount of work that would be required to fix it. We already have commands in sharding with no retry policies, and addShard is already an uncommon command. Actual users issuing addShard will retry failures themselves, so the only reason to fix this would be to squash a few more BFs (build failures). If the number of BFs rises substantially we could revisit this, but closing as Won't Fix for now. |