|
In HELP-54194, we discovered that some commands may fail while a removeShard is in progress, that is, while a shard is being drained. One example is “listIndexes”. The command can hit ShardNotFound as a transient error that is triggered only by specific timing within a narrow window, and that error bubbles up to the user application. The failed command can safely be retried and then succeeds. A reproducible test is attached in the comments.
The problem is that users can see ShardNotFound even when it is not necessary. Instead, the mongos or the driver (an implementation decision still to be made) should retry the operation.
In summary, the goal of this ticket is to list all the commands affected in the reproducer, and to investigate and implement a feasible way to retry ShardNotFound without surfacing it to the user when that is not necessary, as we already do for other transient errors.
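As an illustration only, the retry behavior described above could look roughly like the following client-side sketch. It assumes a hypothetical CommandError exception carrying the server's error code name, and a hypothetical flaky_list_indexes function that stands in for a listIndexes call hitting ShardNotFound once. Where the retry should actually live (mongos or driver) is still open, and the attempt limit and backoff values are arbitrary placeholders.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CommandError(Exception):
    """Hypothetical stand-in for a server error reply that carries a code name."""

    def __init__(self, code_name: str, message: str = "") -> None:
        super().__init__(message or code_name)
        self.code_name = code_name


# Assumption: only ShardNotFound is treated as transient here; this set is
# illustrative and is not the server's or any driver's real list.
TRANSIENT_CODE_NAMES = frozenset({"ShardNotFound"})


def run_with_retry(command: Callable[[], T], max_attempts: int = 3,
                   backoff_s: float = 0.05) -> T:
    """Run `command`, retrying transient errors for at most max_attempts tries."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return command()
        except CommandError as err:
            # Non-transient errors, and the last failed attempt, reach the caller.
            if err.code_name not in TRANSIENT_CODE_NAMES or attempt == max_attempts:
                raise
            # Linear backoff; assumed to give the router time to refresh its shard list.
            time.sleep(backoff_s * attempt)
    raise AssertionError("unreachable")


# Usage: a hypothetical listIndexes call that fails once with ShardNotFound.
calls = {"n": 0}


def flaky_list_indexes() -> list:
    calls["n"] += 1
    if calls["n"] == 1:
        raise CommandError("ShardNotFound")
    return ["_id_"]


print(run_with_retry(flaky_list_indexes))  # prints ['_id_']
```

The retry is bounded so that a shard that is genuinely gone still produces an error for the user after a few attempts, while errors outside the transient set are never retried.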
|