Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 6.0.0-rc0
Affects Version/s: None
Component/s: Sharding
Labels:
- sharding-wfbf-day

Backwards Compatibility:
Fully Compatible
Sprint:
Sharding EMEA 2022-03-07, Sharding EMEA 2022-03-21
Linked BF Score:
32

When a shard starts, if the sharding state recovery document indicates that there were metadata change operations in flight, the shard contacts the primary config server to retrieve the most recent opTime.

This procedure should retry until it succeeds, but a corner case causes the shard process to crash. When the returned command status is NamespaceExists (a perfectly expected outcome), the logic also checks the write concern status and may raise an error. If the primary config server has stepped down, the write concern status is InterruptedDueToReplStateChange; the caller converts that error to an exception and the process crashes.

A possible solution would be to retry the command against the primary config server when the write concern status is not OK and the command status is in a specific list of errors (which includes NamespaceExists).
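The proposed retry rule can be sketched roughly as follows. This is an illustrative Python model only, not the server's C++ code: `ConfigResponse`, `should_retry`, `recover_op_time`, the `send` callback, and the `max_attempts` bound are all hypothetical names introduced here, and the retriable list contains only NamespaceExists because that is the one status the ticket names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

OK = "OK"

# Assumption: the ticket only names NamespaceExists; the real list may be longer.
RETRIABLE_COMMAND_STATUSES = frozenset({"NamespaceExists"})


@dataclass
class ConfigResponse:
    """Hypothetical stand-in for the primary config server's reply."""
    command_status: str
    write_concern_status: str
    op_time: Optional[int] = None


def should_retry(resp: ConfigResponse) -> bool:
    """Retry when the write concern failed and the command status is in the list."""
    return (resp.write_concern_status != OK
            and resp.command_status in RETRIABLE_COMMAND_STATUSES)


def recover_op_time(send: Callable[[], ConfigResponse],
                    max_attempts: int = 10) -> Optional[int]:
    """Call `send` until a usable reply arrives.

    The ticket says the procedure should retry until it succeeds; the
    attempt bound exists only to keep this sketch finite.
    """
    for _ in range(max_attempts):
        resp = send()
        if should_retry(resp):
            # e.g. InterruptedDueToReplStateChange after a step-down: ask again
            continue
        if resp.write_concern_status != OK:
            raise RuntimeError(f"write concern failed: {resp.write_concern_status}")
        if resp.command_status != OK and resp.command_status not in RETRIABLE_COMMAND_STATUSES:
            raise RuntimeError(f"command failed: {resp.command_status}")
        return resp.op_time
    raise TimeoutError("no usable reply from the primary config server")
```

With this rule, a reply of NamespaceExists plus InterruptedDueToReplStateChange leads to another attempt instead of an exception, which is the crash path described above.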

Assignee: Allison Easton

Reporter: Antonio Fuschetto

Participants: Allison Easton, Antonio Fuschetto, Githook User

Votes: 0

Watchers: 4

Created: Feb 11 2022 06:14:06 PM UTC

Updated: Oct 29 2023 09:42:36 PM UTC

Resolved: Mar 07 2022 02:42:10 PM UTC
