Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 6.0.0-rc0, 5.0.7, 5.3.0-rc2, 5.2.2
Affects Version/s: 5.2.0, 5.3.0-rc0, 5.0.5
Component/s: Sharding
Labels: None
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.3, v5.2, v5.0
Sprint: Sharding EMEA 2022-02-21
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

By design, rename collection participants were assumed to fail only because of stepdown/shutdown errors. The rationale was that promises would be invalidated before the PrimaryOnlyService (POS) instances were released, and that new primaries would eventually resume those instances with a clean state.

However, there are scenarios in which "non-stepdown" recoverable errors can occur. In those cases the promises are invalidated but the POS participant instances are not released. As a consequence, every retry follows the same flow: get the existing POS instance, check its promises, and fail again.

This can happen, for example, when an index build runs concurrently on a collection being renamed: the participants get stuck with a BackgroundOperationInProgressForNamespace error.
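The retry flow described above can be illustrated with a small toy model. This is only a sketch, not the server's implementation: `ParticipantInstance`, `Registry`, `attempt_rename`, and `simulate_election` are hypothetical names, and the stored `error` field stands in for the invalidated promises.

```python
class RecoverableError(Exception):
    """Stands in for a non-stepdown error such as
    BackgroundOperationInProgressForNamespace."""


class ParticipantInstance:
    """Toy stand-in for a rename participant instance (hypothetical)."""

    def __init__(self):
        self.error = None  # plays the role of the invalidated promise

    def run(self, operation):
        # The promise is checked first: once it holds an error, every
        # later call fails without running the operation again.
        if self.error is not None:
            raise self.error
        try:
            return operation()
        except RecoverableError as exc:
            self.error = exc
            raise


class Registry:
    """Toy stand-in for the service that owns the instances (hypothetical)."""

    def __init__(self):
        self._instances = {}

    def get_or_create(self, key):
        if key not in self._instances:
            self._instances[key] = ParticipantInstance()
        return self._instances[key]

    def simulate_election(self):
        # A new primary starts from a clean state: no stale instances.
        self._instances.clear()


def attempt_rename(registry, key, operation):
    """Return "renamed" on success or "failed" on a recoverable error."""
    instance = registry.get_or_create(key)
    try:
        return instance.run(operation)
    except RecoverableError:
        # The instance is NOT released here, which is the bug being modeled.
        return "failed"


def make_operation(failures):
    """Build an operation that fails `failures` times, then succeeds."""
    remaining = {"n": failures}

    def operation():
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RecoverableError("BackgroundOperationInProgressForNamespace")
        return "renamed"

    return operation
```

In this model, an operation that would succeed on its second run still fails on retry, because the stale instance short-circuits on its stored error. Only after `simulate_election()` does a fresh instance run the operation again.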

—

Workaround for users who hit this bug: trigger an election on every shard that has a stuck rename participant.
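One way to script the workaround is to send the `replSetStepDown` command to the primary of each affected shard, which forces an election. The helper below is a minimal sketch under assumptions not stated in the ticket: `step_down_shards` is a hypothetical name, each client is expected to behave like a PyMongo `MongoClient` connected directly to that shard's replica set, and the 60-second step-down period is an arbitrary choice.

```python
def step_down_shards(shard_clients, step_down_secs=60):
    """Ask the primary of each listed shard to step down.

    shard_clients: dict mapping a shard name to a client object that
    exposes `client.admin.command(name, value)`, as PyMongo's
    MongoClient does. Returns the shard names the command was sent to.
    """
    contacted = []
    for shard_name, client in shard_clients.items():
        # replSetStepDown makes the current primary step down, so the
        # replica set elects a new primary.
        client.admin.command("replSetStepDown", step_down_secs)
        contacted.append(shard_name)
    return contacted


# Possible usage with PyMongo (shard name and host are placeholders):
#   from pymongo import MongoClient
#   clients = {"shard0": MongoClient("mongodb://shard0-host:27018/?replicaSet=shard0")}
#   step_down_shards(clients)
```

Depending on the server and driver versions, the step-down may close the connection and surface as a driver exception even though the command took effect, so a production script would likely need error handling around the call.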

Assignee: Pierlauro Sciarelli
Reporter: Pierlauro Sciarelli
Participants: Githook User, Pierlauro Sciarelli
Votes: 0
Watchers: 5

Created: Feb 16 2022 11:14:35 AM UTC
Updated: Oct 29 2023 09:42:23 PM UTC
Resolved: Feb 18 2022 01:24:10 PM UTC
Confidence Status Last Update: 16/Feb/22 3:35 PM
