Type: Improvement
Resolution: Won't Do
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Sharding
Labels: None
Assigned Teams: Sharding EMEA
Sprint: Sharding 2018-12-31, Sharding 2019-01-14, Sharding 2019-02-25, Sharding 2019-03-11, Sharding 2019-03-25, Sharding 2019-05-06, Sharding 2019-05-20
Linked BF Score: 8
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In a failed Evergreen run, we saw a case where the 'refreshing' flag was never unset on the secondaries, even though the refresh succeeded on the primaries.

It is extremely unlikely that the two writes in persistCollectionAndChangedChunks failed, and without such a failure there is no explanation for why the notification here was never signaled.

We should change the wait loop in _getCompletePersistedMetadataForSecondarySinceVersion to wait with a timeout of a few seconds on each iteration. After each timeout, it should check whether the optime returned by the primary's refresh has been reached and, if so, assert that the 'refreshing' flag has been cleared. If the flag has not been cleared, it should log the contents of the config.cache.collections entry for the collection being refreshed and fail the refresh (which will cause the client calls to fail).
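The proposed loop can be illustrated with a minimal Python sketch. This is not the server's C++ code: `CacheState`, `wait_for_refresh`, `RefreshStuckError`, the integer optimes, and the dictionary standing in for the config.cache.collections entry are all hypothetical names and simplifications chosen for illustration. It only shows the shape of the idea: wait with a timeout, and on each timeout check whether the primary's optime has been reached while the flag is still set.

```python
import threading
from dataclasses import dataclass, field


class RefreshStuckError(RuntimeError):
    """The primary's optime was reached but 'refreshing' is still set."""


@dataclass
class CacheState:
    # Hypothetical stand-in for a secondary's view of its persisted metadata.
    applied_optime: int = 0  # simplified: a real optime is not a plain integer
    entry: dict = field(default_factory=lambda: {"refreshing": True})
    cond: threading.Condition = field(default_factory=threading.Condition)


def wait_for_refresh(state, primary_optime, poll_seconds=2.0, log=print):
    """Block until 'refreshing' is cleared, failing loudly if it looks stuck."""
    with state.cond:
        while state.entry.get("refreshing"):
            # wait() returns False only when the timeout elapsed unnotified.
            notified = state.cond.wait(timeout=poll_seconds)
            if notified:
                continue  # woken by a notification: re-check the flag
            if state.applied_optime >= primary_optime:
                # The writes should have replicated by now, yet the flag
                # is still set: log the entry and fail the refresh.
                log(f"stuck cache entry: {state.entry!r}")
                raise RefreshStuckError("refreshing flag was never cleared")
        return dict(state.entry)
```

In this sketch, a caller that catches `RefreshStuckError` would propagate the failure to the client, and the logged entry is the diagnostic the ticket asks for. The timeout value and the comparison of optimes are assumptions; the real check would depend on how the server tracks the last applied optime.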

This should help us build further hypotheses about this issue.

Assignee: [DO NOT USE] Backlog - Sharding EMEA
Reporter: Kaloian Manassiev
Participants: [DO NOT USE] Backlog - Sharding EMEA, Kaloian Manassiev
Votes: 0
Watchers: 1

Created: Dec 12 2018 09:34:09 PM UTC
Updated: Dec 06 2022 03:10:49 AM UTC
Resolved: Feb 18 2022 08:10:44 AM UTC