Type: Task
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.3.1
Affects Version/s: None
Component/s: Sharding
Labels:
- sharding-wfbf-day

Backwards Compatibility: Fully Compatible
Sprint: Sharding 2019-11-18
Linked BF Score: 18
Confidence Status: None
Work Order: 3
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

On StaleConfig errors, a secondary node triggers a refresh on the primary, which sets a "refreshing" flag in the config.cache.collections entry for the namespace being refreshed and unsets it when the refresh completes. After triggering the refresh and before reading from its replicated metadata cache collections, the secondary checks whether the "refreshing" flag is true and, if so, waits on a notification that an op observer triggers when an update to the cache entry for the refreshing namespace is replicated.
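The wait/notify handshake described above can be modeled with a condition variable. This is a minimal Python sketch under stated assumptions: `CollectionCacheEntry`, `on_replicated_set`, and `wait_for_refresh` are hypothetical names, the cache entry is a plain dict, and, as a simplification, this op observer notifies on every replicated update while the reader re-checks the flag itself. It is not MongoDB's actual code.

```python
import threading


class CollectionCacheEntry:
    """Hypothetical stand-in for one config.cache.collections document on a
    secondary, plus the notification a reader waits on. Illustrative only;
    this is not MongoDB's implementation."""

    def __init__(self, refreshing: bool) -> None:
        self.doc = {"refreshing": refreshing}
        self._cond = threading.Condition()

    def on_replicated_set(self, set_fields: dict) -> None:
        # Op-observer role (simplified): apply a replicated $set to the cached
        # entry and wake every reader waiting on this namespace.
        with self._cond:
            self.doc.update(set_fields)
            self._cond.notify_all()

    def wait_for_refresh(self, timeout: float) -> bool:
        # Reader role: returns immediately if no refresh is in progress;
        # otherwise blocks until "refreshing" is cleared or the timeout
        # expires. Returns True once the flag is false, False on timeout.
        with self._cond:
            return self._cond.wait_for(
                lambda: not self.doc["refreshing"], timeout=timeout
            )


if __name__ == "__main__":
    entry = CollectionCacheEntry(refreshing=True)
    # Simulate the primary's "refresh complete" update replicating shortly later.
    threading.Timer(0.05, entry.on_replicated_set, args=({"refreshing": False},)).start()
    print(entry.wait_for_refresh(timeout=2.0))  # True
```

Because `wait_for` evaluates the predicate under the lock before blocking, an update that replicates before the reader starts waiting is not missed.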

The op observer only triggers the notification if the replicated update is a $set that both contains the "lastRefreshedCollectionVersion" field and sets "refreshing" to false. The collection version is stored on disk without an epoch, so if a refresh discovers that a collection's epoch changed but the timestamp component of its collection version did not (which can happen after a collection's shard key is refined), the replicated update from the primary omits "lastRefreshedCollectionVersion". The secondary's op observer then never notifies the thread waiting on the refresh, causing it to hang.
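The failure can be reproduced in a small model. This Python sketch rests on two labeled assumptions: the replicated $set is modeled as containing only fields whose stored value actually changes (`replicated_set` is a hypothetical helper), and the stored collection version is a made-up `(major, minor)` tuple with no epoch. `should_notify_before_fix` is a hypothetical name for the pre-fix condition described above.

```python
def replicated_set(stored_doc: dict, requested_set: dict) -> dict:
    # Simplifying assumption: the replicated $set keeps only the fields whose
    # stored value actually changes; unchanged fields are omitted.
    return {k: v for k, v in requested_set.items() if stored_doc.get(k) != v}


def should_notify_before_fix(set_fields: dict) -> bool:
    # Pre-fix condition: the $set must contain
    # lastRefreshedCollectionVersion AND set "refreshing" to false.
    return (
        "lastRefreshedCollectionVersion" in set_fields
        and set_fields.get("refreshing") is False
    )


# Secondary's cached entry mid-refresh. The version is stored as
# (major, minor) only; the epoch is not part of the stored value.
stored = {"refreshing": True, "lastRefreshedCollectionVersion": (5, 3)}

# Ordinary refresh: the version advances, so the field is replicated.
normal = replicated_set(
    stored, {"refreshing": False, "lastRefreshedCollectionVersion": (6, 0)}
)
# Epoch-only change (e.g. after refining the shard key): the stored
# (major, minor) value is unchanged, so the field drops out.
epoch_only = replicated_set(
    stored, {"refreshing": False, "lastRefreshedCollectionVersion": (5, 3)}
)

print(should_notify_before_fix(normal))      # True
print(should_notify_before_fix(epoch_only))  # False: the waiting reader is never woken
```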

A possible fix is to change the op observer to check only that the "refreshing" field was set to false.
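The proposed fix amounts to a narrower notification predicate. In this minimal Python sketch, `should_notify_after_fix` is a hypothetical name and a replicated $set is modeled as a plain dict; it illustrates the condition only, not the actual op observer code.

```python
def should_notify_after_fix(set_fields: dict) -> bool:
    # Proposed condition: any replicated $set that clears "refreshing" wakes
    # waiters, whether or not lastRefreshedCollectionVersion is present.
    return set_fields.get("refreshing") is False


cases = {
    "epoch-only refresh": {"refreshing": False},
    "ordinary refresh": {"refreshing": False, "lastRefreshedCollectionVersion": (6, 0)},
    "refresh started": {"refreshing": True},
    "flag untouched": {"lastRefreshedCollectionVersion": (6, 0)},
}
for name, set_fields in cases.items():
    print(name, should_notify_after_fix(set_fields))
# prints:
# epoch-only refresh True
# ordinary refresh True
# refresh started False
# flag untouched False
```

An update that only starts a refresh still does not wake readers, since they are waiting for the flag to clear, while the epoch-only completion that previously hung now triggers the notification.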

depends on

SERVER-34632 config.chunks change to config.cache.chunks creates a collection long name after upgrade (Backlog)

related to

SERVER-43652 Secondary reads right after change shard key can stall on refresh (Closed)

Assignee: Jack Mulrow
Reporter: Jack Mulrow
Participants: Githook User, Jack Mulrow
Votes: 0
Watchers: 6

Created: Sep 06 2019 09:44:56 PM UTC
Updated: Oct 29 2023 10:17:23 PM UTC
Resolved: Nov 12 2019 05:01:45 PM UTC
Confidence Status Last Update: 11/Nov/19 5:55 PM