[SERVER-43217] Secondaries can hang refreshing metadata if only collection's epoch changes Created: 06/Sep/19  Updated: 29/Oct/23  Resolved: 12/Nov/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Task Priority: Major - P3
Reporter: Jack Mulrow Assignee: Jack Mulrow
Resolution: Fixed Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-34632 config.chunks change to config.cache.... Backlog
Related
related to SERVER-43652 Secondary reads right after change sh... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2019-11-18
Participants:
Linked BF Score: 18

 Description   

On StaleConfig errors, a secondary node triggers a refresh on the primary, which sets a "refreshing" flag in the config.cache.collections entry for the namespace being refreshed and then unsets it when the refresh completes. Before reading from its replicated metadata cache collections after triggering the refresh, the secondary will check if the "refreshing" flag is set to true, and if so, wait on a notification that is triggered by an op observer when an update to the cache entry for the refreshing namespace is replicated.
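The wait-and-notify flow described above can be sketched roughly as follows. This is a simplified Python model, not the server's C++ code: `RefreshWaiter`, `on_replicated_update`, and `wait_until_refreshed` are hypothetical names, the cache entry is a plain dict, and the `timeout` parameter exists only so the sketch can be run safely.

```python
import threading


class RefreshWaiter:
    """Hypothetical model of a secondary waiting on the "refreshing" flag."""

    def __init__(self):
        # Stand-in for the replicated config.cache.collections entry.
        self._cache_entry = {"refreshing": False}
        self._cond = threading.Condition()

    def on_replicated_update(self, set_fields):
        # Plays the role of the op observer: apply the replicated $set
        # fields to the entry, then wake any waiting threads.
        with self._cond:
            self._cache_entry.update(set_fields)
            self._cond.notify_all()

    def wait_until_refreshed(self, timeout=None):
        # Block while "refreshing" is true. Returns True once the flag is
        # false, or False if the (sketch-only) timeout expires first.
        with self._cond:
            return self._cond.wait_for(
                lambda: not self._cache_entry["refreshing"], timeout=timeout
            )
```

In this model every replicated update wakes the waiter. The hang described in this ticket arises when the notification is skipped for an update that does clear the flag.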

The op observer will only trigger the notification if the replicated update used a $set that both contains the "lastRefreshedCollectionVersion" field and sets "refreshing" to false. The collection version is stored on disk without an epoch, so if a refresh discovers that a collection's epoch changed, but not the timestamp component of its collection version (which can happen after a collection's shard key is refined), then the replicated update from the primary will omit "lastRefreshedCollectionVersion". The secondary's op observer will then not notify the refreshing thread, causing it to hang.
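To make the failure mode concrete, here is a minimal Python sketch of the notification condition as described above. The function name `should_notify_strict` and the sample field values are assumptions for illustration only. The real check lives in the server's C++ op observer and is not reproduced here.

```python
def should_notify_strict(set_fields):
    """Model of the described condition: the $set must contain
    "lastRefreshedCollectionVersion" AND set "refreshing" to false."""
    return (
        "lastRefreshedCollectionVersion" in set_fields
        and set_fields.get("refreshing") is False
    )


# Ordinary refresh: the stored version changed, so both fields are present.
# (The version value here is a made-up placeholder.)
normal_update = {
    "lastRefreshedCollectionVersion": "Timestamp(2, 0)",
    "refreshing": False,
}

# Epoch-only change: the on-disk version has no epoch and did not change,
# so the update omits "lastRefreshedCollectionVersion".
epoch_only_update = {"refreshing": False}

print(should_notify_strict(normal_update))      # True
print(should_notify_strict(epoch_only_update))  # False: waiter is never woken
```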

A possible fix is to change the op observer to only check that the "refreshing" field was set to false.
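A rough Python sketch of that relaxed condition is shown below. The name `should_notify_relaxed` is hypothetical and the dicts stand in for the fields of a replicated $set. This models the idea only and is not the committed C++ change.

```python
def should_notify_relaxed(set_fields):
    """Model of the proposed condition: notify whenever the $set sets
    "refreshing" to false, regardless of which other fields are present."""
    return set_fields.get("refreshing") is False


# An epoch-only refresh clears the flag without touching the version field.
print(should_notify_relaxed({"refreshing": False}))  # True
# The update that starts a refresh must still not wake the waiter.
print(should_notify_relaxed({"refreshing": True}))   # False
```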



 Comments   
Comment by Githook User [ 12/Nov/19 ]

Author: Jack Mulrow (jsmulrow) <jack.mulrow@mongodb.com>

Message: SERVER-43217 Secondaries should ignore lastCollectionVersion field when waiting for a refresh to finish
Branch: master
https://github.com/mongodb/mongo/commit/25101f418ecc42d62339ad5e219177a6822bd59b

Generated at Thu Feb 08 05:02:34 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.