Core Server / SERVER-43217

Secondaries can hang refreshing metadata if only collection's epoch changes


Details

    • Fully Compatible
    • Sharding 2019-11-18
    • 18

    Description

      On StaleConfig errors, a secondary node triggers a refresh on the primary. The primary sets a "refreshing" flag in the config.cache.collections entry for the namespace being refreshed and unsets it when the refresh completes. After triggering the refresh and before reading from its replicated metadata cache collections, the secondary checks whether the "refreshing" flag is true. If it is, the secondary waits on a notification that an op observer triggers when an update to the cache entry for the refreshing namespace is replicated.
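
      The wait described above can be modeled with a minimal Python sketch. This is not MongoDB's implementation (which is C++); the class NamespaceRefreshState, its methods, the dict standing in for the config.cache.collections entry, and the use of threading.Event as the notification are all hypothetical stand-ins chosen for illustration.

```python
import threading


class NamespaceRefreshState:
    """Hypothetical model of one namespace's cache state on a secondary.

    `entry` stands in for the replicated config.cache.collections document,
    and `done` stands in for the notification that the op observer signals.
    """

    def __init__(self, refreshing):
        self.entry = {"refreshing": refreshing}
        self.done = threading.Event()

    def notify_refresh_complete(self):
        # Called by the (simplified) op observer once it decides that a
        # replicated update marks the end of the primary's refresh.
        self.done.set()

    def wait_for_refresh(self, timeout):
        # Mirrors the secondary's check: wait only if the flag is set.
        if not self.entry.get("refreshing"):
            return True
        # Event.wait returns False if the timeout expires before set() is called.
        return self.done.wait(timeout)


state = NamespaceRefreshState(refreshing=True)
# Simulate the op observer firing shortly after the primary's update replicates.
threading.Timer(0.05, state.notify_refresh_complete).start()
print(state.wait_for_refresh(timeout=5))  # True
```

      In this sketch, the waiter only makes progress if someone calls notify_refresh_complete(); if the op observer never fires, wait_for_refresh() can only return through its timeout.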

      The op observer triggers the notification only if the replicated update uses a $set that both contains the "lastRefreshedCollectionVersion" field and sets "refreshing" to false. The collection version is stored on disk without an epoch. If a refresh discovers that a collection's epoch changed but the timestamp component of its collection version did not (which can happen after a collection's shard key is refined), the replicated update from the primary omits "lastRefreshedCollectionVersion". The secondary's op observer then never notifies the refreshing thread, and that thread hangs.
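
      The trigger condition and the failing case can be sketched as a predicate over a replicated update. This is a hypothetical Python model, not the server's code: the function name should_notify_current, the {"$set": {...}} dict shape, and the "<version>" placeholder value are assumptions, and a real replicated update may carry additional fields.

```python
def should_notify_current(update):
    """Hypothetical model of the op observer condition described above.

    `update` is modeled as {"$set": {...}}. The notification fires only if
    the $set both contains "lastRefreshedCollectionVersion" and sets
    "refreshing" to False.
    """
    set_fields = update.get("$set", {})
    return (
        "lastRefreshedCollectionVersion" in set_fields
        and set_fields.get("refreshing") is False
    )


# Refresh that changed the stored collection version: both fields are present.
normal_update = {"$set": {"refreshing": False,
                          "lastRefreshedCollectionVersion": "<version>"}}

# Refresh that only found a new epoch: the stored version (which has no epoch)
# is unchanged, so the field is omitted. Illustrative document only.
epoch_only_update = {"$set": {"refreshing": False}}

print(should_notify_current(normal_update))      # True
print(should_notify_current(epoch_only_update))  # False: the waiter is never woken
```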

      A possible fix is to change the op observer to check only that the "refreshing" field was set to false.
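
      The proposed fix can be sketched as a relaxed version of that predicate. As before, this is a hypothetical Python model; the function name should_notify_fixed and the {"$set": {...}} shape are illustrative assumptions rather than the actual patch.

```python
def should_notify_fixed(update):
    """Hypothetical sketch of the proposed condition: notify whenever the
    replicated $set flips "refreshing" to False, regardless of whether
    "lastRefreshedCollectionVersion" is present."""
    set_fields = update.get("$set", {})
    return set_fields.get("refreshing") is False


print(should_notify_fixed({"$set": {"refreshing": False}}))  # True (epoch-only refresh)
print(should_notify_fixed({"$set": {"refreshing": False,
                                    "lastRefreshedCollectionVersion": "<version>"}}))  # True
print(should_notify_fixed({"$set": {"refreshing": True}}))   # False
```

      In this model, an update that still has "refreshing" set to true does not wake the waiter, so the change only widens the set of updates that end the wait to every update that clears the flag.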

          People

            Assignee: Jack Mulrow (jack.mulrow@mongodb.com)
            Reporter: Jack Mulrow (jack.mulrow@mongodb.com)
            Votes: 0
            Watchers: 6