Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-43217

Secondaries can hang refreshing metadata if only collection's epoch changes

    XMLWordPrintable

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.1
    • Component/s: Sharding
    • Backwards Compatibility:
      Fully Compatible
    • Sprint:
      Sharding 2019-11-18
    • Linked BF Score:
      18

      Description

      On StaleConfig errors, a secondary node triggers a refresh on the primary, which sets a "refreshing" flag in the config.cache.collections entry for the namespace being refreshed and then unsets it when the refresh completes. Before reading from its replicated metadata cache collections after triggering the refresh, the secondary will check if the "refreshing" flag is set to true, and if so, wait on a notification that is triggered by an op observer when an update to the cache entry for the refreshing namespace is replicated.

      The op observer will only trigger the notification if the replicated update used a $set that both contains the "lastRefreshedCollectionVersion" field and set "refreshing" to false. The collection version is stored on disk without an epoch, so if a refresh discovers a collection's epoch changed, but not the timestamp component of its collection version (which can happen after a collection's shard key is refined), then the replicated update from the primary will omit "lastRefreshedCollectionVersion" and the secondary's op observer will not notify the refreshing thread, causing it to hang. 

      A possible fix is to change the op observer to only check that the "refreshing" field was set to false.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              jack.mulrow Jack Mulrow
              Reporter:
              jack.mulrow Jack Mulrow
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: