  1. Core Server
  2. SERVER-38580

Tighten the check around when the `refreshing` flag is supposed to have been cleared on a secondary node

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels:
      None
    • Sharding EMEA
    • Sharding 2018-12-31, Sharding 2019-01-14, Sharding 2019-02-25, Sharding 2019-03-11, Sharding 2019-03-25, Sharding 2019-05-06, Sharding 2019-05-20
    • 8

      In a failed Evergreen run, we saw a case where the 'refreshing' flag was never unset on the secondaries, even though the refresh succeeded on the primaries.

      It is extremely unlikely that the two writes in persistCollectionAndChangedChunks failed, and short of that, there is no explanation for why the notification was never signaled.

      We should change the wait loop in _getCompletePersistedMetadataForSecondarySinceVersion to time out after a few seconds on each iteration. After each timeout, it should check whether the optime returned by the primary's refresh has been reached and, if so, assert that the refreshing flag has been cleared. If the flag hasn't been cleared, it should log the contents of the config.cache.collections entry for the collection being refreshed and fail the refresh (which will cause the client calls to fail).
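
      The proposed loop could look roughly like the sketch below. This is not the server's code: CacheEntry, SecondaryState, waitForRefreshingCleared, and the integer optime are hypothetical stand-ins, and standard-library primitives are used in place of the server's own mutex, condition variable, and logging facilities. It only illustrates the shape of the check: wake up periodically, and once the secondary has reached the primary's refresh optime, treat a still-set flag as an error.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical, simplified view of a secondary's config.cache.collections entry.
struct CacheEntry {
    std::string ns;                   // namespace of the collection being refreshed
    bool refreshing = true;           // set while a refresh is being persisted
    std::uint64_t appliedOpTime = 0;  // assumption: optime modeled as a plain counter
};

// Hypothetical state shared between the replication path and the waiter.
struct SecondaryState {
    std::mutex mutex;
    std::condition_variable changed;  // notified whenever `entry` is updated
    CacheEntry entry;                 // only read or written while holding `mutex`
};

// Blocks until the refreshing flag is cleared. Throws std::runtime_error if the
// secondary has reached primaryRefreshOpTime while the flag is still set.
void waitForRefreshingCleared(SecondaryState& state,
                              std::uint64_t primaryRefreshOpTime,
                              std::chrono::milliseconds pollInterval) {
    std::unique_lock<std::mutex> lock(state.mutex);
    while (state.entry.refreshing) {
        // Returns on notification, spurious wake-up, or after pollInterval.
        state.changed.wait_for(lock, pollInterval);

        if (!state.entry.refreshing) {
            break;  // normal case: the flag was cleared
        }
        if (state.entry.appliedOpTime >= primaryRefreshOpTime) {
            // The write that clears the flag should have replicated by now.
            // Dump the entry for diagnosis and fail the refresh.
            const std::string dump =
                "ns=" + state.entry.ns + " refreshing=true appliedOpTime=" +
                std::to_string(state.entry.appliedOpTime) +
                " primaryRefreshOpTime=" + std::to_string(primaryRefreshOpTime);
            std::cerr << "refreshing flag not cleared: " << dump << '\n';
            throw std::runtime_error("refresh failed: " + dump);
        }
        // Otherwise the secondary is still catching up; keep waiting.
    }
}
```

      A writer thread would update state.entry under state.mutex and then call state.changed.notify_all(). Because the optime check runs after every wake-up, a lost notification delays detection by at most one pollInterval instead of blocking forever.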

      This should help us build further hypotheses about this issue.

            Assignee:
            backlog-server-sharding-emea [DO NOT USE] Backlog - Sharding EMEA
            Reporter:
            kaloian.manassiev@mongodb.com Kaloian Manassiev
            Votes:
            0
            Watchers:
            1

              Created:
              Updated:
              Resolved: