Core Server / SERVER-38580

Tighten the check around when the `refreshing` flag is supposed to have been cleared on a secondary node

Details

    • Type: Improvement
    • Resolution: Won't Do
    • Priority: Major - P3
    • Sharding
    • Sharding EMEA
    • Sharding 2018-12-31, Sharding 2019-01-14, Sharding 2019-02-25, Sharding 2019-03-11, Sharding 2019-03-25, Sharding 2019-05-06, Sharding 2019-05-20
    • 8

    Description

      In a failed Evergreen run, we saw a case where the 'refreshing' flag was never unset on the secondaries, even though the refresh succeeded on the primaries.

      It is extremely unlikely that the two writes in persistCollectionAndChangedChunks failed, and short of that, there is no explanation for why the notification was never signaled.

      We should change the wait loop in _getCompletePersistedMetadataForSecondarySinceVersion to wait with a timeout of a few seconds on each iteration. After each timeout, it should check whether the optime returned by the primary's refresh has been reached and, if so, assert that the refreshing flag has been cleared. If the flag hasn't been cleared, it should log the contents of the config.cache.collections entry for the collection being refreshed and fail the refresh (which will cause the client calls to fail).
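
A minimal sketch of the proposed loop, for illustration only. It is not the server's code: CacheEntryState, waitForRefreshToClear, and the single-integer optime are hypothetical stand-ins, and std::condition_variable is used in place of whatever notification primitive the real loop waits on.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <iostream>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a secondary's view of one
// config.cache.collections entry. Not a MongoDB type.
struct CacheEntryState {
    std::mutex mutex;
    std::condition_variable changed;  // signaled when the entry is updated
    std::uint64_t appliedOpTime = 0;  // assumption: optime reduced to a counter
    bool refreshing = true;           // the flag that should get cleared
    std::string entryJson;            // entry contents, used only for logging
};

// Waits until `refreshing` is cleared. Instead of waiting indefinitely, each
// wait is bounded by `pollInterval`. After a timeout, if the secondary has
// already reached the optime of the primary's refresh and the flag is still
// set, the entry is logged and the refresh fails.
void waitForRefreshToClear(CacheEntryState& state,
                           std::uint64_t primaryRefreshOpTime,
                           std::chrono::milliseconds pollInterval) {
    assert(pollInterval.count() > 0);
    std::unique_lock<std::mutex> lock(state.mutex);
    while (state.refreshing) {
        // Returns true as soon as the flag is cleared, false on timeout.
        const bool cleared = state.changed.wait_for(
            lock, pollInterval, [&state] { return !state.refreshing; });
        if (cleared) {
            return;
        }
        if (state.appliedOpTime >= primaryRefreshOpTime) {
            // The flag should have been cleared by now: log and fail.
            std::cerr << "refreshing flag still set; entry: "
                      << state.entryJson << '\n';
            throw std::runtime_error(
                "refreshing flag not cleared after reaching refresh optime");
        }
        // Optime not reached yet: keep waiting.
    }
}
```

Failing with an exception mirrors the ticket's intent that the refresh itself fails and the client call sees the error, while the logged entry gives something concrete to investigate.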

      This should help us form further hypotheses about this issue.

          People

            backlog-server-sharding-emea [DO NOT USE] Backlog - Sharding EMEA
            kaloian.manassiev@mongodb.com Kaloian Manassiev