|
In a failed Evergreen run, we saw a case where the 'refreshing' flag was never unset on the secondaries, even though the refresh succeeded on the primaries.
It is extremely unlikely that the two writes in persistCollectionAndChangedChunks failed, yet without such a failure there is no explanation for why the notification here was never signaled.
We should change the wait loop in _getCompletePersistedMetadataForSecondarySinceVersion so that each iteration waits with a timeout of a few seconds. After each timeout, it should check whether the optime returned by the primary's refresh has been reached and, if so, assert that the refreshing flag has been cleared. If the flag has not been cleared, it should log the contents of the config.cache.collections entry for the collection being refreshed and fail the refresh (which will cause the client calls to fail).
This should help us build further hypotheses about this issue.
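The proposed wait loop could look roughly like the sketch below. It is a standalone illustration using only the C++ standard library, not the server code: RefreshState, CacheCollectionsEntry, WaitResult, describe, waitForRefresh and the integer optimes are all hypothetical stand-ins, and the real change would use the server's own notification, optime and logging facilities.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <string>

// Hypothetical stand-in for the config.cache.collections entry of one collection.
struct CacheCollectionsEntry {
    bool refreshing = true;
    std::uint64_t lastRefreshedOpTime = 0;  // assumption: optime reduced to an integer
};

// Hypothetical shared state between the replication path and the waiter.
struct RefreshState {
    std::mutex mutex;
    std::condition_variable notification;
    CacheCollectionsEntry entry;
    std::uint64_t appliedOpTime = 0;  // how far this secondary has replicated
};

enum class WaitResult { kRefreshed, kFailedFlagStillSet };

// Renders the entry so it can be logged when the refresh is failed.
std::string describe(const CacheCollectionsEntry& e) {
    return std::string("{refreshing: ") + (e.refreshing ? "true" : "false") +
        ", lastRefreshedOpTime: " + std::to_string(e.lastRefreshedOpTime) + "}";
}

// Waits for the 'refreshing' flag to clear, waking up every pollInterval.
// On each timeout, if the secondary has already reached the optime returned
// by the primary's refresh, the flag should have been cleared; if it has not,
// the entry is captured in *diagnostics and the refresh is reported as failed.
WaitResult waitForRefresh(RefreshState& state,
                          std::uint64_t primaryRefreshOpTime,
                          std::chrono::milliseconds pollInterval,
                          std::string* diagnostics) {
    assert(diagnostics != nullptr);
    std::unique_lock<std::mutex> lock(state.mutex);
    while (true) {
        // wait_for returns the predicate's value, so 'true' means the flag cleared.
        const bool cleared = state.notification.wait_for(
            lock, pollInterval, [&] { return !state.entry.refreshing; });
        if (cleared) {
            return WaitResult::kRefreshed;
        }
        if (state.appliedOpTime >= primaryRefreshOpTime) {
            *diagnostics = describe(state.entry);  // the real code would log this
            return WaitResult::kFailedFlagStillSet;
        }
        // Not yet caught up to the primary's refresh optime: keep waiting.
    }
}
```

In this sketch the failure path returns an error value to the caller; the ticket's intent is that the real implementation fails the refresh so the client call fails and the logged entry can be inspected.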
|