Type: Bug
Resolution: Unresolved
Priority: Major - P3
None
Affects Version/s: None
Component/s: Internal Code
Server Programmability
ALL
Service Arch 2022-08-08, Service Arch 2022-08-22, Service Arch 2022-09-05, Service Arch 2022-09-19
0
This may also apply to other tests in primary_only_service_test.cpp, but at least one test, RecreateInstanceOnStepUp, can fail because of a race between the thread that completes stepUp for a POS instance and another thread that tries to create an opCtx on the service (see here):
try {
    auto opCtx = cc().makeOperationContext();
    ...
} catch (const DBException& e) {
    _documentWriteException.setError(e.toStatus());
    throw;
}
Operations created while the POS instance is still rebuilding are interrupted with NotWritablePrimary. Adding the following line before creating the OperationContext fixes the race, but may not be desirable as the actual fix:
try {
    AllowOpCtxWhenServiceRebuildingBlock allowOpCtxBlock(Client::getCurrent());
    auto opCtx = cc().makeOperationContext();
    ...
} catch (const DBException& e) {
    _documentWriteException.setError(e.toStatus());
    throw;
}
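To make the failure mode concrete, here is a minimal, self-contained C++ sketch of the race and the workaround. It does not use the real server code: MockService, NotWritablePrimaryError, tlAllowOpCtxWhenRebuilding, AllowOpCtxWhenRebuildingGuard and tryCreateOp are hypothetical stand-ins, and the per-Client exemption that AllowOpCtxWhenServiceRebuildingBlock sets is approximated by a thread_local flag. Instead of relying on real thread timing, the sketch drives each interleaving explicitly: creating an operation while the service is rebuilding throws, unless the guard is in scope.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for a DBException carrying NotWritablePrimary.
struct NotWritablePrimaryError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical per-thread flag approximating the per-Client exemption.
thread_local bool tlAllowOpCtxWhenRebuilding = false;

// Hypothetical mock of a primary-only service that is rebuilding during stepUp.
class MockService {
public:
    void beginRebuild() {
        std::lock_guard<std::mutex> lk(_mutex);
        _rebuilding = true;
    }
    void finishRebuild() {
        std::lock_guard<std::mutex> lk(_mutex);
        _rebuilding = false;
    }
    // Models the check: reject new operations while rebuilding unless exempted.
    void makeOperationContext() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_rebuilding && !tlAllowOpCtxWhenRebuilding) {
            throw NotWritablePrimaryError("service is still rebuilding");
        }
    }

private:
    std::mutex _mutex;
    bool _rebuilding = false;
};

// RAII guard loosely modeled on AllowOpCtxWhenServiceRebuildingBlock (hypothetical):
// sets the exemption for the scope and restores the previous value on exit.
class AllowOpCtxWhenRebuildingGuard {
public:
    AllowOpCtxWhenRebuildingGuard() : _prev(tlAllowOpCtxWhenRebuilding) {
        tlAllowOpCtxWhenRebuilding = true;
    }
    ~AllowOpCtxWhenRebuildingGuard() {
        assert(tlAllowOpCtxWhenRebuilding);  // flag must still be set when the scope ends
        tlAllowOpCtxWhenRebuilding = _prev;
    }
    AllowOpCtxWhenRebuildingGuard(const AllowOpCtxWhenRebuildingGuard&) = delete;
    AllowOpCtxWhenRebuildingGuard& operator=(const AllowOpCtxWhenRebuildingGuard&) = delete;

private:
    bool _prev;
};

// Returns true if an operation could be created, false if it was interrupted
// (the failure seen in RecreateInstanceOnStepUp).
bool tryCreateOp(MockService& service, bool useGuard) {
    try {
        if (useGuard) {
            AllowOpCtxWhenRebuildingGuard guard;
            service.makeOperationContext();
        } else {
            service.makeOperationContext();
        }
        return true;
    } catch (const NotWritablePrimaryError&) {
        return false;
    }
}
```

With these definitions, `tryCreateOp(service, false)` returns false between `beginRebuild()` and `finishRebuild()` and true once the rebuild finishes, while `tryCreateOp(service, true)` succeeds in both cases. Because the guard restores the previous flag value on exit, it nests safely and does not leak the exemption to later operations on the same thread.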
This ticket should propose a fix that ensures these operations are not interrupted, either by strictly ordering stepUp before construction of the opCtx, or by using the AllowOpCtxWhenServiceRebuildingBlock approach above.
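The first option, strictly ordering stepUp before opCtx construction, can be sketched by having the worker thread wait on a condition variable until the rebuild finishes. This is again a hypothetical model, assuming the rebuild state can be observed under a mutex; MockService, waitUntilRunning and runStepUpScenario are illustrative names, not server APIs.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for a DBException carrying NotWritablePrimary.
struct NotWritablePrimaryError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical mock service that tracks whether a stepUp rebuild is in progress.
class MockService {
public:
    void beginRebuild() {
        std::lock_guard<std::mutex> lk(_mutex);
        _rebuilding = true;
    }
    void finishRebuild() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _rebuilding = false;
        }
        _cv.notify_all();  // wake any thread blocked in waitUntilRunning()
    }
    // Blocks until no rebuild is in progress: the "strict ordering" half of the fix.
    void waitUntilRunning() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return !_rebuilding; });
        assert(!_rebuilding);  // postcondition of the predicate wait
    }
    void makeOperationContext() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_rebuilding) {
            throw NotWritablePrimaryError("service is still rebuilding");
        }
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _rebuilding = false;
};

// Simulates stepUp racing with a worker thread that needs an opCtx.
bool runStepUpScenario() {
    MockService service;
    service.beginRebuild();  // stepUp has started; the POS instance is rebuilding
    bool succeeded = false;
    std::thread worker([&] {
        try {
            service.waitUntilRunning();      // wait for stepUp to finish first
            service.makeOperationContext();  // rebuild is complete at this point
            succeeded = true;
        } catch (const NotWritablePrimaryError&) {
            succeeded = false;
        }
    });
    service.finishRebuild();  // may run before or after the worker starts waiting
    worker.join();            // join() makes the worker's write to succeeded visible
    return succeeded;
}
```

Because the worker blocks in `waitUntilRunning()` until `finishRebuild()` notifies, it succeeds regardless of which thread runs first. One caveat the sketch ignores: if the node could step down and start rebuilding again between the wait and `makeOperationContext()`, a real fix would still need to handle NotWritablePrimary (for example, by retrying) or make the check and the opCtx construction atomic.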