Details
-
Bug
-
Resolution: Unresolved
-
Major - P3
-
None
-
None
-
None
-
Service Arch
-
ALL
-
-
Service Arch 2022-08-08, Service Arch 2022-08-22, Service Arch 2022-09-05, Service Arch 2022-09-19
-
5
Description
This may apply to other tests in primary_only_service_test.cpp, but at least one test (RecreateInstanceOnStepUp) can fail because of a race between the thread that is completing stepUp for a POS instance and another thread that attempts to create an opCtx on the service (see here):
    try {
        auto opCtx = cc().makeOperationContext();
        ...
    } catch (const DBException& e) {
        _documentWriteException.setError(e.toStatus());
        throw;
    }
The operations are interrupted with NotWritablePrimary if they are created while the POS instance is still rebuilding. Adding the following line before creating the OperationContext avoids the race, but may not be desirable as the actual fix:
    try {
        AllowOpCtxWhenServiceRebuildingBlock allowOpCtxBlock(Client::getCurrent());
        auto opCtx = cc().makeOperationContext();
        ...
    } catch (const DBException& e) {
        _documentWriteException.setError(e.toStatus());
        throw;
    }
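For readers unfamiliar with the pattern, the workaround above reads like a scoped (RAII) guard: something is permitted on the client for the lifetime of the block object and reverted when it goes out of scope. The sketch below only illustrates that general pattern. ToyClient, ScopedAllowOpCtx, and tryMakeOperation are hypothetical names, and nothing here is taken from the real AllowOpCtxWhenServiceRebuildingBlock implementation.

```cpp
// Hypothetical stand-in for a client object; not the real mongo::Client.
struct ToyClient {
    bool allowOpCtxWhileRebuilding = false;
};

// Hypothetical scoped guard: sets the flag on construction and restores the
// previous value on destruction, so the permission cannot leak past the scope.
class ScopedAllowOpCtx {
public:
    explicit ScopedAllowOpCtx(ToyClient& client)
        : _client(client), _previous(client.allowOpCtxWhileRebuilding) {
        _client.allowOpCtxWhileRebuilding = true;
    }

    ~ScopedAllowOpCtx() {
        _client.allowOpCtxWhileRebuilding = _previous;
    }

    ScopedAllowOpCtx(const ScopedAllowOpCtx&) = delete;
    ScopedAllowOpCtx& operator=(const ScopedAllowOpCtx&) = delete;

private:
    ToyClient& _client;
    bool _previous;
};

// Mimics operation creation: it is refused while the (toy) service is
// rebuilding unless the client has been explicitly allowed.
bool tryMakeOperation(const ToyClient& client, bool serviceRebuilding) {
    if (serviceRebuilding && !client.allowOpCtxWhileRebuilding) {
        return false;  // analogous to being interrupted with NotWritablePrimary
    }
    return true;
}
```

Under these assumptions, an operation attempted during a rebuild is refused outside the guard's scope and accepted inside it, and the flag reverts once the guard is destroyed.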
This ticket should propose a fix that ensures the operations are not interrupted, either by strictly ordering stepUp and the construction of the opCtx, or by using the earlier suggestion.
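As a rough illustration of the "strict ordering" option, the sketch below makes a worker thread wait until step-up has finished before it creates its operation. It is a toy model built only on the standard library. ToyService, finishStepUp, waitForStepUp, makeOperation, and runOrderedWorker are hypothetical names, not the PrimaryOnlyService API, and the real fix may look quite different.

```cpp
#include <condition_variable>
#include <mutex>
#include <stdexcept>
#include <thread>

// Hypothetical stand-in for the service state; not the real PrimaryOnlyService.
class ToyService {
public:
    // Called by the step-up thread once rebuilding has finished.
    void finishStepUp() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _rebuilding = false;
        }
        _cv.notify_all();
    }

    // Blocks the caller until step-up has completed.
    void waitForStepUp() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return !_rebuilding; });
    }

    // Mimics opCtx creation: throws if the service is still rebuilding.
    void makeOperation() {
        std::lock_guard<std::mutex> lk(_mutex);
        if (_rebuilding) {
            throw std::runtime_error("NotWritablePrimary");
        }
        ++_operationsCreated;
    }

    int operationsCreated() {
        std::lock_guard<std::mutex> lk(_mutex);
        return _operationsCreated;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _rebuilding = true;
    int _operationsCreated = 0;
};

// Starts a worker that waits for step-up before creating its operation.
// Returns true if the operation was created without being interrupted.
bool runOrderedWorker(ToyService& service) {
    bool ok = false;
    std::thread worker([&] {
        service.waitForStepUp();  // ordering point: never run ahead of step-up
        try {
            service.makeOperation();
            ok = true;
        } catch (const std::runtime_error&) {
            ok = false;
        }
    });
    service.finishStepUp();
    worker.join();  // join makes the worker's write to ok visible here
    return ok;
}
```

Because the worker blocks in waitForStepUp, the outcome does not depend on which thread is scheduled first. Removing that call reintroduces the race described above.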