-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Atlas Streams
-
Fully Compatible
-
ALL
-
Sprint 67
-
200
-
3
The runChangeStreamSourceTestFailOnStart function has a race condition: it depends on the processor's start-time failure occurring on the internal I/O thread and being detected by the executor thread before the start RPC response is sent back.
The ChangeStreamSource's I/O thread marks itself as connected once the initial connection is established. Once the executor sees this connected status, it marks itself as connected(), and this status does not change even if the executor later runs into an error. However, if the source I/O thread hits the start-time error (as these tests expect) before the executor checks whether the source is connected, the executor never marks itself as connected.
So, depending on this timing, when the SP start sequence checks whether the executor is connected, it can get either answer, and the test therefore passes or fails intermittently.
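The two interleavings can be illustrated with a small, deterministic toy model. This is only a sketch of the behavior described above, not the actual implementation: the `Source` and `Executor` classes, the `observe` method, and `start_outcome` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    """Hypothetical stand-in for the source's I/O thread state."""
    connected: bool = False
    error: Optional[str] = None


@dataclass
class Executor:
    """Hypothetical stand-in for the executor; its connected flag is sticky."""
    connected: bool = False

    def observe(self, source: Source) -> None:
        # If the source has already failed, the executor never marks
        # itself as connected.
        if source.error is not None:
            return
        # Once set, the flag is never cleared, even on later errors.
        if source.connected:
            self.connected = True


def start_outcome(executor_checks_first: bool) -> bool:
    """Return what the start sequence would see for a given interleaving."""
    source = Source()
    executor = Executor()
    source.connected = True  # initial connection established

    if executor_checks_first:
        executor.observe(source)             # executor sees "connected"
        source.error = "start-time failure"  # failure arrives afterwards
    else:
        source.error = "start-time failure"  # failure arrives first
        executor.observe(source)             # executor sees the error

    return executor.connected
```

In this model, `start_outcome(True)` and `start_outcome(False)` return different values for the same underlying failure, which mirrors why the start request cannot be relied on to fail consistently.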
To make this more robust, we should test that the SP eventually enters an error/stopped state instead of expecting the start request itself to fail or succeed.
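A minimal sketch of that "eventually" style of assertion is shown here in Python for illustration only; the real test lives in a different harness. The `wait_until` helper, the `FakeProcessor` class, its `state()` method, and the state names `"error"` and `"stopped"` are all assumptions, not the actual API.

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool],
               timeout_s: float = 5.0,
               interval_s: float = 0.01) -> bool:
    """Poll predicate until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


class FakeProcessor:
    """Hypothetical processor that reports 'running' for a few polls
    before reporting 'error'."""

    def __init__(self, polls_until_error: int) -> None:
        self._remaining = polls_until_error

    def state(self) -> str:
        if self._remaining > 0:
            self._remaining -= 1
            return "running"
        return "error"


def assert_eventually_failed(processor: FakeProcessor,
                             timeout_s: float = 5.0) -> None:
    """Pass regardless of whether start itself failed or succeeded,
    as long as the processor ends up in an error/stopped state."""
    reached = wait_until(
        lambda: processor.state() in ("error", "stopped"), timeout_s
    )
    assert reached, "processor never reached an error/stopped state"
```

With this shape, the assertion holds for both interleavings: a processor that fails immediately and one that briefly appears to be running both satisfy the check.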