[SERVER-48650] Unit tests' ServiceContext's NetworkInterfaceMockClockSource cannot continue to depend upon the lifetime of the ReplicationCoordinator to remain valid
Created: 08/Jun/20 Updated: 29/Oct/23 Resolved: 28/Jul/20
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0, 4.4.5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Benjamin Caimano (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | servicearch-wfbf-day |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4 |
| Sprint: | Service arch 2020-06-29, Service arch 2020-07-13, Service Arch 2020-07-27, Service Arch 2020-08-10 |
| Participants: | |
| Description |
|
TL;DR: The ServiceContext's NetworkInterfaceMockClockSource cannot continue to depend upon the lifetime of the ReplicationCoordinator to remain valid while it is also depended upon, via the DiagnosticInfo code, by any code that contends on a mutex. There's a rare (so far) BF that depends on a SERVER ticket linked a couple of hops from this one.

--------------------------------------------------------------------------

I tried to set up the JournalFlusher thread in our unit test fixtures, specifically the ServiceContextMongoDTest test fixture. However, I ran into problems like so:

It turns out that a contended mutex acquisition goes into the DiagnosticInfo code, which in turn calls into the NetworkInterfaceMockClockSource and tries to use the NetworkInterfaceMock. However, when the JournalFlusher (which has a mutex) went down this path, the NetworkInterfaceMock was invalid memory: it had already been destroyed. The failing test's fixture is ReplCoordTest.

The repl test fixture creates a NetworkInterfaceMock instance and passes ownership of it to a ThreadPoolTaskExecutor instance, whose ownership is in turn passed to a ReplicationCoordinatorImpl instance. Meanwhile, NetworkInterfaceMockClockSource instances are created with pointers to that same NetworkInterfaceMock owned by the ReplicationCoordinatorImpl, and are set on the ServiceContext. So anything that uses Mutexes, like the JournalFlusher, now depends on the lifetime of the ReplicationCoordinatorImpl.

The ServiceContext's NetworkInterfaceMockClockSource cannot continue to depend upon the lifetime of the ReplicationCoordinator to remain valid while also being depended upon by any code with a mutex under contention. |
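To make the lifetime problem concrete, here is a simplified C++ model of the ownership chain described above. All types are hypothetical stand-ins (FakeNetworkInterface for NetworkInterfaceMock, FakeNetClockSource for NetworkInterfaceMockClockSource, FakeExecutor and FakeReplCoord for the executor and coordinator), and recordContentionTimestamp only approximates the DiagnosticInfo path. Per the description, the real clock source holds a plain pointer; this sketch deliberately swaps in std::weak_ptr so the post-teardown read can be observed safely instead of being undefined behavior.

```cpp
#include <cassert>
#include <chrono>
#include <memory>
#include <optional>
#include <utility>

using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for NetworkInterfaceMock: it owns the mocked "now".
struct FakeNetworkInterface {
    Millis now{0};
};

// Hypothetical stand-in for NetworkInterfaceMockClockSource. The real class keeps
// a plain pointer; a weak_ptr is used here so that reading after destruction is
// detectable instead of undefined behavior.
class FakeNetClockSource {
public:
    explicit FakeNetClockSource(std::weak_ptr<FakeNetworkInterface> net)
        : _net(std::move(net)) {}

    std::optional<Millis> now() const {
        if (auto net = _net.lock()) {
            return net->now;
        }
        return std::nullopt;  // the interface has already been destroyed
    }

private:
    std::weak_ptr<FakeNetworkInterface> _net;
};

// Hypothetical stand-ins for the chain:
// ReplicationCoordinatorImpl -> ThreadPoolTaskExecutor -> NetworkInterfaceMock.
struct FakeExecutor {
    std::shared_ptr<FakeNetworkInterface> net;
};
struct FakeReplCoord {
    std::unique_ptr<FakeExecutor> executor;
};

// Process-wide clock, playing the role of the clock set on the ServiceContext.
FakeNetClockSource* gClock = nullptr;

// Hypothetical hook standing in for the DiagnosticInfo path: any thread that
// hits a contended mutex ends up reading the global clock here.
std::optional<Millis> recordContentionTimestamp() {
    assert(gClock != nullptr);
    return gClock->now();
}

struct ScenarioResult {
    std::optional<Millis> beforeTeardown;
    std::optional<Millis> afterTeardown;
};

ScenarioResult runTeardownScenario() {
    auto net = std::make_shared<FakeNetworkInterface>();
    net->now = Millis(42);

    FakeNetClockSource clock(net);  // the clock points at the same interface...
    gClock = &clock;

    auto coord = std::make_unique<FakeReplCoord>();
    coord->executor = std::make_unique<FakeExecutor>();
    coord->executor->net = std::move(net);  // ...but only the coordinator owns it

    ScenarioResult result;
    result.beforeTeardown = recordContentionTimestamp();
    coord.reset();  // destroying the coordinator also destroys the interface
    result.afterTeardown = recordContentionTimestamp();

    gClock = nullptr;  // don't leave a pointer to the local clock behind
    return result;
}
```

The first read sees 42 ms; the second sees std::nullopt because destroying the coordinator destroyed the only owner of the interface. With a plain pointer, that second read would touch freed memory, which matches the failure the JournalFlusher hit.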
| Comments |
| Comment by Githook User [ 23/Feb/21 ] |
|
Author: Ben Caimano <ben.caimano@10gen.com>
Message: (cherry picked from commit 214379825c248f5a5e5f0a01ad9863b900faaf30) |
| Comment by Benjamin Caimano (Inactive) [ 28/Jul/20 ] |
|
dianna.hohensee, I've landed code that removes the NetworkInterfaceMockClockSource in favor of an immortal ClockSourceMock implementation. Hopefully this should unblock you! Let me know if I missed the mark somehow. |
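The comment above says the fix replaced NetworkInterfaceMockClockSource with an immortal ClockSourceMock. The ticket does not show that code, so the following is only a minimal sketch of the idea under stated assumptions: SimpleClockSourceMock, immortalMockClock, and FakeReplCoordFixture are hypothetical names, not MongoDB's ClockSourceMock API, and "immortal" is modeled as a heap object reached through a function-local static that is intentionally never deleted.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>

using Millis = std::chrono::milliseconds;

// Hypothetical simplified mock clock. It owns its own notion of "now" rather
// than borrowing it from a network interface mock, so nothing else has to
// outlive it. An atomic is used because the clock may be read from many threads.
class SimpleClockSourceMock {
public:
    Millis now() const {
        return Millis(_nowMs.load());
    }

    // Move the mocked time forward; this sketch refuses to go backwards.
    void advance(Millis delta) {
        assert(delta.count() >= 0);
        _nowMs.fetch_add(delta.count());
    }

private:
    std::atomic<Millis::rep> _nowMs{0};
};

// "Immortal" accessor: the clock is heap-allocated on first use and never
// deleted, so it outlives every test fixture and every static destructor.
// Function-local static initialization is thread-safe since C++11.
SimpleClockSourceMock& immortalMockClock() {
    static SimpleClockSourceMock* const clock = new SimpleClockSourceMock();
    return *clock;
}

// Hypothetical fixture: it may drive the clock, but it does not own it, so
// destroying the fixture cannot invalidate the clock.
struct FakeReplCoordFixture {
    explicit FakeReplCoordFixture(SimpleClockSourceMock& c) : clock(c) {}
    void tick() { clock.advance(Millis(10)); }
    SimpleClockSourceMock& clock;
};
```

In this design the clock's lifetime no longer depends on any fixture: code on a contended-lock path that reads immortalMockClock() after a ReplCoordTest-style fixture is torn down still reads valid memory. The trade-off is a deliberate, one-time allocation that is never freed.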
| Comment by Githook User [ 28/Jul/20 ] |
|
Author: Ben Caimano <ben.caimano@10gen.com>
Message: |
| Comment by Benjamin Caimano (Inactive) [ 09/Jun/20 ] |
|
In my mind, there are two parts to this:
|