[SERVER-68154] Race in destruction of Resharding metrics can lead to calling pure virtual function Created: 19/Jul/22 Updated: 29/Oct/23 Resolved: 26/Jul/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Randolph Tan | Assignee: | Randolph Tan |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Steps To Reproduce: | Apply repro.diff and run the cpp test |
||||||||||||
| Sprint: | Sharding 2022-07-11, Sharding 2022-07-25, Sharding 2022-08-08 | ||||||||||||
| Participants: | |||||||||||||
| Linked BF Score: | 135 | ||||||||||||
| Description |
|
Whenever ShardingDataTransformInstanceMetrics destructor is called, it will remove the observer to itself (here) from the ShardingDataTransformCumulativeMetrics. The issue is that ShardingDataTransformInstanceMetrics::getRecipientHighEstimateRemainingTimeMillis is a pure virtual function and by the time the destructor for ShardingDataTransformInstanceMetrics is being run, the child class is already destroyed (In production server, this is the ReshardingMetrics class). So there is a small window where the observer instance is still accessible and trying to call getRecipientHighEstimateRemaining on it will result in terminating the server. |
| Comments |
| Comment by Githook User [ 26/Jul/22 ] |
|
Author: {'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}Message: |
| Comment by Brett Nawrocki [ 21/Jul/22 ] |
|
It seems to me that fundamentally, either 1. the metrics need to be deregistered before the child class starts getting destroyed or 2. the metrics relevant to calculating the time estimate need to survive past the child class being destroyed. For approach 1, I agree that option A is the less bad choice. For approach 2, I think the resulting code would be more convoluted than its worth. |
| Comment by Randolph Tan [ 21/Jul/22 ] |
|
Making the function into non pure virtual appears to make the test pass. But I suspect that this only narrows the window for the race. From my understanding, the vtable is switched to pointing to ShardingDataTransformCumulativeMetrics's when it's own destructor is called, so there is still a tiny window where calls to getRecipientHighEstimateRemainingTimeMillis will attempt to call ReshardingMetrics's after it has started destroying it's own member variables. So I have 2 proposals in mind:
cons:
Option B: Remove _deregister from base class and child class is responsible for it's lifetime.
cons:
I'm currently leaning towards option A, what do you guys think? max.hirschhorn@mongodb.com, brett.nawrocki@mongodb.com |