[SERVER-50189] Replace ReplicaSetMonitorManager::removeMonitor with a custom deleter for shared_ptr<ReplicaSetMonitor> Created: 07/Aug/20  Updated: 12/Dec/23

Status: Backlog
Project: Core Server
Component/s: Replication, Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Backlog - Cluster Scalability
Resolution: Unresolved Votes: 0
Labels: sharding-common-backlog, sharding-nyc-subteam2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-51440 Cleanup refcounting and lifetime of R... Closed
Related
related to SERVER-50093 Test that donor supports running mult... Closed
related to SERVER-50467 Ensure that tenant migration donor on... Closed
related to SERVER-51442 ShardRegistry should handle gracefull... Closed
related to SERVER-51443 Server should cleanup the ReplicaSetM... Closed
related to SERVER-51440 Cleanup refcounting and lifetime of R... Closed
Assigned Teams:
Cluster Scalability
Sprint: Sharding 2020-08-24
Participants:
Story Points: 3

 Description   

ReplicaSetMonitorManager stores ReplicaSetMonitors by weak_ptr so that the ReplicaSetMonitor is destroyed when the last shared_ptr reference to it goes out of scope.

However, ReplicaSetMonitorManager also has a removeMonitor function, which currently the holder of the last shared_ptr reference has to call to remove the entry for the ReplicaSetMonitor from the ReplicaSetMonitorManager's map.

Instead, the shared_ptr to ReplicaSetMonitor could be created with a custom deleter that removes the entry from the map, like:

        std::shared_ptr<Foo> sh5(new Foo, [](auto p) {
           std::cout << "Call delete from lambda...\n";
           delete p;
        });

See also https://en.cppreference.com/w/cpp/memory/shared_ptr/shared_ptr.

There are a couple complications:

  1. We probably still want to remove the hosts from the connection pool, which ReplicaSetMonitor::remove currently does.
  2. There are two implementations of ReplicaSetMonitor, and the Streamable implementation has a factory method to create its monitors.


 Comments   
Comment by Andrew Shuvalov (Inactive) [ 08/Oct/20 ]

I'm trying this again but with only minimal changes:

  1. just remove the circular dependency by fixing the TopologyListenerPtr to weak pointer
  2. do not use custom deleter because it will be lost when 'shared_from_this()' is invoked

And I expect a lot of tests to fail, which could be hard to fix but unfortunately cleaning up the circular dependency is inevitable one day anyway

 

Test failures so far:

https://spruce.mongodb.com/version/5f7f846c3e8e864890e64103/tasks 

https://logkeeper.mongodb.org/lobster/build/4f137323c6d502618c61ba4f27fcc070/test/5f7f9c6abe07c43e035aab6e#bookmarks=0%2C66418&l=1

https://logkeeper.mongodb.org/lobster/build/9112ca0fd22ae6e23222cd8a06e33584/test/5f7f9e5f9041306a554a92db#l=1 

 

The error is during the shutdown, key log entry:
[js_test:migration_coordinator_abort_failover] 2020-10-09T03:23:45.085+0000 s20027| 2020-10-09T03:23:45.084+00:00 W SHARDING 22667 [SignalHandler] "Error cleaning up distributed ping entry","attr":

{"processId":"ip-10-122-13-53:20027:1602213643:-6029069255033399232","error":"ReplicaSetMonitorRemoved: ReplicaSetMonitor for set migration_coordinator_abort_failover-configRS is removed"}

 

Generated at Thu Feb 08 05:22:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.