[SERVER-29386] VoteRequestor and FreshnessChecker need safe destruction Created: 26/May/17  Updated: 30/Oct/23  Resolved: 28/Aug/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.5.13

Type: Bug Priority: Major - P3
Reporter: Matthew Russotto Assignee: Siyuan Zhou
Resolution: Fixed Votes: 0
Labels: neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-28389 Pass CallbackCanceled error down to s... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2017-07-31, Repl 2017-08-21, Repl 2017-09-11
Participants:
Linked BF Score: 15

 Description   

The VoteRequester and FreshnessChecker can be destroyed with callbacks still pending on their ScatterGatherRunner. Since the callbacks access the ScatterGatherRunner::Algorithm owned by the VoteRequestor/FreshnesssChecker, this can result in various bad behavior (crashes or memory corruption leading to just about anything). The ScatterGatherRunner::Algorithm needs to live until all callbacks are complete.

This would probably make sense to combine with SERVER-28389



 Comments   
Comment by Githook User [ 28/Aug/17 ]

Author:

{'username': 'visualzhou', 'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com'}

Message: SERVER-29386 VoteRequestor and FreshnessChecker need safe destruction
Branch: master
https://github.com/mongodb/mongo/commit/c246ae62641c3559c38830f6f5f4981e0acffa0c

Comment by Andy Schwerin [ 31/May/17 ]

I'm not sure whether or not the implementation can own the algorithm object, or if it needs to share ownership with the user of the ScatterGatherRunner. This is probably a fine place to use a shared_ptr<Algorithm>, if that's an easy approach to fixing the problem.

Comment by Matthew Russotto [ 31/May/17 ]

Ah, I missed that subtlety. Yes, it is the algorithm (which is owned by the VoteRequestor or the FreshnessChecker) which is the issue. The callbacks call the algorithm, which can be destroyed at any time. In practice it's a little unlikely because the callbacks get canceled (and currently the canceled version does not reference the algorithm though SERVER-28389 woould change that). However, there's still a window where a callback has already started before it gets canceled, and then the algorithm gets destroyed while that callback is still running. Perhaps the implementation needs to own the algorithm, keeping it safe under the shared pointer.

Comment by Andy Schwerin [ 31/May/17 ]

The ScatterGatherRunner callbacks aren't supposed to have references to any of its member variables. Instead, they have a shared pointer to the ScatterGatherRunnerImpl, whose lifetime thus lasts until the callbacks complete. At least, that's the design. Perhaps the problem is that owners of the ScatterGatherRunner believe it is safe to destroy the Algorithm object they own after the ScatterGatherRunner is destroyed? That's notably held by bare pointer, so I would believe that it is somehow responsible for this problem.

Generated at Thu Feb 08 04:20:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.