[SERVER-49876] Fix always-false if statement in CatalogCache Created: 24/Jul/20  Updated: 27/Oct/23  Resolved: 26/Nov/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Blake Oler Assignee: Kaloian Manassiev
Resolution: Gone away Votes: 0
Labels: bkp, sharding-wfbf-sprint
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-49800 add use-after-move rule to clang-tidy... Closed
Operating System: ALL
Sprint: Sharding 2020-11-16, Sharding 2020-11-30, Sharding 2020-12-14, Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25, Sharding 2021-02-22, Sharding 2021-03-08, Sharding 2021-03-22, Sharding 2021-04-05, Sharding 2021-04-19, Sharding 2021-05-03, Sharding EMEA 2021-11-15, Sharding EMEA 2021-11-29
Participants:

 Description   

We check if the existingRoutingInfo exists when attempting to reset stale shards. It will never exist, because we've std::moved the shared pointer into refreshCollectionRoutingInfo.

We should figure out a way to achieve the desired behavior while maintaining the std::move semantics.



 Comments   
Comment by Kaloian Manassiev [ 26/Nov/21 ]

Given that this project has been disabled since 4.4 and we rewrote the caching layer in 5.0 to be built on top of the ReadThroughCache, I don't believe that this problem exists anymore, because before returning a new ChunkManager from the lookup function we do set all shards as refreshed.

Comment by Andrew Morrow (Inactive) [ 07/Jan/21 ]

blake.oler - I agree with kaloian.manassiev, in that I think you need to evaluate whether the fact that the logic in the conditional is unreachable represents a bug or not. What would happen if we failed to manually reset stale shards when we ought to, per the comment? If failing to do that manual reset is a bug with real consequences, then I think yes, this should be fixed on older branches.

Comment by Kaloian Manassiev [ 07/Jan/21 ]

As long as it is not a bug I wouldn't bother changing the older versions

Comment by Blake Oler [ 07/Jan/21 ]

This change no longer tracks on master because we've gotten rid of both the affected code and the notion of a sequence number. acm would you like us to target this fix for older versions, or just close as Gone Away?

Comment by Randolph Tan [ 08/Sep/20 ]

Change sounds good to me

Comment by Blake Oler [ 02/Sep/20 ]

renctan ping

Comment by Blake Oler [ 07/Aug/20 ]

We need the std::move, because there's the possibility that we will end up using the same ChunkManager pointer at the move-site when swapping in the new one.

To clarify, I meant that we can capture the sequence number locally before moving in the ChunkManager.

Comment by Randolph Tan [ 06/Aug/20 ]

Hm... in places were the old ChunkManager is not readily available, wouldn't you end up creating a new shared_ptr for the ChunkManager? Would it be better if we just removed the std::move?

Comment by Blake Oler [ 06/Aug/20 ]

Proposed solution:

In all places where we check for the existence of an already-moved old routing info, instead check if the old chunk manager's sequence number matches the new chunk manager's sequence number.

renctan to approve.

Comment by Blake Oler [ 24/Jul/20 ]

I think there's a similar bug in the ChunkManagerTargeter. Noting here for posterity's sake. It might also be worth analyzing for other places in the codebase where we attempt to take action based on the existence of a routing info shared pointer after a refresh has been done.

Comment by Blake Oler [ 24/Jul/20 ]

Changed the header acm

Comment by Andrew Morrow (Inactive) [ 24/Jul/20 ]

It isn't really a use-after-free though, right? The existingRoutingInfo shared pointer is just empty after the move, so the the conditional will always evaluate to false rendering the code in the if block effectively dead.

Generated at Thu Feb 08 05:21:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.