[SERVER-40174] PeriodicBalancerConfigRefresher is not thread-safe since it puts its PeriodicJobs on the ServiceContext's PeriodicRunner Created: 15/Mar/19  Updated: 08/Jan/24  Resolved: 19/Mar/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.1.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Backlog - Service Architecture
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-38679 Race between PeriodicBalancerConfigRe... Closed
Assigned Teams:
Service Arch
Operating System: ALL
Participants:

 Description   

The PeriodicBalancerConfigRefresher puts its PeriodicJob on the ServiceContext's PeriodicRunner, and PeriodicJobs on the ServiceContext's PeriodicRunner can be modified directly by shutdown.

The PeriodicBalancerConfigRefresher is also modified on stepdown.

Since shutdown and stepdown can occur concurrently, and PeriodicJob has invariants that its methods are called in a specific order, either PeriodicBalancerConfigRefresher needs to add synchronization on top of its PeriodicJob, or PeriodicJob should have more lax invariants.



 Comments   
Comment by Esha Maharishi (Inactive) [ 19/Mar/19 ]

matthew.saltz oh, yes, it is. Closing as a dupe.

Comment by Matthew Saltz (Inactive) [ 19/Mar/19 ]

I think this is a duplicate of SERVER-38679, which I'm working on already. Marking it as such. esha.maharishi if you want to take a look to verify and make sure I'm not missing something, feel free to do so.

Comment by Kaloian Manassiev [ 15/Mar/19 ]

Since the PeriodicRunner is a generic server utility, it doesn't seem right to be owned by the sharding team. I am moving this ticket to Service Arch. I remember that matthew.saltz worked on it so if this is a regression from that work we can pick it up, but otherwise I don't want sharding to be supporting it.

Comment by Mira Carey [ 15/Mar/19 ]

I think it's how it was intended to work (for a first cut, not having to deal with race conditions on the handles is easier to reason about), but I'm okay with loosening things a bit.

I'd probably make pause() on a PAUSED task a noop, along with resume() on a RUNNING job, and throw for the other illegal transitions

Comment by Esha Maharishi (Inactive) [ 15/Mar/19 ]

CC mira.carey@mongodb.com, is this the way PeriodicRunner/PeriodicJobs are intended to work?

Generated at Thu Feb 08 04:54:14 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.