[SERVER-40174] PeriodicBalancerConfigRefresher is not thread-safe since it puts its PeriodicJobs on the ServiceContext's PeriodicRunner Created: 15/Mar/19 Updated: 08/Jan/24 Resolved: 19/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.1.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Esha Maharishi (Inactive) | Assignee: | Backlog - Service Architecture |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Service Arch
|
||||||||
| Operating System: | ALL | ||||||||
| Participants: | |||||||||
| Description |
|
The PeriodicBalancerConfigRefresher puts its PeriodicJob on the ServiceContext's PeriodicRunner, and PeriodicJobs on the ServiceContext's PeriodicRunner can be modified directly by shutdown. The PeriodicBalancerConfigRefresher is also modified on stepdown. Since shutdown and stepdown can occur concurrently, and PeriodicJob has invariants that its methods are called in a specific order, either PeriodicBalancerConfigRefresher needs to add synchronization on top of its PeriodicJob, or PeriodicJob should have more lax invariants. |
| Comments |
| Comment by Esha Maharishi (Inactive) [ 19/Mar/19 ] |
|
matthew.saltz oh, yes, it is. Closing as a dupe. |
| Comment by Matthew Saltz (Inactive) [ 19/Mar/19 ] |
|
I think this is a duplicate of |
| Comment by Kaloian Manassiev [ 15/Mar/19 ] |
|
Since the PeriodicRunner is a generic server utility, it doesn't seem right to be owned by the sharding team. I am moving this ticket to Service Arch. I remember that matthew.saltz worked on it so if this is a regression from that work we can pick it up, but otherwise I don't want sharding to be supporting it. |
| Comment by Mira Carey [ 15/Mar/19 ] |
|
I think it's how it was intended to work (for a first cut, not having to deal with race conditions on the handles is easier to reason about), but I'm okay with loosening things a bit. I'd probably make pause() on a PAUSED task a noop, along with resume() on a RUNNING job, and throw for the other illegal transitions |
| Comment by Esha Maharishi (Inactive) [ 15/Mar/19 ] |
|
CC mira.carey@mongodb.com, is this the way PeriodicRunner/PeriodicJobs are intended to work? |