[SERVER-51055] ServiceContext getters/setters are racing Created: 18/Sep/20 Updated: 12/Dec/23 |
|
| Status: | Open |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Billy Donahue | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | sa-remove-fv-backlog-22 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Assigned Teams: |
Service Arch
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Steps To Reproduce: | BF-18712 |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | Service arch 2020-10-05, Service arch 2020-10-19, Service Arch 2021-03-08, Service Arch 2021-03-22, Service Arch 2021-04-05, Service Arch 2021-04-19, Service Arch 2021-05-17, Service Arch 2021-07-12, Service Arch 2021-08-09 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Linked BF Score: | 5 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Story Points: | 4 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
BF-18712 shows a TSAN true positive where setFastClockSource races with getFastClockSource due to latch_detail::Mutex::_onContendedLock's invocation of getFastClockSource before the setFastClockSource could occur. There is no synchronization on the unique_ptr fields of ServiceContext, so these problems are unavoidable. Rather than making new obscure rules about when it's ok to call the getters, we can rather easily add synchronization to these fields, as they are private members. They could be stored in a synchronized unique_ptr wrapper.
Acceptance Criteria: ServiceContext will call terminate if a setter happens after a given property has been accessed. Create a patch build to reproduce this and identify all the use cases where this was happening to create tickets that require a fix. |
| Comments |
| Comment by Billy Donahue [ 30/Sep/20 ] |
|
Exploratory work to see if the setters can be forced to only occur once. (not really for review) https://mongodbcr.appspot.com/676640003/ Patch queue: https://spruce.mongodb.com/version/5f74d32f2a60ed66c518e2cf/tasks Already had to make a carve-out for setServiceEntryPoint, which is set twice by mongod_main.cpp, but I don't know why we do that yet. Lots of tests in the patch queue are triggering an invariant that set() should be called only once. The tests are calling setters for setOpObserver, setTransportLayer, setTickSource, etc... multiple times. Would be some work to fix all of it. |
| Comment by Billy Donahue [ 28/Sep/20 ] |
|
The atomicity of setters and getters added by commit 9ec023a45d is necessary but isn't enough to fix the ordering problem. We need further work to enforce invariants about ServiceContext setters occurring after getters. Reopening. |
| Comment by Billy Donahue [ 22/Sep/20 ] |
|
Should have fixed TSAN failures. This is a band-aid. All that's been added is synchronized access to these unique_ptrs, so that at least the getters return a coherent value, but we still need higher-level semantics to ensure the validity of any getter-returned raw pointers. A subsequent setter can still immediately invalidate that pointer. Without shared_ptr or another kind of shared locking, we won't be able to do better than that. Frankly the getters might need to return reference-holding objects like shared_ptr's to do it right. |
| Comment by Githook User [ 22/Sep/20 ] |
|
Author: {'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}Message: SERVER-51055 ServiceContext::SyncUnique to fix unique_ptr races.
|
| Comment by Billy Donahue [ 18/Sep/20 ] |