[SERVER-55949] Relaxed load in CancellationState::isCanceled can allow for inconsistent state to be read Created: 08/Apr/21 Updated: 29/Oct/23 Resolved: 17/Jun/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.0.0-rc4, 5.1.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | George Wangensteen | Assignee: | Alex Li |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | post-rc0, servicearch-wfbf-day | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Backport Requested: |
v5.0
|
||||
| Sprint: | Service Arch 2021-05-17, Service Arch 2021-06-28 | ||||
| Participants: | |||||
| Story Points: | 1 | ||||
| Description |
|
Consider a cancelation hierarchy as follows:
Now imagine some thread, thread A, calls
Around the same time, another thread B reads the cancellation states:
Based on my understanding of memory ordering, the following could happen: When thread A cancels the first source, it marks the cancelation state for that source as "canceled" and then emplaces its cancelation promise. Because the second source is in a hierarchy with the first source, fulfilling this promise causes this continuation to run inline on thread A, which marks the cancelation state for the second source as "canceled."
However, because CancelationState::isCanceled uses loadRelaxed, it is possible that thread B reads second.isCanceled() == true and first.isCanceled == false; even though the stores are ordered first --> second on thread A, the relaxed load on thread B won't force any synchronization with the system. It is permissible in the memory model for thread B to read second.isCanceled without being synchronized with all the things that happened before the second source was cancelled in thread A. (The best explanation of this for me was the last example, "Mixing Memory Models" on this page: https://gcc.gnu.org/wiki/Atomic/GCCMM/AtomicSync) Tenant migrations and resharding code wants to rely on the ordering of cancelation sources in a hierarchy to be consistently visible to other threads; I think this goal makes sense and we should forbid reading this inconsistent state. Thanks to max.hirschhorn for finding this and working through the memory model examples with me.
Acceptance Criteria: isCanceled() uses load instead of loadRelaxed. |
| Comments |
| Comment by Vivian Ge (Inactive) [ 06/Oct/21 ] |
|
Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you! |
| Comment by Githook User [ 23/Jun/21 ] |
|
Author: {'name': 'Alex Li', 'email': 'alex.li@mongodb.com'}Message: (cherry picked from commit 7916b72cd1082330e05d5fd052e099f3387ca513) |
| Comment by Githook User [ 17/Jun/21 ] |
|
Author: {'name': 'Alex Li', 'email': 'alex.li@mongodb.com'}Message: |
| Comment by Billy Donahue [ 18/May/21 ] |
|
Would this problem also apply to the loadRelaxed() call in isCancelable() ? |
| Comment by Luis Osta (Inactive) [ 17/May/21 ] |