The atomic operations in WiredTiger should ensure that a full memory barrier is provided by them.
The lock-free algorithms in use by WiredTiger rely on the instructions not getting re-ordered across the atomics. On GCC WiredTiger uses the __atomic builtins to implement its atomic operations. The __atomic builtins provided by the compiler could potentially not be utilizing a full memory barrier instruction on some platforms with a relaxed memory model like AArch64 (ARM64). This could leave WiredTiger exposed to a data corruption possibility on such platforms. WiredTiger should make sure the underlying primitives it uses include a full memory sync each time they are used.
A thread on GCC discussing why __atomic builtins might not be strong enough for the older __sync builtins.
Another thread discussing how AArch64 atomics might allow subtle re-orders.
- Does this affect any team outside of WT?
- How likely is it that this use case or problem will occur?
Arguably possible on x86-64. Very rare, but possible on AArch64 (ARM64).
- If the problem does occur, what are the consequences and how severe are they?
On the platforms like ARM64 that have a relaxed memory model for the SMP architecture, it could result in subtle data corruption bugs.
- Is this issue urgent?
Acceptance Criteria (Definition of Done)
All the atomic operations have been guaranteed to provide a full memory barrier on the various platforms WiredTiger supports. The performance impact has been evaluated and found to be acceptable. WiredTiger stress testing and MongoDB patch testing has been successfully completed.
WiredTiger stress tests, MongoDB patch tests, WiredTiger and MongoDB performance tests.
Change the documentation on building on the POSIX systems to ensure the build provides a full barrier that WiredTiger needs.
The scope for this ticket has been limited to an investigation into the atomic operations on the ARM platform. New tickets will be created for any follow on work.
[Optional] Suggested Solution
A possible way is to use __sync builtins instead of __atomic builtins as they are meant to provide a full barrier. The modern __sync builtins are implemented using the __atomic builtins, so we have to be careful about doing so. Newer versions of GCC should handle this correctly since a change went into GCC 5.0 to provide __atomic builtins with a full barrier to support the __sync builtins.
Another possibility could be to directly use the MEMMODEL_SYNC with __atomic builtins that was introduced to fix the __sync builtins.
We will have to investigate the facilities provided by the MSVC compiler.