Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Done
Priority: Critical - P2
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- arm
- refinement
- stability

Total Hours with Assigned Team:
38,437.992
Sprint:
Storage - Ra 2022-03-21
Story Points:
None

Backport Requested:

v6.0

Summary
The atomic operations in WiredTiger should ensure that a full memory barrier is provided by them.

Motivation
The lock-free algorithms in use by WiredTiger rely on the instructions not getting re-ordered across the atomics. On GCC WiredTiger uses the __atomic builtins to implement its atomic operations. The __atomic builtins provided by the compiler could potentially not be utilizing a full memory barrier instruction on some platforms with a relaxed memory model like AArch64 (ARM64). This could leave WiredTiger exposed to a data corruption possibility on such platforms. WiredTiger should make sure the underlying primitives it uses include a full memory sync each time they are used.

For reference:
A thread on GCC discussing why __atomic builtins might not be strong enough for the older __sync builtins.

Another thread discussing how AArch64 atomics might allow subtle re-orders.

Does this affect any team outside of WT?
No

How likely is it that this use case or problem will occur?
Arguably possible on x86-64. Very rare, but possible on AArch64 (ARM64).

If the problem does occur, what are the consequences and how severe are they?
On the platforms like ARM64 that have a relaxed memory model for the SMP architecture, it could result in subtle data corruption bugs.

Is this issue urgent?
Yes

Acceptance Criteria (Definition of Done)
All the atomic operations have been guaranteed to provide a full memory barrier on the various platforms WiredTiger supports. The performance impact has been evaluated and found to be acceptable. WiredTiger stress testing and MongoDB patch testing has been successfully completed.

~~Testing~~
~~WiredTiger stress tests, MongoDB patch tests, WiredTiger and MongoDB performance tests.~~

~~Documentation update~~
~~Change the documentation on building on the POSIX systems to ensure the build provides a full barrier that WiredTiger needs.~~

Updated Scope
The scope for this ticket has been limited to an investigation into the atomic operations on the ARM platform. New tickets will be created for any follow on work.

[Optional] Suggested Solution
A possible way is to use __sync builtins instead of __atomic builtins as they are meant to provide a full barrier. The modern __sync builtins are implemented using the __atomic builtins, so we have to be careful about doing so. Newer versions of GCC should handle this correctly since a change went into GCC 5.0 to provide __atomic builtins with a full barrier to support the __sync builtins.

Another possibility could be to directly use the MEMMODEL_SYNC with __atomic builtins that was introduced to fix the __sync builtins.

We will have to investigate the facilities provided by the MSVC compiler.

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

Screen Shot 2022-03-25 at 5.34.26 pm.png
17 kB
Mar 25 2022 06:34:33 AM UTC
skip_list_race.png
1.23 MB
Mar 30 2022 05:17:45 AM UTC

depends on

WT-8946 Use the maximum number of cores for ARM64 stress testing and disable ASan tests on this build variant

Closed

WT-9079 All memory barriers on ARM should be DMB instead of DSB

Closed

WT-9081 Reintroduce PPC tests

Closed

is related to

WT-9256 Investigate changes to WiredTiger for non-LSE ARM platforms

Closed

WT-6798 Utilize Arm LSE atomics and the correct strength barriers

Closed

WT-9124 Evaluate increasing stress testing coverage for PPC

Backlog

WT-2786 Migrate WiredTiger to the C11 memory model

Closed

WT-9080 Increase the amount of stress testing on ARM platform

Closed

(3 is related to)

Assignee:: Sulabh Mahajan
Reporter:: Sulabh Mahajan
Votes:: 0 Vote for this issue
Watchers:: 19 Start watching this issue

Created:: Mar 16 2022 01:17:04 AM UTC
Updated:: May 30 2022 01:56:23 AM UTC
Resolved:: Apr 29 2022 12:47:02 PM UTC

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates