[SERVER-74265] Create class for concurrent relaxed readers Created: 22/Feb/23  Updated: 16/Mar/23  Resolved: 16/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Jordi Olivares Provencio Assignee: Backlog - Service Architecture
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-74931 Add AtomicWord::storeRelaxed Closed
Assigned Teams:
Service Arch
Participants:

 Description   

Some TSAN failures are not an actual issue as the reader already can accept relaxed memory semantics and there's only a single separate writer thread.

This is the case most notably with all of our metrics. Most of the time the solution performed is to wrap the value in an AtomicWord and perform the operations using that wrapper. However, the writes are performed with strong ordering guarantees that have a slight overhead. These aren't really necessary as there's only one writer thread that is the same throughout the metric's lifetime and the readers can accept relaxed memory semantics.

SERVER-72481 for example found that using AtomicWord semantics for the metrics contributed to a 0.8% regression. The same ticket asks for creating a better version for metrics that generates the same assembly code but explicitly tells TSAN that it's fine.

To avoid the boilerplate process of solving these TSAN failures with AtomicWord we could try to have a class that acts as if it's the normal variable yet offers relaxed atomic semantics. Whenever we encounter a TSAN failure that we can match with this pattern we would simply wrap the type with this class and be done with the change.



 Comments   
Comment by Billy Donahue [ 16/Mar/23 ]

We're gonna close this in favor of SERVER-74931, which is to add storeRelaxed to AtomicWord.

The concurrent relaxed readers pattern being captured in a utility class is perhaps still valuable.
But we can start with the storeRelaxed and build from that.

Comment by Billy Donahue [ 23/Feb/23 ]

It should be ok to add to AtomicWord.
it should not be more of a "footgun" than loadRelaxed.

Changing to a facade type as proposed could be inconvenient for interoperability with AtomicWord, so I'm not in favor of that.

All we need is one more kind of operation on AtomicWord and adding it is super easy. Duplicating the AtomicWord API in another class is much harder to make, maintain over time, teach or discover.

If the storeRelaxed should be used with caution, we can be sure document it. I assume we would write appropriate docs regardless.

Comment by Jordi Olivares Provencio [ 23/Feb/23 ]

Yes, but I wouldn't add that to AtomicWord since it's a footgun ripe for misuse. My idea was to create a look-alike AtomicWord class that implements all the operators in terms of AtomicWord so it acts as a drop-in replacement without having to make everything use AtomicWord methods to access the data.

Comment by Billy Donahue [ 23/Feb/23 ]

it sounds like you might just need a AtomicWord::storeRelaxed operation.

Comment by Jordi Olivares Provencio [ 22/Feb/23 ]

Sure:

  • SERVER-74263 is one where the AdmissionPriority is only read by curOp
  • SERVER-48687 notably considers the data races as benign so the extra mutex is unnecessary
  • SERVER-50240 also deals with curOp
  • SERVER-73591 fixed the problem but sadly affected query execution even if it wasn't susceptible to it
  • SERVER-74088 also mentions that the race is benign and doesn't cause any issues
  • SERVER-70785 for a more recent case where this is considered also benign and resulted in adding a mutex that is potentially unnecessary.
Comment by Daniel Gomez Ferro [ 22/Feb/23 ]

jordi.olivares-provencio@mongodb.com could you link to some examples where this happens (other than LockStats)?

Generated at Thu Feb 08 06:26:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.