[SERVER-77959] Run commit handlers atomic for lock-free reads Created: 09/Jun/23  Updated: 06/Feb/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Henrik Edin Assignee: Josef Ahmad
Resolution: Unresolved Votes: 0
Labels: techdebt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Storage Execution EMEA
Sprint: Execution EMEA Team 2023-10-02, Execution EMEA Team 2023-10-16, Execution EMEA Team 2023-10-30, CAR Team 2023-11-13, CAR Team 2023-11-27, CAR Team 2023-12-11, CAR Team 2023-12-25, CAR Team 2024-01-08, CAR Team 2024-01-22, CAR Team 2024-02-05, CAR Team 2024-02-19
Participants:

 Description   

Commit handlers do not run atomically with respect to lock-free reads, which is a common source of bugs. When commit handlers were designed, no reader could observe a partial state because locks were held.

For example, adding idents to the drop-pending reaper and dropping collections/indexes are currently not a single atomic operation: they execute in separate onCommit handlers and perform separate writes to the CollectionCatalog.

This creates a dependency on the order in which these onCommit handlers execute: first the drop-pending idents must be added to the reaper, and only then can the drops be made visible to readers by publishing the uncommitted catalog changes.
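
To illustrate the ordering dependency, here is a toy Python model (not server code). The names ToyState, add_to_reaper, publish_drop and ident_is_tracked are hypothetical, and the invariant checked (an ident is always either visible in the catalog or known to the reaper) is an assumption made for the sake of the example. A reader is simulated by observing the state after each handler runs.

```python
class ToyState:
    """Hypothetical stand-in for the catalog plus the drop-pending reaper."""

    def __init__(self):
        self.visible_collections = {"coll": "ident-1"}  # what readers can see
        self.drop_pending_idents = set()                # what the reaper tracks


def add_to_reaper(state):
    # First handler: register the ident as drop-pending.
    state.drop_pending_idents.add("ident-1")


def publish_drop(state):
    # Second handler: make the drop visible to readers.
    state.visible_collections.pop("coll", None)


def ident_is_tracked(state, ident):
    # Assumed invariant: the ident is visible in the catalog or known to the reaper.
    return (ident in state.visible_collections.values()
            or ident in state.drop_pending_idents)


def run_handlers(state, handlers, observe):
    # Run handlers one by one and observe after each, like a lock-free reader
    # that can look at the state between two handlers.
    observations = []
    for handler in handlers:
        handler(state)
        observations.append(observe(state))
    return observations


def check(state):
    return ident_is_tracked(state, "ident-1")


required_order = run_handlers(ToyState(), [add_to_reaper, publish_drop], check)
wrong_order = run_handlers(ToyState(), [publish_drop, add_to_reaper], check)
print(required_order)  # [True, True]
print(wrong_order)     # [False, True]
```

In the wrong order, the observation taken between the two handlers sees an ident that is neither visible nor tracked, which is the kind of partial state a lock-free reader could encounter.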

These kinds of dependencies are fragile and are currently enforced through ad hoc mechanisms that register the onCommit handlers in special ways so that they execute in the required order. We should look into ways to clean this up. Some ideas:

  1. Make these two writes atomic by performing them under the same CollectionCatalog write. We would need to figure out how to structure the code: should the CollectionCatalog be treated as a write-through cache, or should we add a way to register special callbacks that execute as part of catalog visibility?
  2. Snoop the commitTime as it should have been registered on the RecoveryUnit after 
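
Idea 1 could look roughly like the following Python sketch, assuming a copy-on-write catalog whose readers take a single immutable snapshot. CatalogSnapshot, ToyCatalog and drop_collection are hypothetical names for illustration and do not mirror the real CollectionCatalog interface.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogSnapshot:
    """Immutable view handed to readers (hypothetical structure)."""
    collections: frozenset          # set of (name, ident) pairs
    drop_pending_idents: frozenset  # idents waiting for the reaper


class ToyCatalog:
    def __init__(self, snapshot):
        self._snapshot = snapshot
        self._write_lock = threading.Lock()

    def read(self):
        # Readers take no lock; they grab whichever snapshot is current.
        return self._snapshot

    def write(self, job):
        # Writers are serialized; the new snapshot is published in one
        # reference swap, so readers see all of the job's changes or none.
        with self._write_lock:
            self._snapshot = job(self._snapshot)


def drop_collection(name, ident):
    # One job performs both changes: hide the collection and record its
    # ident as drop-pending.
    def job(snap):
        return CatalogSnapshot(
            collections=snap.collections - {(name, ident)},
            drop_pending_idents=snap.drop_pending_idents | {ident},
        )
    return job


catalog = ToyCatalog(CatalogSnapshot(frozenset({("coll", "ident-1")}), frozenset()))
before = catalog.read()
catalog.write(drop_collection("coll", "ident-1"))
after = catalog.read()
```

Because both changes land in the same snapshot, no ordering between separate handlers is needed in this model; a reader holding `before` keeps a consistent older view, and a reader taking `after` sees the drop and the drop-pending ident together.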


 Comments   
Comment by Josef Ahmad [ 25/Sep/23 ]

After discussing with Henrik, we agreed that we should audit the commit hooks for any other subtle ordering dependencies to ensure atomicity for operations that do not acquire collection locks, like lock-free reads.
 
This ticket's description refers to the ordering issue represented by SERVER-77895, which has since been resolved. Depending on the result of the audit, we might have to develop a more generalised method to define commit hook ordering dependencies.

Generated at Thu Feb 08 06:37:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.