[SERVER-74385] Investigate if it is safe for DiskSpaceMonitor to abort builds during startup recovery or rollback
Created: 27/Feb/23  Updated: 20/Mar/23  Resolved: 20/Mar/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Yujin Kang Park
Assignee: Josef Ahmad
Resolution: Won't Do
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Storage Execution
Sprint: Execution Team 2023-04-03
Participants:

 Description   

We should first determine whether this is safe, and then whether it is worth the effort. In particular, can we get into a state where nodes are crashing or blocked from transitioning into steady-state replication?

If we end up enabling this, make sure to add tests.



 Comments   
Comment by Josef Ahmad [ 20/Mar/23 ]

Closing as Won't Do, as the current behaviour looks fine.

  • Startup recovery (from unclean shutdown) restarts an incomplete index build.
    • The DiskSpaceMonitor starts after startup recovery completes. At that point, it is able to abort the in-progress build.
    • A corner case probably exists where the server crashed because it ran out of disk space, with an index build contributing to the exhaustion. On restart, the DiskSpaceMonitor must again race to abort that index build. Even if it eventually succeeds (potentially after multiple restarts), the index build may by then have already committed in the replica set. In short, it would be the administrator's responsibility to increase disk size at that point.
  • Replication rollback kills the index build. Quoting here: "Called during rollback to stop all active index builds. [...] no abortIndexBuild is replicated and the current node will restart these builds at the completion of rollback. [...]". At that point, the DiskSpaceMonitor comes into play.
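
The ordering described in the two cases above can be sketched with a toy model. This is only an illustration of the sequencing, not the server's code: the class ToyNode, its fields, the single threshold_bytes setting, and all method names are hypothetical, and the assumption that the monitor aborts every active build once free space drops below the threshold is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Toy model of one node. All names here are hypothetical."""
    free_bytes: int
    threshold_bytes: int  # assumed abort threshold, for illustration only
    active_builds: set = field(default_factory=set)
    monitor_running: bool = False
    log: list = field(default_factory=list)

    def startup_recovery(self, incomplete_builds):
        # Recovery restarts incomplete builds; the monitor is not running yet.
        for name in incomplete_builds:
            self.active_builds.add(name)
            self.log.append(f"restart {name}")

    def start_monitor(self):
        # The monitor only starts once startup recovery has completed.
        self.monitor_running = True
        self.log.append("monitor started")

    def rollback(self):
        # Rollback stops active builds without replicating an abort,
        # then restarts the same builds when rollback completes.
        stopped = sorted(self.active_builds)
        self.active_builds.clear()
        self.log.extend(f"stop {name}" for name in stopped)
        self.active_builds.update(stopped)
        self.log.extend(f"restart {name}" for name in stopped)

    def monitor_tick(self):
        # Abort all in-progress builds if the monitor is running and
        # free space is below the threshold; otherwise do nothing.
        if not self.monitor_running or self.free_bytes >= self.threshold_bytes:
            return []
        aborted = sorted(self.active_builds)
        self.active_builds.clear()
        self.log.extend(f"abort {name}" for name in aborted)
        return aborted


node = ToyNode(free_bytes=100, threshold_bytes=500)
node.startup_recovery(["idx_a"])
print(node.monitor_tick())  # monitor has not started, so nothing is aborted
node.start_monitor()
print(node.monitor_tick())  # monitor is running and space is low
```

In this model the window between startup_recovery and start_monitor is where the corner case lives: a build can consume disk before anything is able to abort it.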