[SERVER-75059] Explore improvements on index build concurrent state handling Created: 20/Mar/23  Updated: 24/May/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Yujin Kang Park
Assignee: Backlog - Storage Execution Team
Resolution: Unresolved
Votes: 1
Labels: techdebt
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-74953 Explore avoiding stepdowns during the... Closed
is related to SERVER-75585 Explore simplifying the index build s... Backlog
Assigned Teams:
Storage Execution
Participants:

Description

The way index build state is handled is very racy.

One source of races is the way active index builds are managed. Active index builds are registered before the 'startIndexBuild' oplog entry is replicated, which leaves open the possibility of a node stepping down and then encountering an oplog entry for a DDL operation that assumes no index builds are in progress (e.g. a collection drop) while the index build is still registered. SERVER-74953 tries to mitigate the issue by making the builder thread interruptible until the oplog entry is replicated, but it won't make the problem disappear.
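The interleaving described above can be illustrated with a toy model. This is only a sketch, not server code: `ToyIndexBuildRegistry`, `apply_drop_collection` and `run_interleaving` are hypothetical names, and the "stepdown" is simulated by simply skipping the write of the 'startIndexBuild' entry.

```python
class ToyIndexBuildRegistry:
    """Hypothetical stand-in for the set of active index builds."""

    def __init__(self):
        self.active = set()

    def register(self, build_id):
        self.active.add(build_id)

    def unregister(self, build_id):
        self.active.discard(build_id)


def apply_drop_collection(registry):
    """Models applying a drop oplog entry, which assumes no builds are registered."""
    if registry.active:
        raise RuntimeError(
            f"drop applied while index builds are registered: {sorted(registry.active)}"
        )
    return "dropped"


def run_interleaving(stepdown_before_replication):
    registry = ToyIndexBuildRegistry()
    oplog = []

    # The build is registered first, before 'startIndexBuild' is written.
    registry.register("build-1")

    if stepdown_before_replication:
        # Simulated stepdown: 'startIndexBuild' is never written, yet the
        # build is still registered when a drop from the new primary arrives.
        try:
            return apply_drop_collection(registry)
        except RuntimeError as exc:
            return f"invariant violated: {exc}"

    # No stepdown: the entry is written, the build finishes and is unregistered.
    oplog.append("startIndexBuild")
    oplog.append("commitIndexBuild")
    registry.unregister("build-1")
    return apply_drop_collection(registry)
```

In this model `run_interleaving(False)` completes the drop, while `run_interleaving(True)` reports the violated assumption, because registration happened before the oplog entry existed.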

The reverse is also true: the de-registration and cleanup of a committed index build can race with step-up checks. De-registration is also scattered across the code: a search for "activeIndexBuilds.unregisterIndexBuild" turns up 10+ instances.
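As an illustration of what a single de-registration point could look like, here is a minimal sketch of scope-bound registration. It is not a proposal from this ticket and not the server's API: `registered_index_build` and `run_build` are hypothetical names, and a plain `set` stands in for the registry.

```python
from contextlib import contextmanager


@contextmanager
def registered_index_build(active_builds, build_id):
    """Register on entry and unregister on every exit path (hypothetical helper)."""
    active_builds.add(build_id)
    try:
        yield build_id
    finally:
        # The only place where the build is de-registered, whether the
        # body finished normally or raised.
        active_builds.discard(build_id)


def run_build(active_builds, build_id, fail):
    """Runs a pretend build; returns whether it was registered while running."""
    with registered_index_build(active_builds, build_id):
        was_registered = build_id in active_builds
        if fail:
            raise RuntimeError("simulated build failure")
        return was_registered
```

With this shape, a successful build and a failed one both leave the registry empty afterwards, so there is one call site to audit instead of many.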

...


Generated at Thu Feb 08 06:29:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.