[SERVER-85513] Handle ShardNotFound errors in the cleanup phase of the create coordinator Created: 22/Jan/24  Updated: 25/Jan/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 8.0 Required

Type: Task Priority: Major - P3
Reporter: Allison Easton Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: robust-create-collection
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Catalog and Routing
Participants:
Story Points: 2

 Description   

In the new createCollectionCoordinator, hitting a ShardNotFound error triggers updates to the coordinator's list of involved shardIds. This prevents the coordinator from repeatedly trying to contact, or otherwise involve, a shard that has been removed.

However, the _cleanupOnAbort function, which is invoked by calling triggerCleanup, does not have any onError handlers. As a result, if an involved shard is removed while the create coordinator is running cleanup, the coordinator keeps trying to release the critical section on that removed shard.
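A minimal Python sketch of the behavior this ticket asks for. It is not the server code, which is C++. In the sketch, cleanup releases the critical section on each involved shard, and on a ShardNotFound-style error it drops that shard from the involved list instead of retrying it. All names here (`CleanupState`, `cleanup_on_abort`, `release_critical_section`, `ShardNotFound`, `TransientError`) and the bounded `max_attempts` retry are hypothetical stand-ins chosen for illustration, not the coordinator's actual API.

```python
from dataclasses import dataclass, field


class ShardNotFound(Exception):
    """Hypothetical stand-in: the target shard was removed from the cluster."""


class TransientError(Exception):
    """Hypothetical stand-in: a retriable error such as a network blip."""


@dataclass
class CleanupState:
    # Shards the coordinator still considers involved in the operation.
    involved_shards: list[str]
    # Shards on which the critical section was successfully released.
    released: list[str] = field(default_factory=list)


def cleanup_on_abort(state, release_critical_section, max_attempts=5):
    """Release the critical section on every involved shard.

    A shard that reports ShardNotFound is pruned from the involved list
    rather than retried, so a removed shard cannot stall cleanup.
    The bounded retry (max_attempts) is an assumption to keep the sketch finite.
    """
    for shard_id in list(state.involved_shards):  # iterate over a copy; we may prune
        for attempt in range(max_attempts):
            try:
                release_critical_section(shard_id)
                state.released.append(shard_id)
                break
            except ShardNotFound:
                # The shard is gone, so there is no critical section left to release.
                state.involved_shards.remove(shard_id)
                break
            except TransientError:
                # Retry transient failures; give up after the last attempt.
                if attempt == max_attempts - 1:
                    raise
    return state


# Usage: shard2 has been removed, shard3 fails transiently once.
removed = {"shard2"}
flaky = {"shard3": 1}


def fake_release(shard_id):
    if shard_id in removed:
        raise ShardNotFound(shard_id)
    if flaky.get(shard_id, 0) > 0:
        flaky[shard_id] -= 1
        raise TransientError(shard_id)


state = cleanup_on_abort(CleanupState(["shard1", "shard2", "shard3"]), fake_release)
```

Without the `except ShardNotFound` branch, the removed shard would be treated like any other failure and retried, which is the cleanup gap described above.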


Generated at Thu Feb 08 06:57:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.