[SERVER-61717] Ensure a POS instance remains in the POS map until the instance's run() is complete Created: 23/Nov/21 Updated: 21/Sep/23 |
|
| Status: | Open |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Esha Maharishi (Inactive) | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 1 |
| Labels: | ordered | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Assigned Teams: |
Service Arch
|
||||||||||||||||||||||||||||||||||||||||||||||||
| Sprint: | Service Arch 2022-07-11, Service Arch 2022-11-28, Service Arch 2023-01-09, Service Arch 2023-01-23, Service Arch 2023-02-06, Service Arch 2023-02-20, Service Arch 2023-03-06, Service Arch 2023-03-20, Service Arch 2023-04-03, Service Arch 2023-04-17, Service Arch 2023-05-01, Service Arch 2023-05-15, Service Arch 2023-05-29, Service Arch 2023-06-12, Service Arch 2023-06-26 | ||||||||||||||||||||||||||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
The original PrimaryOnlyService left it up to individual instances to insert and delete their state documents, but has an op observer for removing an instance from the in-memory map when the state document is deleted. This means an instance can end up in a "detached" state where it can continue executing logic after deleting its state doc, but no longer be in the in-memory map. This, in combination to the way services are interrupted on stepdown and shutdown, has led to oddities like making commands that wait on an instance's completion future call OperationContext::setAlwaysInterruptAtStepDownOrUp. (If they didn't, the state doc could be deleted while the node is primary, then the instance gets removed from the in-memory map, then the node steps down and the instance doesn't get interrupted since it's no longer in the map but its further work does get canceled since the scoped executor gets shut down, so the instance's completion promise is never fulfilled and the command would end up waiting forever.) This ticket is to instead remove the instance from the in-memory map only after the instance's run() has complete and remove the calls to OperationContext::setAlwaysInterruptAtStepDownOrUp. |
| Comments |
| Comment by Jason Chan [ 30/Jun/23 ] |
|
Setting this back to Open and depending on |
| Comment by Jason Chan [ 27/Sep/22 ] |
|
Putting back into Triage now that all dependencies are resolved. |
| Comment by Matt Broadstone [ 27/Sep/22 ] |
|
This change is further motivated by the use case of ensuring instances are removed from memory if they complete before writing a state document. If an instance completes before the document is written (say, for some validation or conflict resolution), then we cannot depend on the opObserver to remove the instance, so the instance will remain in memory until a new primary steps up or the server shuts down. Ensuring instances are removed after their completion future becomes ready solves this problem. I've linked |
| Comment by Esha Maharishi (Inactive) [ 13/Apr/22 ] |
|
Marking as related to |
| Comment by Esha Maharishi (Inactive) [ 04/Apr/22 ] |
|
Created
|
| Comment by Esha Maharishi (Inactive) [ 29/Nov/21 ] |
|
One complication is that the tenant migration donor and recipient use a TTL index to delete their state docs (the instance's run() just sets the expireAt on the instance's state doc). However, an instance should remain in the in-memory map until its state doc is removed, so that getOrCreateInstance() can just check the in-memory map for a matching instance. It might be simpler if we don't use a TTL index, and instead the instance's run() includes waiting for the state doc to expire and deleting the state doc. |