[SERVER-78108] POS interface should expose its shutdown state Created: 14/Jun/23  Updated: 09/Nov/23  Resolved: 10/Jul/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0, 7.0.3, 6.0.12, 5.0.23

Type: Task
Priority: Major - P3
Reporter: Abdul Qadeer
Assignee: Wenbin Zhu
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-78009 shardSvrCommitReshardCollection comma... Closed
Related
related to SERVER-78839 Investigate changes needed in shardin... Backlog
related to SERVER-78840 Investigate changes needed in serverl... Backlog
Assigned Teams:
Service Arch
Backwards Compatibility: Fully Compatible
Backport Requested:
v7.0, v6.0, v5.0
Sprint: Service Arch 2023-07-10
Participants:

 Description   

The Primary Only Service (POS) currently lacks a static method that allows derived classes to determine whether the service has been shut down. The existing lookupInstance() method does not distinguish between an instance not being present in the registry and the internal state being State::kShutdown.

The need for this functionality arose from BF-29013. More generally, if a command is interrupted by a shutdown, its retry may land on the same node when the primary cannot step down quickly, for example because secondary nodes are lagging. Shutdown is not atomic, and external observers may see different system states, so the command needs visibility into the POS state to take the right action. That action can be to tell the sender that the command failed because of the POS shutdown, so that the command keeps being retried until a new primary assumes its role. Ideally, lookup() would expose this state by throwing a Shutdown error. Otherwise, a function that checks whether the service is shut down will suffice, at the cost of some boilerplate code in callers.



 Comments   
Comment by Githook User [ 09/Nov/23 ]

Author:

{'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'}

Message: SERVER-78108 Expose primary state when doing POS instance lookup.

(cherry picked from commit c3496898db5124006252b129f2c9a5461e1737ac)
(cherry picked from commit fef5c0d8c8312b01bb2ef061455bca31d1c4d7c6)
Branch: v5.0
https://github.com/mongodb/mongo/commit/903e61a3264714275c9bbd5d95cfbd1dfcd36e5b

Comment by Githook User [ 07/Nov/23 ]

Author:

{'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'}

Message: SERVER-78108 Expose primary state when doing POS instance lookup.

(cherry picked from commit c3496898db5124006252b129f2c9a5461e1737ac)
Branch: v6.0
https://github.com/mongodb/mongo/commit/fef5c0d8c8312b01bb2ef061455bca31d1c4d7c6

Comment by Githook User [ 04/Oct/23 ]

Author:

{'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'}

Message: SERVER-78108 Expose primary state when doing POS instance lookup.

(cherry picked from commit c3496898db5124006252b129f2c9a5461e1737ac)
Branch: v7.0
https://github.com/mongodb/mongo/commit/0ba4e8ffb2f804ff650320f20ded544605544ec5

Comment by Githook User [ 10/Jul/23 ]

Author:

{'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'}

Message: SERVER-78108 Expose primary state when doing POS instance lookup.
Branch: master
https://github.com/mongodb/mongo/commit/c3496898db5124006252b129f2c9a5461e1737ac

Comment by Wenbin Zhu [ 28/Jun/23 ]

Instead of having lookup() and lookupInstance() throw an exception when the service is shut down or stepped down, we are going to augment their interfaces to return an additional flag that indicates whether the lookup failed because the service is shut down/stepped down or because the instanceID is invalid. When the application cannot get a POS instance from either method, it can check this flag to decide whether a different action is needed. We will file new tickets for the teams that own such applications so they can investigate whether they need to check the new flag. cc abdul.qadeer@mongodb.com
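For illustration only, a minimal standalone sketch of the "instance plus flag" shape described in this comment. ToyService, ServiceState, Action, and decide() are hypothetical names invented for this example; they are not the actual PrimaryOnlyService interface, and the real return type may differ.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these names come from the MongoDB source.
enum class ServiceState { kRunning, kPaused, kShutdown };

struct Instance {
    std::string id;
};

class ToyService {
public:
    // Returns the instance (nullptr if none) together with a flag that is
    // true when the service is paused (stepped down) or shut down.
    std::pair<std::shared_ptr<Instance>, bool> lookupInstance(const std::string& id) const {
        if (_state != ServiceState::kRunning) {
            return {nullptr, true};  // service unavailable on this node
        }
        auto it = _instances.find(id);
        if (it == _instances.end()) {
            return {nullptr, false};  // service is up, but the ID is unknown
        }
        return {it->second, false};
    }

    void addInstance(const std::string& id) {
        _instances[id] = std::make_shared<Instance>(Instance{id});
    }

    void setState(ServiceState s) {
        _state = s;
    }

private:
    ServiceState _state = ServiceState::kRunning;
    std::map<std::string, std::shared_ptr<Instance>> _instances;
};

// What a caller might do with the result (assumed policy, not prescribed by the ticket).
enum class Action { kUseInstance, kRetryLater, kReportMissing };

Action decide(const ToyService& svc, const std::string& id) {
    auto [instance, unavailable] = svc.lookupInstance(id);
    if (instance) {
        return Action::kUseInstance;
    }
    // No instance: the flag tells the two failure causes apart.
    return unavailable ? Action::kRetryLater : Action::kReportMissing;
}
```

Callers that do not care about the distinction can ignore the flag, which is why returning it is less disruptive than throwing from the lookup.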

Comment by Wenbin Zhu [ 28/Jun/23 ]

"an instance not being present in the registry"

To clarify, this means that the instanceID the caller passes to lookupInstance() does not match any instanceID the service is aware of. The set of instanceIDs that the service knows about consists of the _id values of the state documents. Also note that lookupInstance() waits until the service state is no longer kRebuilding before doing the instance lookup.
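A rough sketch of the "wait until not kRebuilding, then look up" behavior mentioned above, assuming a simple mutex and condition variable. ToyRegistry, finishRebuild(), and isKnown() are hypothetical names for this example only and do not reflect how the service actually implements the wait.

```cpp
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical model; the state names mirror the ticket, the rest is invented.
enum class State { kRebuilding, kRunning, kShutdown };

class ToyRegistry {
public:
    // Called once the state documents have been read; their _id values
    // become the set of known instance IDs.
    void finishRebuild(std::set<std::string> stateDocIds) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _knownIds = std::move(stateDocIds);
            _state = State::kRunning;
        }
        _cv.notify_all();
    }

    // Blocks while the state is kRebuilding, then checks the ID.
    bool isKnown(const std::string& id) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _state != State::kRebuilding; });
        return _state == State::kRunning && _knownIds.count(id) > 0;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    State _state = State::kRebuilding;
    std::set<std::string> _knownIds;
};
```

Because the lookup only runs after the rebuild finishes, a "not found" result here means the ID matches no state document, not that the registry was still being populated.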

Generated at Thu Feb 08 06:37:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.