[SERVER-78108] POS interface should expose its shutdown state Created: 14/Jun/23 Updated: 09/Nov/23 Resolved: 10/Jul/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.1.0-rc0, 7.0.3, 6.0.12, 5.0.23 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Abdul Qadeer | Assignee: | Wenbin Zhu |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Service Arch |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v7.0, v6.0, v5.0 |
| Sprint: | Service Arch 2023-07-10 |
| Participants: |
| Description |
|
The Primary Only Service (POS) currently lacks a static method that lets derived classes determine whether the service has been shut down. The existing lookupInstance() method does not distinguish between an instance that is absent from the registry and a service whose internal state is State::kShutdown. The need for this functionality arose from issue BF-29013. More generally, if a command is interrupted by a shutdown, its retry may land on the same node when the primary cannot step down quickly, for example because of lagging secondary nodes. Because the shutdown operation is not atomic and external observers may see different system states, the command needs visibility into the POS state to take the right action, which can be to keep retrying because of the POS shutdown until a new primary assumes its role, and to communicate the same to the sender of the command. Ideally a lookup() method would expose the state by throwing a Shutdown error; otherwise, a function that checks whether the service is shut down will suffice, at the cost of some boilerplate code at callers. |
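The ambiguity can be illustrated with a toy model. The sketch below is not the server's PrimaryOnlyService code: `ToyService`, `decide`, `Action`, and the `isShutdown()` accessor are hypothetical stand-ins, and only the `kShutdown` state name and the `lookupInstance()` method name are taken from the description. It shows that a bare "found or not found" return hides the reason for a miss, and roughly what the caller-side boilerplate of a separate shutdown check might look like.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for a primary-only service; not the real server class.
enum class State { kRunning, kShutdown };

struct Instance {
    std::string id;
};

class ToyService {
public:
    void addInstance(const std::string& id) {
        // In this model, instances are only registered while running.
        assert(_state == State::kRunning);
        _instances[id] = std::make_shared<Instance>(Instance{id});
    }

    void shutdown() {
        _state = State::kShutdown;
        _instances.clear();
    }

    // Returns null both when the ID is unknown and when the service is
    // shut down, so the caller cannot tell the two cases apart.
    std::shared_ptr<Instance> lookupInstance(const std::string& id) const {
        if (_state == State::kShutdown) {
            return nullptr;
        }
        auto it = _instances.find(id);
        if (it == _instances.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Hypothetical accessor of the kind the description asks for.
    bool isShutdown() const {
        return _state == State::kShutdown;
    }

private:
    State _state = State::kRunning;
    std::map<std::string, std::shared_ptr<Instance>> _instances;
};

enum class Action { kUseInstance, kRetryLater, kReportMissing };

// Caller-side boilerplate: every miss has to be followed by a state check.
Action decide(const ToyService& service, const std::string& id) {
    if (service.lookupInstance(id)) {
        return Action::kUseInstance;
    }
    return service.isShutdown() ? Action::kRetryLater : Action::kReportMissing;
}
```

Note that in concurrent code the lookup and the follow-up check are two separate observations, so the state could change between them; this toy model is single-threaded and does not address that.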
| Comments |
| Comment by Githook User [ 09/Nov/23 ] |
|
Author: {'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'} Message: (cherry picked from commit c3496898db5124006252b129f2c9a5461e1737ac) |
| Comment by Githook User [ 07/Nov/23 ] |
|
Author: {'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'} Message: (cherry picked from commit c3496898db5124006252b129f2c9a5461e1737ac) |
| Comment by Githook User [ 04/Oct/23 ] |
|
Author: {'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'} Message: (cherry picked from commit c3496898db5124006252b129f2c9a5461e1737ac) |
| Comment by Githook User [ 10/Jul/23 ] |
|
Author: {'name': 'Wenbin Zhu', 'email': 'wenbin.zhu@mongodb.com', 'username': 'WenbinZhu'}Message: |
| Comment by Wenbin Zhu [ 28/Jun/23 ] |
|
Instead of having lookup() and lookupInstance() throw an exception when the service is shut down or stepped down, we are going to augment both methods to return an additional flag indicating whether the service is shut down or stepped down, or whether the instanceID is invalid. When the application cannot get a POS instance from either method, it can check this flag to determine whether a different action is needed. We will file new tickets for the teams that own such applications so they can investigate whether they need to check the new flag. cc abdul.qadeer@mongodb.com |
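A minimal sketch of how such an augmented lookup might look from the caller's side. This is a toy model under stated assumptions, not the committed interface: `FlaggedService`, `LookupResult`, the `pausedOrShutdown` field, `kPaused`, `Outcome`, and `handleCommand` are all hypothetical names. It also assumes one reading of the flag, namely that it is set when the service is shut down or stepped down and clear when the miss is due to an invalid instanceID.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy model only; the type and member names are illustrative assumptions.
enum class ServiceState { kRunning, kPaused, kShutdown };

struct Instance {
    std::string id;
};

// Hypothetical result shape: the instance (null on a miss) plus the flag.
struct LookupResult {
    std::shared_ptr<Instance> instance;
    bool pausedOrShutdown;
};

class FlaggedService {
public:
    void addInstance(const std::string& id) {
        _instances[id] = std::make_shared<Instance>(Instance{id});
    }

    void stepDown() {
        _state = ServiceState::kPaused;
        _instances.clear();
    }

    void shutdown() {
        _state = ServiceState::kShutdown;
        _instances.clear();
    }

    // A miss now carries its reason: flag set means the service is not
    // running, flag clear means the ID is not known to the service.
    LookupResult lookupInstance(const std::string& id) const {
        if (_state != ServiceState::kRunning) {
            return LookupResult{nullptr, true};
        }
        auto it = _instances.find(id);
        if (it == _instances.end()) {
            return LookupResult{nullptr, false};
        }
        return LookupResult{it->second, false};
    }

private:
    ServiceState _state = ServiceState::kRunning;
    std::map<std::string, std::shared_ptr<Instance>> _instances;
};

enum class Outcome { kRun, kRetryOnNewPrimary, kInvalidInstanceId };

// The caller inspects the flag only when no instance was returned.
Outcome handleCommand(const FlaggedService& service, const std::string& id) {
    const LookupResult result = service.lookupInstance(id);
    // In this model an instance is never returned together with the flag.
    assert(!(result.instance && result.pausedOrShutdown));
    if (result.instance) {
        return Outcome::kRun;
    }
    return result.pausedOrShutdown ? Outcome::kRetryOnNewPrimary
                                   : Outcome::kInvalidInstanceId;
}
```

Compared with throwing, returning the flag leaves existing callers free to ignore it, which fits the plan of letting owning teams decide whether they need the check.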
| Comment by Wenbin Zhu [ 28/Jun/23 ] |
To clarify, an invalid instanceID means that the instanceID the caller passes into lookupInstance() does not match any instanceID that the service is aware of. The set of instanceIDs that the service knows about is the set of _id values of the state documents. Also note that lookupInstance() waits for the service state to be something other than kRebuilding before doing the instance lookup. |
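The two points in this comment can be sketched together in a toy model. Everything here is an assumption for illustration (`RebuildingService`, `StateDocument`, `finishRebuild`, and the use of a condition variable); only the `kRebuilding` state name, the `lookupInstance()` name, and the idea that known instanceIDs come from the state documents' _id values are taken from the comment.

```cpp
#include <cassert>
#include <condition_variable>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

enum class RebuildState { kRebuilding, kRunning };

// Stand-in for a persisted state document; `id` plays the role of _id.
struct StateDocument {
    std::string id;
};

struct Instance {
    std::string id;
};

class RebuildingService {
public:
    // Builds one instance per state document, keyed by the document's id,
    // then leaves kRebuilding and wakes any waiting lookups.
    void finishRebuild(const std::vector<StateDocument>& docs) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            assert(_state == RebuildState::kRebuilding);
            for (const auto& doc : docs) {
                _instances[doc.id] = std::make_shared<Instance>(Instance{doc.id});
            }
            _state = RebuildState::kRunning;
        }
        _cv.notify_all();
    }

    // Blocks while the service is rebuilding, then looks the ID up among
    // the IDs derived from the state documents. Null means unknown ID.
    std::shared_ptr<Instance> lookupInstance(const std::string& id) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _state != RebuildState::kRebuilding; });
        auto it = _instances.find(id);
        if (it == _instances.end()) {
            return nullptr;
        }
        return it->second;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    RebuildState _state = RebuildState::kRebuilding;
    std::map<std::string, std::shared_ptr<Instance>> _instances;
};
```

In this sketch a lookup issued during the rebuild would block until another thread calls `finishRebuild()`, so an ID is only reported as unknown once the full set of state documents has been loaded.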