[SERVER-44522] serverStatus metrics for awaitable isMaster Created: 08/Nov/19  Updated: 08/Jan/24  Resolved: 21/Feb/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.3.4

Type: New Feature Priority: Major - P3
Reporter: A. Jesse Jiryu Davis Assignee: Jason Chan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-44521 Implement exhaust isMaster for mongos Closed
Documented
is documented by DOCS-13438 Investigate changes in SERVER-44522: ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2020-02-10, Repl 2020-02-24
Participants:
Linked BF Score: 20

 Description   

Add to the "connections" document in serverStatus:

  • awaitingTopologyChanges: number of clients currently waiting in isMaster for a topology change
  • exhaustIsMaster: number of connections whose last request was an isMaster with exhaustAllowed


 Comments   
Comment by Githook User [ 21/Feb/20 ]

Author:

{'name': 'Jason Chan', 'username': 'jasonjhchan', 'email': 'jason.chan@mongodb.com'}

Message: SERVER-44522 serverStatus metrics for awaitable isMaster
Branch: master
https://github.com/mongodb/mongo/commit/db649073af2b6dac30b5d207ffa484f7d8cdc2b2

Comment by Githook User [ 21/Feb/20 ]

Author:

{'name': 'Ben Caimano', 'email': 'ben.caimano@10gen.com'}

Message: Revert "SERVER-44522 serverStatus metrics for awaitable isMaster"

This reverts commit b4915c29d4848439e23857c45fd3fcf94622b015.
Branch: master
https://github.com/mongodb/mongo/commit/8dd8e36940a4c44fa56ba4e8641001a1bc264560

Comment by Githook User [ 20/Feb/20 ]

Author:

{'name': 'Jason Chan', 'username': 'jasonjhchan', 'email': 'jason.chan@mongodb.com'}

Message: SERVER-44522 serverStatus metrics for awaitable isMaster
Branch: master
https://github.com/mongodb/mongo/commit/b4915c29d4848439e23857c45fd3fcf94622b015

Comment by Tess Avitabile (Inactive) [ 04/Feb/20 ]

Thank you for the advice! Your suggestion sounds good to me.

Comment by Mira Carey [ 04/Feb/20 ]

tess.avitabile,

Some thoughts on that design:

  • I'd prefer to avoid having _exhaustIsMaster as a direct member on transport::Session. It feels a bit like a layering violation to have the transport session directly track if the last command run was an exhaust isMaster.
    • An alternative would be to add the flag as a decoration on top of either the client object or the transport session (I don't have a strong perference here). Making the flag not just a bare boolean, bu instead an object that contained a boolean would give you an opportunity to hook in the dtor logic you need.
  • The other changes seem fine to me.

So basically, my pitch is to do:

struct InExhaustIsMaster {
  bool inExhaustIsMaster = false;
  ~InExhaustIsMaster() {
    if (inExhaustIsMaster) {
      decrementGlobal();
    }
  }
  ... // maybe some getters/setters to change the global as we manipulate the flag
};
 
// in the same .cpp as CmdIsMaster
const auto getExhaustIsMaster = transport::Session::declareDecoration<InExhaustIsMaster>();
 
bool runWithReplyBuilder(...) {
...
  auto& inExhaustIsMaster = getExhaustIsMaster(opCtx->getClient()->getSession().get());
  ... // do whatever we need to
}

Comment by Tess Avitabile (Inactive) [ 03/Feb/20 ]

The implementation of the exhaustIsMaster metric is in an unfamiliar area of the code for me, so I'd like to post a design here and get feedback before I start typing. mira.carey@mongodb.com, would you be willing to provide feedback on this design?

The exhaustIsMaster metric is defined as the number of connections whose last request was an isMaster with exhaustAllowed. I plan to add a class IsMasterStats that contains an AtomicWord<size_t> _exhaustIsMasterConnections (it will also track metrics for awaitingTopologyChanges, but I won't design that here). I will add a new field _exhaustIsMaster to transport::Session. Whenever this field is toggled (including on destruction), I will increment/decrement IsMasterStats::_exhaustIsMasterConnections (alternatively, instead of adding logic to the transport::Session destructor, I can add logic to the ServiceStateMachine cleanup hook). I will also add a field exhaustIsMaster to DbResponse, which will be set in CmdIsMaster::runWithReplyBuilder() on mongod and mongos. ServiceStateMachine::_processMessage will copy the field DbResponse::exhaustIsMaster to transport::Session::_exhaustIsMaster.

Comment by Tess Avitabile (Inactive) [ 29/Jan/20 ]

I agree that the two metrics as currently designed will help in diagnosing those issues. Thank you for weighing in.

Comment by Bruce Lucas (Inactive) [ 29/Jan/20 ]

tess.avitabile, I don't have any further insights and I'm happy to defer to Jesse's judgement.

Comment by A. Jesse Jiryu Davis [ 29/Jan/20 ]

Thinking again, perhaps "number of clients awaiting topology changes without exhaust" is not an important metric, so the sentence "To count the number of clients awaiting topology changes without exhaust, subtract the second metric from the first" is not just inaccurate but also irrelevant.

The purpose I imagine for these metrics is to help diagnose any issues that arise from the new protocol. E.g., if we saw exhaustIsMaster growing ever larger we'd suspect the server wasn't detecting client disconnects, or if we saw CPU usage rise in tandem with awaitingTopologyChanges we'd suspect a bug in ReplicationCoordinatorImpl::awaitIsMasterResponse.

So I propose we keep these two metrics, awaitingTopologyChanges and exhaustIsMaster, but we don't care about which clients are counted by both or neither at any moment.

Comment by Tess Avitabile (Inactive) [ 28/Jan/20 ]

I'm concerned that with the way the metrics are specified, exhaustIsMaster will not always be smaller than awaitingTopologyChanges, so subtracting them will not make sense. This is because a connection that is being used for exhaust isMaster request will not always be in the state of waiting for topology changes, for example when it is building a response. We could ensure exhaustIsMaster is at most awaitingTopologyChanges by defining it as "number of clients currently waiting in isMaster (with exhaust) for a topology change". However, the downside would be that these connections blip in and out of this metric, which may be undesirable.

jesse, did I understand the intention of these metrics? Do you or bruce.lucas have any guidance on them?

Comment by Bruce Lucas (Inactive) [ 23/Jan/20 ]

Can you summarize on this ticket the proposed metrics?

Comment by A. Jesse Jiryu Davis [ 08/Nov/19 ]

Once mongod and mongos have exhaust awaitable isMaster, implement metrics to count the clients that use it.

Generated at Thu Feb 08 05:06:13 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.