-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Networking & Observability
-
N&O Prioritized List
-
(copied to CRM)
-
3
A recent escalation involves mongod replicas lagging behind the primary, which increases latency for certain client commands.
We identified a server observability improvement that would aid the investigation:
FTDC metric recording how often (and therefore when) DBClientSession::_call attempts a reconnect in ensureConnection()
FTDC metric recording how often (and therefore when) DBClientSession::_call is not connected and throws in ensureConnection()
In other words, add metrics that give visibility into dbclient_session.cpp and likely adjacent code.
The idea is this: the mongod replica uses DBClientSession to issue getMore commands to the primary. If something is wrong with the connection, the replica may be trying to reconnect. Right now we would only see this in debug-level logging. The two FTDC metrics above would make it visible in the diagnostic data we already collect.
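A minimal sketch of the two proposed counters, in standalone C++ rather than actual server code. Everything here is hypothetical: `SessionConnectionStats`, `SketchSession`, `markFailed()`, and the counter names are illustrative, and the control flow in `ensureConnection()` (throw when auto-reconnect is off, otherwise attempt a reconnect) is an assumption about how the real function behaves. How the counters would be registered with FTDC is deliberately left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical counters for the two proposed metrics. The names are
// illustrative only; they are not existing server or FTDC fields.
struct SessionConnectionStats {
    std::atomic<std::uint64_t> reconnectAttempts{0};
    std::atomic<std::uint64_t> notConnectedThrows{0};
};

// Simplified stand-in for DBClientSession. The control flow in
// ensureConnection() is an assumption about the real code.
class SketchSession {
public:
    SketchSession(SessionConnectionStats& stats, bool autoReconnect)
        : _stats(stats), _autoReconnect(autoReconnect) {}

    // Simulates a network error that marks the session as failed.
    void markFailed() { _failed = true; }

    void ensureConnection() {
        if (!_failed) {
            return;  // healthy connection: neither counter changes
        }
        if (!_autoReconnect) {
            // Second proposed metric: not connected, so we throw.
            _stats.notConnectedThrows.fetch_add(1, std::memory_order_relaxed);
            throw std::runtime_error("session is not connected");
        }
        // First proposed metric: a reconnect is attempted.
        _stats.reconnectAttempts.fetch_add(1, std::memory_order_relaxed);
        _failed = false;  // the sketch pretends the reconnect always succeeds
    }

private:
    SessionConnectionStats& _stats;
    bool _autoReconnect;
    bool _failed = false;
};
```

Because the counters only ever increase, sampling them periodically would show both how often and roughly when reconnects and throws occur, which is the signal the ticket asks for.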
Related to: SERVER-104674 Improve N&O playbook for suspected oplog networking issues (Open)