[SERVER-60063] Log server discovery times Created: 17/Sep/21  Updated: 29/Oct/23  Resolved: 25/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.0

Type: Improvement Priority: Major - P3
Reporter: Kevin Arhelger Assignee: Rachita Dhawan
Resolution: Fixed Votes: 1
Labels: sharding-nyc-subteam2
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-58448 Improve HostUnreachable error respons... Backlog
Backwards Compatibility: Minor Change
Sprint: Sharding 2022-01-24, Sharding 2022-02-07
Participants:
Story Points: 2

 Description   

For both mongos and mongod, it would be helpful to have an indicator of how long server discovery is taking.

For example, if a replica set has an election, knowing how long a mongos took to discover the new primary is very useful for diagnosing issues. A log line such as "It took X milliseconds to find a valid primary" would provide this. It should measure the time from when an active primary is no longer detected until a new primary is discovered.

Ideally, a single log line reports:

  • the total time, as .attr.durationMillis, that any server needed to determine the state of another server
  • the desired state that needs to be determined or the reason the state of the other server needed to be known
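A minimal sketch of the measurement described above, assuming a hypothetical `PrimaryGapTimer` helper (not part of the server code base). It records when the primary is lost and, once a new primary is found, produces a single structured line carrying `attr.durationMillis` and the reason. The message text and the `host` and `reason` attribute names are illustrative assumptions.

```python
import json
import time


class PrimaryGapTimer:
    """Hypothetical helper: measures how long no primary was known."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the timer can be tested
        self._lost_at = None     # set when the primary stops being detected

    def primary_lost(self):
        # Keep the earliest loss time if the primary is already missing.
        if self._lost_at is None:
            self._lost_at = self._clock()

    def primary_found(self, host, reason):
        """Return one structured log line, or None if no gap was open."""
        if self._lost_at is None:
            return None
        duration_ms = int((self._clock() - self._lost_at) * 1000)
        self._lost_at = None
        return json.dumps({
            "msg": "Found a valid primary",  # assumed message text
            "attr": {
                "host": host,
                "reason": reason,
                "durationMillis": duration_ms,
            },
        })
```

Calling `primary_lost()` when the election starts and `primary_found("hostB:27017", "election")` when the new primary is seen yields one line whose `attr.durationMillis` covers the whole gap.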

Acceptance criteria:

The server description record in the topology change message includes the amount of time spent in the previous topology state.
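One way to picture this criterion is a small tracker that stamps each topology change with the time spent in the state being left. This is only a sketch: `TopologyTracker`, `TopologyChange`, and the field name `previous_duration_millis` are hypothetical, and the state strings are passed in by the caller rather than taken from the actual implementation.

```python
import time
from dataclasses import dataclass


@dataclass
class TopologyChange:
    previous: str
    new: str
    previous_duration_millis: int  # time spent in `previous`


class TopologyTracker:
    """Hypothetical tracker that times each topology state."""

    def __init__(self, initial, clock=time.monotonic):
        self._clock = clock
        self._state = initial
        self._entered_at = clock()  # when the current state began

    def update(self, new_state):
        """Return a TopologyChange on a real transition, else None."""
        if new_state == self._state:
            return None
        now = self._clock()
        elapsed_ms = int((now - self._entered_at) * 1000)
        change = TopologyChange(self._state, new_state, elapsed_ms)
        self._state, self._entered_at = new_state, now
        return change
```

Repeated updates with an unchanged state return `None`, so the duration is reported once per actual transition.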



 Comments   
Comment by Githook User [ 25/Jan/22 ]

Author:

{'name': 'Rachita Dhawan', 'email': 'rachita.dhawan@gmail.com', 'username': 'racdhawan'}

Message: SERVER-60063 Add logging for Server Discovery time
Branch: master
https://github.com/mongodb/mongo/commit/6f33c7f4427437ab4ae7e1fd077afce984ed6be8

Comment by Rachita Dhawan [ 16/Jan/22 ]

https://github.com/10gen/mongo/pull/2750

Comment by Eric Sedor [ 01/Oct/21 ]

Triage discussed this and I spoke with Kevin offline. We're in agreement that at a minimum, it will be very helpful to have log lines with .attr.durationMillis that describe gaps in any understanding of another server.

I've edited the description to this end. kevin.arhelger bruce.lucas can you let me know if I should amend the description further?

Comment by Bruce Lucas (Inactive) [ 21/Sep/21 ]

Would this be better as a log line rather than a serverStatus metric? Generally serverStatus is best for things that happen frequently or repeatedly and can be captured by cumulative counters, but these seem like (mostly) one-time events. Also, a log line for such events should contain a durationMillis attr, which will cause it to show up prominently on charts that show such durations. That may make it easy to spot correlations with other issues.

Generated at Thu Feb 08 05:48:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.