[SERVER-49296] Investigate why DNS numSlowDNSOperations spiked Created: 02/Jul/20  Updated: 09/Jul/20  Resolved: 09/Jul/20

Status: Closed
Project: Core Server
Component/s: Networking
Affects Version/s: 4.4.0-rc9
Fix Version/s: None

Type: Task Priority: Minor - P4
Reporter: Lamont Nelson Assignee: Lamont Nelson
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File timeline.png    
Issue Links:
Related
Participants:

 Description   

Two mongod processes crashed at point A in the attached graph and were re-introduced into the replica set after being repaired at point B. Between A and B the numSlowDNSOperations metric spiked, then plateaued as soon as the nodes were restored at point B. The logs for the period from A to B are not available, but we can simulate this scenario to see whether the behavior is repeatable.

Since this metric only counts the DNS operations that take longer than a threshold, the latency may have been roughly equal to the threshold, in which case the observed behavior is benign. It is also possible that there was an actual network problem external to the server. Unfortunately, we'll never know for sure without logs. The fact that the slow DNS operations stopped exactly at point B makes an external network problem the less likely explanation.
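
To illustrate the first hypothesis, here is a minimal sketch of how a threshold-based counter behaves when latency hovers near the threshold. It is not the server's implementation: the function name, the 1000 ms threshold, and the latency samples are all assumptions made up for the example.

```python
from typing import Iterable

# Hypothetical threshold for illustration only; not taken from this ticket.
SLOW_THRESHOLD_MS = 1000.0


def count_slow_operations(latencies_ms: Iterable[float],
                          threshold_ms: float = SLOW_THRESHOLD_MS) -> int:
    """Count operations whose latency strictly exceeds the threshold."""
    return sum(1 for latency in latencies_ms if latency > threshold_ms)


# Made-up samples: latency hovering within about 1% of the threshold.
near_threshold = [990.0, 1005.0, 998.0, 1010.0, 1001.0, 995.0]
# Made-up samples: latency well below the threshold.
well_below = [5.0, 8.0, 12.0, 7.0, 9.0, 6.0]

near_count = count_slow_operations(near_threshold)
below_count = count_slow_operations(well_below)
print(near_count, below_count)
```

In this sketch a shift of a few milliseconds decides whether an operation is counted, so a steadily rising counter does not by itself show that DNS resolution degraded badly.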



 Comments   
Comment by Lamont Nelson [ 09/Jul/20 ]

The DNS metric turned out to be a red herring in HELP-16677. Closing.
