[SERVER-53197] Awaitable hello/isMaster appears in slow query log, part 2 Created: 02/Dec/20  Updated: 29/Oct/23  Resolved: 08/Dec/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.4.0
Fix Version/s: 4.9.0, 4.4.3

Type: Bug Priority: Minor - P4
Reporter: A. Jesse Jiryu Davis Assignee: A. Jesse Jiryu Davis
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
Related
related to SERVER-47267 Exhaust isMaster replies are logged a... Closed
related to SERVER-50242 slow query message seen in mongos for... Closed
is related to SERVER-48393 Exclude exhaust isMaster commands fro... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.4
Participants:

 Description   

I didn't really fix SERVER-47267 and SERVER-50242 (awaitable "hello" / "isMaster" is logged as a slow query in mongos / mongod). Since the default duration for an awaitable isMaster is 10 seconds, I had thought it was appropriate to log the command if it takes 20+ seconds, as that would indicate unusual slowness.

However, users can and do override this default by setting their drivers' heartbeatFrequencyMS to a longer duration; if that duration is 20 seconds or more, isMasters make noise in the slow query log again.
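For illustration, a driver-side override might look like the following minimal PyMongo sketch (the URI and the 30-second value are assumptions, not taken from this ticket). With heartbeatFrequencyMS raised past 20 seconds, each awaitable hello/isMaster legitimately blocks longer than the 20-second logging threshold described above:

from pymongo import MongoClient

# Hypothetical client: a 30s heartbeat makes the driver's awaitable
# hello/isMaster wait up to 30s per round trip, so the server's
# 20-second slow-query threshold fires on every normal heartbeat.
client = MongoClient("mongodb://localhost:27017",
                     heartbeatFrequencyMS=30000)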



 Comments   
Comment by Jack Weir [ 14/Jan/21 ]

The "Slowest Operations" panel in the Real Time Performance Panel is populated by the output of running currentOp, not from the slow query logs. The noisy isMaster's in the currentOp output is a known issue being addressed on the server side by SERVER-51358

Comment by Jussi Kauppila [ 13/Jan/21 ]

We are still seeing this in MongoDB Atlas, which now runs version 4.4.3.

Here is one, cut and pasted from our Atlas real-time view:

connectionId: 617163
secs_running: 9
op: command
waitingForLatch:
    timestamp:
        $date: 2021-01-13T08:41:17.562Z
    captureName: FutureResolution
type: op
appName: –
waitingForLock: false
lockStats: Object
desc: conn617163
clientMetadata:
    os:
        type: Linux
    driver:
        version: 3.11.2
        name: PyMongo
opid: 356588731
microsecs_running: 9977371
ns: admin.$cmd
numYields: 0
host: ----
client: —
active: true
currentOpTime: 2021-01-13T08:41:27.439+00:00
locks: Object
waitingForFlowControl: false
flowControlStats: Object
query:
    ismaster: 1
    topologyVersion: Object
    maxAwaitTimeMS: 10000
    $clusterTime: Object
    $db: admin
    $readPreference: Object

Comment by Githook User [ 12/Dec/20 ]

Author:

{'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}

Message: SERVER-53197 Don't log awaitable hello/ismaster as slow query
Branch: v4.4
https://github.com/mongodb/mongo/commit/5b9cc730aba346d1bbc88bbd17e432c6641a9d6f

Comment by Bruce Lucas (Inactive) [ 10/Dec/20 ]

Yes, I think that's fine, thanks.

Comment by A. Jesse Jiryu Davis [ 09/Dec/20 ]

bruce.lucas, another question: we intend to backport this fix to the next 4.4 release. The main point is to silence noisy logs, but we will also make the latency stats change. Does that sound okay to you?

Comment by Githook User [ 08/Dec/20 ]

Author:

{'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}

Message: SERVER-53197 Don't log awaitable hello/ismaster as slow query
Branch: master
https://github.com/mongodb/mongo/commit/af18d77f83f12b0b5762cc0b4fe2059ab1d37e2c

Comment by Bruce Lucas (Inactive) [ 08/Dec/20 ]

jesse, the consensus in our meeting is that your proposed change looks good. Thanks for asking.

Comment by A. Jesse Jiryu Davis [ 07/Dec/20 ]

Bruce, that's all correct. I'm curious what the perf team thinks. Depending on the answer, I may make the proposed change (include awaitable isMaster minus sleep time in latency metrics) or just ban awaitable isMasters from latency metrics entirely.

Comment by Bruce Lucas (Inactive) [ 07/Dec/20 ]

The frequency of the awaitable isMaster commands will generally be fairly low, once every 10 seconds per connection, correct? So I think the effect on the latency metrics between 4.4 and 5.0 will be small, unless perhaps there is a very large number of connections.

On the other hand, I'm not sure how much value there is in including the awaitable isMaster (non-sleep) time in the latency metrics, and any change between versions, even a small one, will make it harder to compare performance across versions. However, perhaps the latency of commands (which are reported separately from reads and writes) is not so important in this regard.

To summarize:

  • pro - more consistent with the notion that command latency metrics include processing time for all commands
  • pro - effect on latency metrics should be small
  • con - any change in latency metrics makes it difficult to compare between versions
  • con - possibly low value to including isMaster in latency metrics
  • pro - perf comparisons between versions will probably be more interested in read and write metrics, not command metrics

Assuming that's correct, I'll run this by our perf team.

Comment by A. Jesse Jiryu Davis [ 04/Dec/20 ]

bruce.lucas, as part of the fix for this ticket, I'm considering adding awaitable isMaster back to latency metrics, changing our approach from SERVER-48393. So the server development history will be:

  • Pre-4.4: No awaitable isMaster, isMaster is always included in latency metrics
  • 4.4 rc: Awaitable isMaster is introduced and included in latency metrics
  • 4.4.0: Awaitable isMaster is excluded from latency metrics (old-fashioned isMaster is still included)
  • 5.0: Awaitable isMaster is included in latency metrics, but the sleep time is subtracted

I expect we'd see latency metrics fall between 4.4 and 5.0 if we make this change, since awaitable isMaster minus its sleep time is very fast, and it will now be included in the overall metrics.

Do you think it's better to make this change (include awaitable isMaster minus sleep time in latency metrics), or better to preserve the status quo (awaitable isMaster is excluded from latency metrics)?
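For intuition, with some purely hypothetical numbers (not from this ticket): suppose a node records 10,000 ordinary commands at 5 ms each, plus 100 awaitable isMasters whose non-sleep processing takes about 0.1 ms each. Including the isMasters pulls the average down:

# Hypothetical numbers, only to show the direction of the effect.
real_cmds = [5.0] * 10_000      # ordinary commands, ms each
ismasters = [0.1] * 100         # awaitable isMaster minus sleep time, ms each

all_cmds = real_cmds + ismasters
print(sum(real_cmds) / len(real_cmds))  # 5.0 ms: 4.4.0 behavior (excluded)
print(sum(all_cmds) / len(all_cmds))    # ~4.95 ms: proposed 5.0 behavior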

Comment by Max Hirschhorn [ 02/Dec/20 ]

jesse, have we considered using CurOp::pauseTimer() like we do for getMore on tailable+awaitData cursors, so the time spent waiting isn't tracked at all, rather than setting a slowMsOverride?
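For intuition, the pause-timer approach amounts to a stopwatch that excludes paused intervals from the duration it reports. A minimal Python sketch of the idea (invented names; not the server's C++ implementation):

import time

class PausableTimer:
    """Stopwatch whose elapsed() excludes time spent paused,
    e.g. while a command sleeps awaiting a topology change."""

    def __init__(self):
        self._start = time.monotonic()
        self._paused_total = 0.0
        self._paused_at = None

    def pause(self):
        self._paused_at = time.monotonic()

    def resume(self):
        self._paused_total += time.monotonic() - self._paused_at
        self._paused_at = None

    def elapsed(self):
        return time.monotonic() - self._start - self._paused_total

timer = PausableTimer()
timer.pause()
time.sleep(0.05)       # stands in for the maxAwaitTimeMS wait
timer.resume()
print(f"{timer.elapsed():.4f}s of active work")  # near zero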
