[SERVER-46446] Log time spent waiting for remote operations Created: 27/Feb/20  Updated: 29/Oct/23  Resolved: 20/Jan/23

Status: Closed
Project: Core Server
Component/s: Diagnostics, Querying
Affects Version/s: None
Fix Version/s: 6.3.0-rc0, 6.0.8

Type: Improvement Priority: Major - P3
Reporter: Josef Ahmad Assignee: Yoon Soo Kim
Resolution: Fixed Votes: 0
Labels: qexec-team, query-offsite
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Documented
is documented by DOCS-15832 Investigate changes in SERVER-46446: ... Backlog
Related
related to SERVER-73120 Remove recordRemoteOpWaitTime when we... Open
is related to SERVER-31368 Log time spent waiting for other shar... Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Backport Requested:
v6.0, v5.0, v4.4
Sprint: Query 2020-03-23
Participants:
Case:

 Description   

This ticket is the find() counterpart of SERVER-31368, which addresses aggregation.



 Comments   
Comment by Githook User [ 20/Jun/23 ]

Author:

{'name': 'Yoonsoo Kim', 'email': 'yoonsoo.kim@mongodb.com', 'username': 'yun-soo'}

Message: SERVER-46446 Log time spent waiting for remote operations

(cherry picked from commit be3f6ec23c5cbc4e4b7e563d50551fb7e2c7340b)
Branch: v6.0
https://github.com/mongodb/mongo/commit/7d1f82c39cc2c0d7909cdd47c945e51cdaf348a4

Comment by Githook User [ 20/Jan/23 ]

Author:

{'name': 'Yoonsoo Kim', 'email': 'yoonsoo.kim@mongodb.com', 'username': 'yun-soo'}

Message: SERVER-46446 Log time spent waiting for remote operations
Branch: master
https://github.com/mongodb/mongo/commit/be3f6ec23c5cbc4e4b7e563d50551fb7e2c7340b

Comment by Bernard Gorman [ 13/Jan/23 ]

yoonsoo.kim@mongodb.com: yes, I think that's a reasonable approach. I don't think it will be very diagnostically useful for most non-cursor commands, since they typically involve little or no work on mongoS, and any excessive execution time therefore implies a problem on the shards. But there may certainly be cases where it's helpful, and at the very least this field would allow engineers to confirm at a glance that the remote op is the source of the issue. Fortunately, we decided to give the field the extremely generic name remoteOpWaitMillis in the original patch, so that we could accommodate this approach later.

Comment by Yoon Soo Kim [ 12/Jan/23 ]

Hi bernard.gorman@mongodb.com, justin.seyster@mongodb.com, sorry for pinging you about an old issue. I'm looking at this ticket, the related code, and the previous CR.

I think it would be simpler to log a generic remoteOpTimeWaitMillis for any command which contacts a remote shard using AsyncRequestsSender (Bernard's alternative solution). My reasons are:

  1. It is simpler than plumbing a flag down to indicate whether or not to record the remote wait time, and
  2. remoteOpTimeWaitMillis should be negligible for commands other than find, if we're right about those commands' remote wait time. If we're wrong, we will have found other cases where the remote wait time matters.

LMK what you think. If we can agree on a solution, I'll implement it soon.
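
To make the trade-off between the two options concrete, here is a toy Python model. It is only a sketch and not the server's C++ code: OpStats, send_with_flag, send_generic, and now_ms are hypothetical names invented for illustration, and the "remote call" is just a callable.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


def now_ms() -> int:
    """Monotonic clock in milliseconds (hypothetical helper)."""
    return time.monotonic_ns() // 1_000_000


@dataclass
class OpStats:
    """Hypothetical stand-in for one operation's diagnostic counters."""
    remote_op_wait_millis: Optional[int] = None  # None means "never recorded"

    def add_wait(self, millis: int) -> None:
        self.remote_op_wait_millis = (self.remote_op_wait_millis or 0) + millis


def send_with_flag(call: Callable[[], object], stats: OpStats,
                   record_wait: bool, clock: Callable[[], int] = now_ms) -> object:
    """Option 1: every caller must plumb record_wait down to the sender."""
    start = clock()
    result = call()  # blocks until the remote side answers
    if record_wait:
        stats.add_wait(clock() - start)
    return result


def send_generic(call: Callable[[], object], stats: OpStats,
                 clock: Callable[[], int] = now_ms) -> object:
    """Option 2: the sender always records, so there is nothing to plumb."""
    start = clock()
    result = call()  # blocks until the remote side answers
    stats.add_wait(clock() - start)
    return result
```

With the generic variant, a command that barely waits on the shards simply reports a small value, which is the behaviour argued for in point 2 above.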

Comment by Bernard Gorman [ 27/Mar/20 ]

After some investigation, and consistent with the discussion on the original CR, there are some complications in doing this for find which mean that we can't get this for free as a result of SERVER-31368. In particular, find may retrieve a batch during the initial command, whereas aggregate does no work beyond establishing the cursors until the first getMore; this means that we will need to add instrumentation to the AsyncRequestsSender (ARS) as well as the BlockingResultsMerger. Additionally, since many non-cursor commands use the ARS, we will need to plumb a parameter down to the ARS to indicate whether or not it should record the remote wait time. Alternatively, we could simply log a generic remoteOpTimeWaitMillis field for any command which contacts a remote shard.

Moving this to Q2 Quick Wins for further consideration.
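
The need for two instrumentation points can be illustrated with a small Python sketch. It is a simplified model under stated assumptions, not the actual implementation: RemoteWaitTimer, run_find, and run_get_more are hypothetical names, with run_find standing in for the initial-batch path through the AsyncRequestsSender and run_get_more for the path through the BlockingResultsMerger.

```python
import time
from typing import Callable, List


def now_ms() -> int:
    """Monotonic clock in milliseconds (hypothetical helper)."""
    return time.monotonic_ns() // 1_000_000


class RemoteWaitTimer:
    """Accumulates the time one operation spends blocked on remote shards."""

    def __init__(self, clock: Callable[[], int] = now_ms) -> None:
        self._clock = clock
        self.total_millis = 0

    def timed(self, blocking_call: Callable[[], list]) -> list:
        start = self._clock()
        try:
            return blocking_call()
        finally:
            # Count the wait even if the remote call raises.
            self.total_millis += self._clock() - start


def run_find(shards: List[Callable[[], list]], timer: RemoteWaitTimer) -> list:
    """Initial find: already waits on each shard for a first batch,
    so the sender path has to be timed as well."""
    return [doc for shard in shards for doc in timer.timed(shard)]


def run_get_more(merger_next: Callable[[], list], timer: RemoteWaitTimer) -> list:
    """getMore: the wait happens while the merger blocks for the next batch."""
    return timer.timed(merger_next)
```

Both paths feed the same accumulator, so a single wait figure could be reported per operation regardless of which component did the blocking.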
