[SERVER-54144] Relax timing constraint in profile_operation_metrics.js Created: 29/Jan/21  Updated: 29/Oct/23  Resolved: 08/Feb/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Bug Priority: Major - P3
Reporter: Amirsaman Memaripour Assignee: Louis Williams
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Execution Team 2021-02-22
Participants:
Linked BF Score: 13

 Description   

The assertMetricsExist function in profile_operation_metrics.js relies on the elapsed time measured by OperationCPUTimer, and expects a non-zero value when measuring execution time of database operations (see here).

However, the OS clock source is only expected to return monotonically increasing values, and might retain its value for a period of time. The following repro is an example of such cases, where not only the reported elapsed time is zero, the timestamps captured by both mongod logger and resmoke indicate that the time is not advancing:

[js_test:profile_operation_metrics] 2021-01-15T18:47:19.863+0000 [jsTest] ----
[js_test:profile_operation_metrics] 2021-01-15T18:47:19.863+0000 
[js_test:profile_operation_metrics] 2021-01-15T18:47:19.863+0000 d20021| 2021-01-15T18:47:19.862+00:00 I  CONTROL  7777701 [conn1] "Started the timer","attr":{"timer":140017349984296}
[js_test:profile_operation_metrics] 2021-01-15T18:47:19.863+0000 d20021| 2021-01-15T18:47:19.862+00:00 I  CONTROL  7777705 [conn1] "Got elapsed","attr":{"timer":140017349984296,"elapsedNanos":0}
[js_test:profile_operation_metrics] 2021-01-15T18:47:19.863+0000 d20021| 2021-01-15T18:47:19.862+00:00 I  COMMAND  51803   [conn11] "Slow query","attr":{"type":"command","ns":"local.oplog.rs","appName":"OplogFetcher","command":{"getMore":2008406674905367600,"collection":"oplog.rs","batchSize":13981010,"maxTimeMS":5000,"term":1,"lastKnownCommittedOpTime":{"ts":{"$timestamp":{"t":1610736438,"i":3}},"t":1},"$db":"local","$replData":1,"$oplogQueryData":1,"$readPreference":{"mode":"secondaryPreferred"},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1610736438,"i":3}},"signature":{"hash":{"$binary":{"base64":"0pIxOg35PPB2fjfkjtAnD4HOq5Q=","subType":"0"}},"keyId":6918060319390564000}}},"originatingCommand":{"find":"oplog.rs","filter":{"ts":{"$gte":{"$timestamp":{"t":1610736438,"i":2}}}},"ntoreturn":13981010,"tailable":true,"awaitData":true,"term":1,"maxTimeMS":60000,"readConcern":{"level":"local","afterClusterTime":{"$timestamp":{"t":0,"i":1}}},"$db":"local","$replData":1,"$oplogQueryData":1,"$readPreference":{"mode":"secondaryPreferred"},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1610736438,"i":3}},"signature":{"hash":{"$binary":{"base64":"0pIxOg35PPB2fjfkjtAnD4HOq5Q=","subType":"0"}},"keyId":6918060319390564000}}},"planSummary":"COLLSCAN","cursorid":2008406674905367600,"keysExamined":0,"docsExamined":1,"numYields":2,"nreturned":1,"reslen":811,"locks":{"Global":{"acquireCount":{"r":3}},"Mutex":{"acquireCount":{"r":1}}},"readConcern":{"level":"local","afterClusterTime":{"$timestamp":{"t":0,"i":1}},"provenance":"clientSupplied"},"storage":{},"remote":"10.122.0.229:32975","protocol":"op_msg","durationMillis":0}
[js_test:profile_operation_metrics] 2021-01-15T18:47:19.863+0000 d20021| 2021-01-15T18:47:19.862+00:00 I  WRITE    51803   [conn1] "Slow query","attr":{"type":"remove","ns":"profile_operation_metrics.coll","appName":"MongoDB Shell","command":{"q":{"_id":1},"limit":0},"planSummary":"IDHACK","keysExamined":1,"docsExamined":1,"ndeleted":1,"keysDeleted":1,"numYields":0,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":1}},"ReplicationStateTransition":{"acquireCount":{"w":1}},"Global":{"acquireCount":{"w":1}},"Database":{"acquireCount":{"w":1}},"Collection":{"acquireCount":{"w":1}},"Mutex":{"acquireCount":{"r":1}}},"flowControl":{"acquireCount":1,"timeAcquiringMicros":2},"storage":{},"operationMetrics":{"docBytesRead":29,"docUnitsRead":1,"idxEntryBytesRead":3,"idxEntryUnitsRead":1,"cursorSeeks":3,"docBytesWritten":29,"docUnitsWritten":1,"idxEntryBytesWritten":3,"idxEntryUnitsWritten":1},"remote":"127.0.0.1:56621","durationMillis":0}
[js_test:profile_operation_metrics] 2021-01-15T18:47:19.863+0000 d20021| 2021-01-15T18:47:19.862+00:00 I  CONTROL  7777704 [conn1] "Got elapsed","attr":{"timer":140017349984296,"elapsedNanos":0}
[js_test:profile_operation_metrics] 2021-01-15T18:47:19.863+0000 d20021| 2021-01-15T18:47:19.862+00:00 I  CONTROL  7777702 [conn1] "Stopped the timer","attr":{"timer":140017349984296}
[js_test:profile_operation_metrics] 2021-01-15T18:47:19.863+0000 d20021| 2021-01-15T18:47:19.862+00:00 I  CONTROL  7777705 [conn1] "Got elapsed","attr":{"timer":140017349984296,"elapsedNanos":0}
[js_test:profile_operation_metrics] 2021-01-15T18:47:19.863+0000 d20021| 2021-01-15T18:47:19.862+00:00 I  COMMAND  51803   [conn1] "Slow query","attr":{"type":"remove","ns":"","appName":"MongoDB Shell","numYields":0,"locks":{"ParallelBatchWriterMode":{"acquireCount":{"r":2}},"ReplicationStateTransition":{"acquireCount":{"w":3}},"Global":{"acquireCount":{"r":1,"w":2}},"Database":{"acquireCount":{"w":2}},"Collection":{"acquireCount":{"w":2}},"Metadata":{"acquireCount":{"W":1}},"Mutex":{"acquireCount":{"r":2}}},"flowControl":{"acquireCount":1,"timeAcquiringMicros":2},"storage":{},"operationMetrics":{"docBytesRead":29,"docUnitsRead":1,"idxEntryBytesRead":3,"idxEntryUnitsRead":1,"cursorSeeks":3,"docBytesWritten":1445,"docUnitsWritten":13,"idxEntryBytesWritten":3,"idxEntryUnitsWritten":1},"remote":"127.0.0.1:56621","durationMillis":0}

We should relax this constraint so that a non-advancing clock would not cause failures.

If the value retrieved from OperationCPUTimer is required to be always increasing, we need another ticket to investigate changes to the implementation so that it provides a stronger guarantee than what is provided by the OS.



 Comments   
Comment by Githook User [ 08/Feb/21 ]

Author:

{'name': 'Louis Williams', 'email': 'louis.williams@mongodb.com', 'username': 'louiswilliams'}

Message: SERVER-54144 Relax timing constraint in profile_operation_metrics.js
Branch: master
https://github.com/mongodb/mongo/commit/29350aece6a2a0d55de8f63a1da97b2ae6c1294e

Generated at Thu Feb 08 05:32:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.