[SERVER-69797] Negative bytes read metric in slow query log lines Created: 19/Sep/22  Updated: 12/Apr/23  Resolved: 12/Apr/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.12
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Khalen Fredieu Assignee: Henrik Edin
Resolution: Cannot Reproduce Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on WT-10066 Possible incorrect values for WT_STAT... Closed
Related
is related to SERVER-70830 Unit test WiredTigerOperationStats Closed
Operating System: ALL
Steps To Reproduce:

Unknown

Sprint: Execution Team 2023-02-20, Execution Team 2023-02-06, Execution Team 2023-03-06, Execution Team 2023-03-20, Execution Team 2023-04-17
Participants:

 Description   

This result was identified while diagnosing a performance issue on Atlas. 

{find: projectDocument filter: {_and: \\\{opportunities.orderInfo.orderNumber: {_exists: true _ne: null}}, \\\{opportunities.orderInfo.cwOrderId: {_exists: false}}} _db: rhdo-gallery-project-service-staging _clusterTime: {clusterTime: {_timestamp: {t: 1645959584 i: 2}} signature: {hash: {_binary: {base64: WK0oU7yTsO1sL5k3qTq5y4OnZrM= subType: 0}} keyId: 7005939129837945000}} lsid: {id: {_uuid: 0c6daeb3-bbe5-4be5-99e3-93cbf21c6563}}} planSummary: IXSCAN
Unknown macro: { opportunities.orderInfo.cwOrderId}
keysExamined: 111623 docsExamined: 109506 cursorExhausted: true numYields: 4993 nreturned: 23 queryHash: 274C58B8 planCacheKey: 1A09723E reslen: 985644 locks: {ReplicationStateTransition: {acquireCount: {w: 4994}} Global: {acquireCount: {r: 4994}} Database: {acquireCount: {r: 4994}} Collection: {acquireCount: {r: 4994}} Mutex:
Unknown macro: {acquireCount}
}} storage: {data: {bytesRead: -1890016436 timeReadingMicros: 158488225}}protocol: op_msg durationMillis: 159718

 

The server was behaving normally, albeit with fairly degraded performance due to the workload. This entry was identified while reviewing the logs for potential index optimizations.

 

 



 Comments   
Comment by Gregory Noma [ 27/Oct/22 ]

For the storage stats, there isn't anywhere obvious in MongoDB that we are transitioning the values through a 32-bit integer. WiredTiger returns its statistics as an unsigned 64-bit integer. In order to be able to serialize these statistics to BSON, we cast this value to a signed 64-bit integer. In order to prevent overflow, during the cast we clamp the value to the maximum value of a signed 64-bit integer. However, we also have the ability to add statistics together, and for this addition we do not do the clamping (and thus overflow is possible). In particular, we use this functionality for reporting the stats from a multi-document transaction. Note that as a signed 64-bit integer, the maximum possible value of this stat is 9223 petabytes – not a value that would be seen in practice.

One potential theory is that WT erroneously returned values for this statistic that were close to or greater than the maximum value of a signed 64-bit integer. Then if this were part of a multi-document transaction, we could have added these values together and overflowed to get the observed value of -1890016436.

Comment by Gregory Noma [ 24/Oct/22 ]

khalen.fredieu@mongodb.com did you only see this occur in this one particular slow query log line or was it repeated?

Generated at Thu Feb 08 06:14:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.