[SERVER-61281] Fix underflow when accounting for Document size in query memory tracker Created: 05/Nov/21  Updated: 29/Oct/23  Resolved: 11/Aug/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.2, 6.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Nikita Lapkov (Inactive) Assignee: Nikita Lapkov (Inactive)
Resolution: Fixed Votes: 0
Labels: query-director-triage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Cloners
is cloned by SERVER-69821 Fix underflow error in query memory t... Closed
Duplicate
is duplicated by SERVER-62856 Fix underflow in query memory tracking Closed
Related
related to SERVER-57011 DocumentStorage caches nested objects... Backlog
related to SERVER-62283 Temporary workaround of the problem i... Closed
related to SERVER-65473 Fix another location of memory tracki... Closed
related to SERVER-68297 Document::memUsageForSorter returns a... Closed
related to SERVER-69793 Disable memory underflow check in the... Closed
related to SERVER-69840 Complete TODO listed in SERVER-61281 Closed
related to SERVER-79859 Complete TODO listed in SERVER-61281 Closed
is related to SERVER-62094 Incorrect memory usage calculation in... Closed
is related to SERVER-68381 Investigate usages of Document::getAp... Backlog
Backwards Compatibility: Fully Compatible
Backport Requested:
v6.0, v5.0, v4.4, v4.2
Sprint: QE 2021-11-15, QE 2021-11-29, QE 2021-12-13, QE 2021-12-27, QE 2022-01-10, QE 2022-01-24, QE 2022-02-07, QE 2022-02-21, QE 2022-03-07, QE 2022-03-21, QE 2022-04-04
Participants:
Linked BF Score: 160

Comments
Comment by Nikita Lapkov (Inactive) [ 11/Aug/22 ]

It seems that there have been no BFs, at least for the moment.

Comment by Githook User [ 03/Aug/22 ]

Author: Kyle Suarez <kyle.suarez@mongodb.com> (username: ksuarz)

Message: Revert "SERVER-61281 Use memoization for Document::getApproximateSize"

This reverts commit 112b848eada25cb0161a229592afec5030079d6b.
Branch: master
https://github.com/mongodb/mongo/commit/a36fb0fc2ff3c7cd0784bc9a59721fd34961a5ef

Comment by Rushan Chen [ 03/May/22 ]

One proposal to help track when undercounting occurs is to add logging (with exponential backoff if needed) in place of the two original tasserts listed in the previous comment.
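
A minimal sketch of that idea follows (the names logUnderflowWithBackoff and updateWithClamp are hypothetical, not taken from the MongoDB codebase, and std::cerr stands in for the real logging machinery): where the tassert used to fire, clamp the total to zero and emit a log line, doubling the number of occurrences required between log lines so a hot path cannot flood the log.

{code:cpp}
#include <cstdint>
#include <iostream>

// Illustrative sketch only: log an underflow instead of tassert'ing, with
// exponential backoff so repeated underflows do not spam the log.
inline void logUnderflowWithBackoff(int64_t current, int64_t diff) {
    static uint64_t occurrences = 0;  // underflows observed so far (not thread-safe; sketch only)
    static uint64_t nextLogAt = 1;    // log again once this many occurrences are reached
    if (++occurrences >= nextLogAt) {
        nextLogAt *= 2;  // exponential backoff: log at 1, 2, 4, 8, ... occurrences
        std::cerr << "memory tracking underflow: current=" << current
                  << " diff=" << diff << " occurrences=" << occurrences << '\n';
    }
}

// Hypothetical replacement for the tassert'ed update path: clamp the total
// to zero and log, rather than aborting the operation.
inline int64_t updateWithClamp(int64_t current, int64_t diff) {
    int64_t updated = current + diff;
    if (updated < 0) {
        logUnderflowWithBackoff(current, diff);
        updated = 0;
    }
    return updated;
}
{code}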

Comment by Rushan Chen [ 02/May/22 ]

Going to clean up the tasserts in the code, as we are not working on this yet.

These tasserts are being taken out because their correct behavior depends on this bug being fixed.

https://github.com/10gen/mongo/blob/23cc17b8175a9e4b5e3a2eae8bc317be556cf026/src/mongo/db/pipeline/memory_usage_tracker.h#L59

and
https://github.com/10gen/mongo/blob/23cc17b8175a9e4b5e3a2eae8bc317be556cf026/src/mongo/db/pipeline/memory_usage_tracker.h#L157

Subsequently, this test is also taken out:

https://github.com/10gen/mongo/blob/b5f986b43c70498375dc481aad349c7ceb400175/src/mongo/db/pipeline/memory_usage_tracker_test.cpp#L102

Once the problem is fixed and there is no undercounting of memory usage, both tasserts and the test can be restored.
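
For context, a minimal sketch of the kind of invariant these tasserts enforce (class and member names here are hypothetical, not the actual contents of memory_usage_tracker.h, and assert() stands in for tassert()): the running total of tracked memory must never go negative, so the check only trips when the accounting undercounts on the add side and the matching release drives the total below zero.

{code:cpp}
#include <cassert>
#include <cstdint>

// Illustrative sketch of a query memory usage tracker with a
// non-negativity check on its running total.
class MemoryUsageTrackerSketch {
public:
    void update(int64_t diff) {
        _currentBytes += diff;
        // This is the check that fires when Document size accounting
        // undercounts (SERVER-61281): freeing more bytes than were ever
        // added pushes the total negative.
        assert(_currentBytes >= 0 && "underflow in query memory tracking");
        if (_currentBytes > _maxBytes) {
            _maxBytes = _currentBytes;
        }
    }

    int64_t currentMemoryBytes() const { return _currentBytes; }
    int64_t maxMemoryBytes() const { return _maxBytes; }

private:
    int64_t _currentBytes = 0;
    int64_t _maxBytes = 0;
};
{code}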

 

Comment by Ana Meza [ 26/Apr/22 ]

steve.la@mongodb.com, bernard.gorman@mongodb.com: passing this to Director Triage to find a team to work on it.

Comment by Ana Meza [ 12/Apr/22 ]

steve.la@mongodb.com, would someone in NA have capacity to take this after 6.0 rc0?

Comment by Kyle Suarez [ 06/Apr/22 ]

The fix was only meant to be temporary, and I would rather not have this sit in the backlog without attention. I'm flagging this ticket for re-triage and discussion of the scheduling priority.

Comment by Rushan Chen [ 06/Apr/22 ]

Reducing the severity as we have a safety measure checked in. Moving to backlog.

Comment by Ian Boros [ 07/Jan/22 ]

For those curious, this was the temporary fix.

Comment by Nikita Lapkov (Inactive) [ 04/Jan/22 ]

rushan.chen, feel free to re-assign this back to me, or keep it if you would like to investigate the issue yourself.

Comment by Kyle Suarez [ 15/Dec/21 ]

Sending this ticket to rushan.chen, because Nikita is on holiday. I'm also bumping the severity to P2 - Critical because the related failure is occurring frequently.
