[SERVER-57011] DocumentStorage caches nested objects for each level of nesting Created: 17/May/21  Updated: 30/Jan/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Nicholas Zolnierz Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File cache.patch    
Issue Links:
Related
is related to SERVER-71824 Estimate error and speed of doc.getAp... Closed
is related to SERVER-61281 Fix underflow when accounting for Doc... Closed
Assigned Teams:
Query Execution
Operating System: ALL
Steps To Reproduce:

TEST(DocumentSerialization, ApproximateSizeForNestedDocuments) {
    std::string largeStr(1024, 'x');
    auto bsonDoc = BSON("obj" << BSON("subObj" << BSON("subObjSubObj" << largeStr)));
    auto doc = Document(bsonDoc);
    ASSERT_GT(doc.getApproximateSize(), 1024);
    ASSERT_LT(doc.getApproximateSize(), 1024 * 2);

    // Force 'obj.subObj.subObjSubObj' to be cached.
    ASSERT_VALUE_EQ(doc.getNestedField("obj.subObj.subObjSubObj"), Value(largeStr));

    // largeStr is cached, so expect roughly double the footprint.
    ASSERT_GT(doc.getApproximateSize(), 1024 * 2);
    ASSERT_LT(doc.getApproximateSize(), 1024 * 3);  // This one fails: on my machine the reported size is 4892.
}
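One accounting that would explain the failing upper bound, consistent with the issue title: if every level materialized while walking the path is cached and charged its full subtree size, the single 1024-byte leaf is charged four times (once in the original buffer, then once each for the cached `obj`, `subObj`, and leaf). The sketch below is a hypothetical toy model, not MongoDB's `Document`/`DocumentStorage` code. `Node`, `serializedSize`, `approxSizeAfterAccess`, and `nestedRepro` are illustrative names, and the model ignores field names and BSON header overhead. It arrives at 4096 bytes. Adding per-object overhead, that total would be in line with the ~4.9 KB reported above, while the test expects less than 3072.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy model, NOT MongoDB's Document/DocumentStorage API.
// A node is either a string leaf or an object with named children.
struct Node {
    std::string leaf;                                   // payload when this is a leaf
    std::map<std::string, std::shared_ptr<Node>> kids;  // children when this is an object
};

// Bytes of a node's serialized subtree. For simplicity this counts only
// leaf payloads and ignores field names and BSON headers.
std::size_t serializedSize(const Node& n) {
    std::size_t total = n.leaf.size();
    for (const auto& kv : n.kids) total += serializedSize(*kv.second);
    return total;
}

// Assumed accounting: the original buffer is charged once, and each level
// cached while walking the path is charged its own full serialized size,
// so the same leaf bytes are charged again at every level of nesting.
std::size_t approxSizeAfterAccess(const Node& root, const std::vector<std::string>& path) {
    std::size_t total = serializedSize(root);  // original buffer
    const Node* cur = &root;
    for (const auto& field : path) {
        cur = cur->kids.at(field).get();  // throws std::out_of_range if missing
        total += serializedSize(*cur);    // cached entry for this level
    }
    return total;
}

// Mirrors the repro: {obj: {subObj: {subObjSubObj: <1024 x's>}}}.
std::size_t nestedRepro() {
    auto leaf = std::make_shared<Node>();
    leaf->leaf = std::string(1024, 'x');
    auto subObj = std::make_shared<Node>();
    subObj->kids["subObjSubObj"] = leaf;
    auto obj = std::make_shared<Node>();
    obj->kids["subObj"] = subObj;
    Node root;
    root.kids["obj"] = obj;
    assert(serializedSize(root) == 1024);  // before any access: counted once
    // After access: 1024 (buffer) + 1024 (obj) + 1024 (subObj) + 1024 (leaf).
    return approxSizeAfterAccess(root, {"obj", "subObj", "subObjSubObj"});
}
```

Under this model the charge grows with nesting depth rather than staying at "roughly double." A fix along these lines would avoid charging bytes that a parent level already accounts for.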

Sprint: Query Execution 2021-06-28, Query Execution 2021-07-12, QE 2022-04-04, QE 2022-04-18, QE 2022-05-02, QE 2022-05-16, QE 2022-05-30, QE 2022-06-13, QE 2022-06-27, QE 2022-07-11, QE 2022-07-25, QE 2022-08-08, QE 2022-08-22
Participants:

 Description   

When a field in a Document is accessed, the internal caching is expected to add some overhead to the memory footprint. However, when the accessed path contains nested documents, the reported size appears to double-count the values in the sub-objects: in the repro above, reading 'obj.subObj.subObjSubObj' raises the reported size to 4892 bytes on the reporter's machine, where something under 3072 was expected. The impact may not be severe, since the method is explicitly "approximate", but several aggregation stages rely on this size to decide whether to spill to disk.
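To illustrate why the overcount matters for spilling: a stage that sums approximate document sizes against a memory budget will spill earlier than necessary if each document is over-reported. The sketch below is hypothetical. `SpillTracker`, `docsBeforeSpill`, and the 1 MiB budget are illustrative and are not MongoDB's actual spill logic or limits. With each document charged 2048 bytes, 512 documents fit in 1 MiB. If the nested-field overcount roughly doubles the charge to 4096 bytes, the stage spills after 256.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not MongoDB's implementation: a blocking stage adds
// each buffered document's approximate size to a running total and spills
// once the total exceeds its memory budget.
class SpillTracker {
public:
    explicit SpillTracker(std::size_t limitBytes) : limitBytes_(limitBytes) {}

    // Charge one document; returns true if the stage should now spill.
    bool add(std::size_t approxDocBytes) {
        usedBytes_ += approxDocBytes;
        return usedBytes_ > limitBytes_;
    }

private:
    std::size_t limitBytes_;
    std::size_t usedBytes_ = 0;
};

// Number of documents buffered before the first spill is triggered.
std::size_t docsBeforeSpill(std::size_t limitBytes, std::size_t approxDocBytes) {
    assert(approxDocBytes > 0);  // a zero-size charge would never trigger a spill
    SpillTracker tracker(limitBytes);
    std::size_t buffered = 0;
    while (!tracker.add(approxDocBytes)) ++buffered;
    return buffered;
}
```

Because the decision depends only on the reported sizes, any systematic overcount translates directly into extra, unnecessary spills.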



 Comments   
Comment by Kyle Suarez [ 09/Jul/21 ]

I am sending this back to the Triage Queue for consideration.

Comment by Kyle Suarez [ 25/May/21 ]

Sending back to the triage queue now that Mihai has done a root cause analysis (RCA).

Generated at Thu Feb 08 05:40:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.