Core Server / SERVER-57011

DocumentStorage caches nested objects for each level of nesting


    Details

    • Type: Bug
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: Backlog
    • Component/s: None
    • Labels:
      None
    • Operating System:
      ALL
    • Steps To Reproduce:

      TEST(DocumentSerialization, ApproximateSizeForNestedDocuments) {
          std::string largeStr(1024, 'x');
          auto bsonDoc = BSON("obj" << BSON("subObj" << BSON("subObjSubObj" << largeStr)));
          auto doc = Document(bsonDoc);
          ASSERT_GT(doc.getApproximateSize(), 1024);
          ASSERT_LT(doc.getApproximateSize(), 1024 * 2);
       
          // Force 'obj.subObj.subObjSubObj' to be cached.
          ASSERT_VALUE_EQ(doc.getNestedField("obj.subObj.subObjSubObj"), Value(largeStr));
       
          // largeStr is cached, so expect roughly double the footprint.
          ASSERT_GT(doc.getApproximateSize(), 1024 * 2);
          // This assertion fails: on my machine the reported size is 4892.
          ASSERT_LT(doc.getApproximateSize(), 1024 * 3);
      }
      

    • Sprint:
      Query Execution 2021-06-28, Query Execution 2021-07-12

      Description

      When a field in a Document is accessed, the internal caching is expected to add some overhead to the memory footprint. However, when the accessed path contains nested documents, the reported size appears to double-count the values in the sub-objects. The impact may not be severe, given that the method name includes "approximate", but several aggregation stages rely on this size to decide whether to spill to disk.
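
      The arithmetic behind the failing bound can be sketched with a toy model. This is not MongoDB code: the function names are hypothetical, per-node bookkeeping overhead is ignored, and the idea that each traversed level of nesting contributes its own cached copy of the payload is an assumption taken from the issue title, not something verified against DocumentStorage.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model; these are not MongoDB functions.

// Footprint the repro expects after the access: the serialized buffer
// plus a single cached copy of the leaf value.
constexpr std::size_t leafOnlyCachedSize(std::size_t payloadBytes) {
    return 2 * payloadBytes;
}

// Assumed footprint if every traversed level of nesting counts its own
// cached copy of the payload on top of the serialized buffer.
constexpr std::size_t perLevelCachedSize(std::size_t payloadBytes, std::size_t depth) {
    std::size_t total = payloadBytes;  // the original serialized buffer
    for (std::size_t level = 0; level < depth; ++level) {
        total += payloadBytes;  // one counted copy per level
    }
    return total;
}

// Runtime check using the numbers from the repro: a 1024-byte string
// reached through the three-component path obj.subObj.subObjSubObj.
inline void checkToyModel() {
    assert(leafOnlyCachedSize(1024) == 2048);
    assert(perLevelCachedSize(1024, 3) == 4096);
    // The per-level total already exceeds the repro's upper bound of 1024 * 3.
    assert(perLevelCachedSize(1024, 3) > 1024 * 3);
}
```

      Under these assumptions the per-level model gives 4096 bytes for the repro, which is above the 1024 * 3 bound and in the same range as the reported 4892. The remaining difference would presumably be bookkeeping overhead, which the model does not attempt to capture.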

            People

            Assignee:
            backlog-query-execution Backlog - Query Execution
            Reporter:
            nicholas.zolnierz Nicholas Zolnierz
            Votes:
            0
            Watchers:
            7
