  Core Server / SERVER-92904

Reply size exceeds BSONObjMaxInternalSize whilst batch is within BSONObjMaxUserSize

    • Query Execution
    • Fully Compatible
    • ALL
    • v8.0, v7.0, v6.0

      The problem can be reproduced on v7.0 and on the current master with the following script:

      (function() {
      "use strict"
      
      const kSecondDocSize = 80 * 1024;
      // Here, 4.5 is one of several "unlucky" weights that provoke the error.
      const kFirstDocSize = (16 * 1024 * 1024) - (4.5 * kSecondDocSize);
      
      const testDB = db.getSiblingDB(jsTestName());
      const testCol = testDB["test"];
      
      const csCursor = testCol.watch();
      
      testCol.insertMany([{a: "x".repeat(kFirstDocSize)}, {_id: "x".repeat(kSecondDocSize)}]);
      jsTestLog(tojson(Object.bsonsize(csCursor.next())));
      jsTestLog(tojson(Object.bsonsize(csCursor.next())));
      })();
      
    • Sprint: QE 2024-09-30, QE 2024-10-14, QE 2024-10-28

      The generateBatch() method in getmore_cmd.cpp allows creating a result batch BSON object of 16 757 334 bytes containing 28 099 documents, which is still within BSONObjMaxUserSize (16 MB = 16 777 216 bytes). However, the reply object containing that batch is 16 798 526 bytes, which exceeds BSONObjMaxInternalSize (16 MB + 16 KB = 16 793 600 bytes), so the query fails with a BSONObjectTooLarge error.
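
      Purely for illustration, a minimal standalone C++ sketch of the two size checks involved; the constant names mirror the server's limits and the byte counts are taken from this description, but this is not server code:

      #include <cstdint>
      #include <iostream>

      constexpr int32_t kBSONObjMaxUserSize = 16 * 1024 * 1024;                    // 16 777 216
      constexpr int32_t kBSONObjMaxInternalSize = kBSONObjMaxUserSize + 16 * 1024; // 16 793 600

      int main() {
          const int32_t batchSize = 16757334;  // batch produced by generateBatch()
          const int32_t replySize = 16798526;  // batch + postBatchResumeToken + cursor metadata

          std::cout << std::boolalpha
                    << "batch within user limit:     " << (batchSize <= kBSONObjMaxUserSize) << '\n'      // true
                    << "reply within internal limit: " << (replySize <= kBSONObjMaxInternalSize) << '\n'; // false
      }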

      The root cause is a large postBatchResumeToken (41 147 bytes) that is not accounted for when estimating the reply size. Change stream resume tokens can exceed the 16 KB of headroom that BSONObjMaxInternalSize adds on top of BSONObjMaxUserSize, because they encode the document ID, which can itself be large. This is a bug because the server controls how big batches can grow; the proper behaviour is to produce a smaller batch (e.g., stop at 28 098 documents) rather than return an error.
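
      As an illustration of the accounting such a fix implies, here is a minimal standalone C++ sketch. The function name, its parameters, and the way reply overhead is passed in are assumptions for illustration only; this is not the actual generateBatch() code:

      #include <cstdint>

      constexpr int32_t kBSONObjMaxUserSize = 16 * 1024 * 1024;                    // 16 777 216
      constexpr int32_t kBSONObjMaxInternalSize = kBSONObjMaxUserSize + 16 * 1024; // 16 793 600

      // Hypothetical check: would the *reply* still fit within the internal limit
      // if one more document were appended to the batch? The resume token and other
      // reply overhead are counted up front instead of relying on the 16 KB slack.
      bool nextDocFitsInReply(int32_t batchBytesSoFar,
                              int32_t nextDocBytes,
                              int32_t resumeTokenBytes,
                              int32_t replyMetadataBytes) {
          return batchBytesSoFar + nextDocBytes + resumeTokenBytes + replyMetadataBytes <=
              kBSONObjMaxInternalSize;
      }

      int main() {
          // With the sizes from this ticket, 16 757 334 bytes of documents plus a
          // 41 147-byte postBatchResumeToken already exceed BSONObjMaxInternalSize,
          // so a size-aware generateBatch() would have stopped the batch earlier.
          const bool fits = nextDocFitsInReply(16757334, /*nextDocBytes=*/0,
                                               /*resumeTokenBytes=*/41147,
                                               /*replyMetadataBytes=*/0);
          return fits ? 1 : 0;  // fits == false here
      }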

      The problem manifested in a mongosync internal test running the current v7.0 with enableTestCommands=1. The failure call stack:

      #0  BSONObj::_assertInvalid() at bsonobj.cpp:94
      #1  BSONObj::init<mongo::BSONObj::DefaultSizeTrait> () at bsonobj.h:720
      #2  BSONObj::BSONObj<mongo::BSONObj::DefaultSizeTrait> () at bsonobj.h:144
      #3  BSONObjBuilderBase<mongo::BSONObjBuilder, mongo::BufBuilder>::done<mongo::BSONObj::DefaultSizeTrait> () at bsonobjbuilder.h:579
      #4  BSONObjBuilder::obj<mongo::BSONObj::DefaultSizeTrait> () at bsonobjbuilder.h:775
      #5  BSONObj::removeField () at bsonobj.cpp:724
      #6  (anonymous namespace)::GetMoreCmd::Invocation::validateResult () at getmore_cmd.cpp:800
      #7  (anonymous namespace)::GetMoreCmd::Invocation::run () at getmore_cmd.cpp:792
      ...
      

      If test commands are not enabled, the failure surfaces elsewhere.

            Assignee:
            Romans Kasperovics (romans.kasperovics@mongodb.com)
            Reporter:
            Romans Kasperovics (romans.kasperovics@mongodb.com)
            Votes:
            0
            Watchers:
            8
