Core Server / SERVER-93152

$push and $addToSet memory limits can cause avoidable failures

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Execution
    • ALL

      $push and $addToSet will uassert() if they appear to consume memory beyond their limit. We track memory usage by summing getApproximateSize() for each Value in the pipeline. For some queries, the memory usage can be exaggerated because the materialized document is smaller than the objects that pass from stage to stage.

      In particular, in DocumentStorage objects we track the size of the backing BSON object (_bson) as well as the result cache (_cache). These objects often share pointers, so the actual memory use is smaller, but for purposes of approximating their size we treat them as distinct copies. In addition, the materialized object is sometimes significantly smaller than the original _bson, which exaggerates the approximation further.

      We can observe this with a fairly simple pipeline:

      [
        {
          "$unwind": {
            "path": "$foo"
          }
        },
        {
          "$group": {
            "_id": null,
            "Data": {
              "$push": "$$ROOT"
            }
          }
        }
      ]
      

      Where an input document might look something like:
           {_id:1, foo: [array with 1000s of elements]}

      Each document's backing _bson will contain the full document with all elements in foo, but the materialized document will contain only 1 element.
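      A back-of-the-envelope model of this over-count, as a sketch only. It assumes each unwound document is charged for the full backing BSON plus its own materialized field, and it ignores per-object overhead. The function name, the 16-byte element size, and the element counts are illustrative and are not taken from the server code.

```python
def approximate_vs_materialized(num_elements: int, element_bytes: int) -> tuple:
    """Return (charged_bytes, materialized_bytes) for $unwind then $push of $$ROOT.

    Toy model: the input document's size is dominated by the array `foo`.
    """
    backing_bson = num_elements * element_bytes  # full original document
    materialized = element_bytes                 # one unwound element per output doc

    # Each of the num_elements output documents is charged for the shared
    # backing BSON as if it were a private copy, plus its materialized field.
    charged = num_elements * (backing_bson + materialized)
    actual = num_elements * materialized
    return charged, actual


for n in (1_000, 5_000):
    charged, actual = approximate_vs_materialized(n, element_bytes=16)
    print(f"{n} elements: charged {charged} bytes, materialized {actual} bytes")
```

      In this model the charged size grows quadratically with the length of foo, while the materialized size grows only linearly.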

      We can in effect force materialization by including a $project stage following the $unwind stage, which mitigates the problem:

      [
        {
          "$unwind": {
            "path": "$foo"
          }
        },
        {
          "$project": {
            "_id": 1,
            "foo": 1
          }
        },
        {
          "$group": {
            "_id": null,
            "Data": {
              "$push": "$$ROOT"
            }
          }
        }
      ]
      

      The $project stage forces us to discard the _bson objects and write all fields into the result _cache. This results in a substantially smaller approximate size for the $group/$push stage.
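      For anyone applying this workaround from client code, the rewrite can be sketched as a small pipeline transformation. add_projection_after_unwind is a hypothetical helper, not part of any driver, and it operates on plain Python dicts only. Note that an inclusion $project drops every field not listed, so the contents of $$ROOT change unless all needed fields are named.

```python
from copy import deepcopy


def add_projection_after_unwind(pipeline, fields):
    """Return a copy of `pipeline` with an inclusion $project after each $unwind.

    `fields` lists the top-level fields to keep in the projected documents.
    """
    project = {"$project": {name: 1 for name in fields}}
    result = []
    for stage in pipeline:
        result.append(deepcopy(stage))
        if "$unwind" in stage:
            # Forcing a projection here mirrors the manual workaround above.
            result.append(deepcopy(project))
    return result


pipeline = [
    {"$unwind": {"path": "$foo"}},
    {"$group": {"_id": None, "Data": {"$push": "$$ROOT"}}},
]
workaround = add_projection_after_unwind(pipeline, ["_id", "foo"])
```

      The original pipeline list is left unmodified, so both variants can be run side by side when comparing behavior.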

      Perhaps we can implement a forced materialization for cases where we would otherwise exceed memory limits or spill to disk?
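      One possible shape for that idea, as a toy model rather than server code: when the approximate total first crosses the limit, materialize everything held so far, re-measure, and fail only if the re-measured total is still over. ToyDocument, ToyPush, MemoryLimitExceeded, and the byte figures are all hypothetical names and numbers chosen for illustration, and the model ignores the cost of the materialization itself.

```python
from dataclasses import dataclass, field


class MemoryLimitExceeded(Exception):
    """Stands in for the uassert() raised when the limit is exceeded."""


@dataclass
class ToyDocument:
    backing_bytes: int                          # size charged for the backing BSON
    cache: dict = field(default_factory=dict)   # field name -> materialized size in bytes

    def approximate_size(self) -> int:
        # Backing BSON and cache are summed as if they were distinct copies.
        return self.backing_bytes + sum(self.cache.values())

    def materialized(self) -> "ToyDocument":
        # Keep only the materialized fields and drop the backing BSON.
        return ToyDocument(backing_bytes=0, cache=dict(self.cache))


class ToyPush:
    def __init__(self, limit_bytes: int):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        self.values = []
        self.materializing = False  # set once the limit has been hit

    def add(self, doc: ToyDocument) -> None:
        if self.materializing:
            doc = doc.materialized()
        self.values.append(doc)
        self.used_bytes += doc.approximate_size()
        if self.used_bytes <= self.limit_bytes:
            return
        if not self.materializing:
            # First time over the limit: materialize what we hold and re-measure.
            self.materializing = True
            self.values = [d.materialized() for d in self.values]
            self.used_bytes = sum(d.approximate_size() for d in self.values)
        if self.used_bytes > self.limit_bytes:
            raise MemoryLimitExceeded(
                f"{self.used_bytes} bytes used, limit is {self.limit_bytes}"
            )


push = ToyPush(limit_bytes=1000)
for _ in range(50):
    push.add(ToyDocument(backing_bytes=400, cache={"foo": 10}))
print(push.used_bytes)
```

      Deferring materialization until the limit is reached keeps the common path unchanged, and once triggered, later documents are materialized on arrival so the held values are not re-scanned on every add.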

      Note that I believe SBE always materializes between stages, so this problem does not exist when SBE is enabled.

            Assignee: Unassigned
            Reporter: Colin Stolley (colin.stolley@mongodb.com)
            Votes: 0
            Watchers: 4

              Created:
              Updated: