Core Server / SERVER-33845

Significant memory leak in bulkWrite via the mongo shell

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: 3.4.10, 3.6.2
    • Component/s: Shell
    • Labels: None
    • Operating System: ALL

      I've been writing a script to iterate over a collection with an invalid (badly designed) data structure and rewrite it into another collection, and I am encountering a massive memory leak when running the script in the mongo shell.

      Here is the sample script (I removed everything that is not required and renamed the variables):

      var srcCollection = db.getCollection('source'),
          dstCollection = db.getCollection('destination'),
          updates = [],
          batchSize = 200,
          counter = 0,
          limit = 0,
          flush = true,
          bulkOptions = {"writeConcern": {"w": 1}, "ordered": false},
          cursor = srcCollection.find({}).batchSize(batchSize).noCursorTimeout(),
          c = null,
          sl = 0,
          ol = 0,
          cl = 0,
          newDoc = null,
          n1, n2, n3;
      
      if (limit > 0)
          cursor.limit(limit);
      
      if (flush) {
          dstCollection.drop();
          printjson('Collection ' + dstCollection + ' has been dropped before processing...');
      }
      
      cursor.forEach(function (doc) {
          // Copy source document
          newDoc = Object.assign({}, doc);
          // Remove obsolete keys
          delete newDoc['mbz'];
          delete newDoc['_id'];
      
          newDoc['mbz'] = {
              'field1': [],
              'field2': [],
              'field3': []
          };
      
          if (doc.mbz) {
              doc.mbz.forEach(function (elt) {
                  c = elt.c_id;
                  sl = elt.sds.length;
      
                  for (n1 = 0; n1 < sl; n1++) {
      
                      ol = elt.sds[n1]['o'].length;
                      cl = elt.sds[n1]['c'].length;
      
                      newDoc['mbz']['field1'].push({
                          'c': c,
                          'd': elt.sds[n1].d,
                          'x': elt.sds[n1].s_id
                      });
      
                      for (n2 = 0; n2 < ol; n2++) {
                          newDoc['mbz']['field2'].push({
                              'c': c,
                              'd': elt.sds[n1]['o'][n2].d,
                              'x': elt.sds[n1].s_id
                          });
                      }
      
                      for (n3 = 0; n3 < cl; n3++) {
                          newDoc['mbz']['field3'].push({
                              'c': c,
                              'd': elt.sds[n1]['c'][n3].d,
                              'l': elt.sds[n1]['c'][n3].l_id,
                              'x': elt.sds[n1].s_id
                          });
                      }
                      
                  }
              });
          }
      
          updates.push({
              'insertOne':{
                  "document" : newDoc
              }
          });
      
          counter++;
      
          if (updates.length >= batchSize) {
              // I tried bulkWrite, insertMany and initializeUnorderedBulkOp too
              dstCollection.bulkWrite(updates, bulkOptions);
              printjson('-- ' + counter + ' documents transferred so far...');
              updates = [];
          }
      });
      
      if (updates.length > 0) {
          dstCollection.bulkWrite(updates, bulkOptions);
      }
      
      printjson('----- Total: ' + counter + ' documents transferred');
      

      This is rather brute-force, but it only needs to run once and was meant to be written quickly. Also, the same logic works perfectly (and faster) in Python with PyMongo 3.4 or 3.6.

      Now, the script leaks only when bulkWrite operations are actually performed. If I comment out the dstCollection.bulkWrite(updates, bulkOptions); lines, no write operation is done and there is no memory leak, even if the cursor is iterated all the way to the end.
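
      To make that isolation easier to reproduce, below is a minimal sketch (assuming the same 'source' and 'destination' collections and the 200-document batches from the script above) that keeps only the cursor iteration and the batched bulkWrite, with the document rewriting removed; toggling the bulkWrite call on and off should show whether the shell's memory growth comes from the write path alone.

      // Minimal isolation sketch (assumptions: collections named 'source' and
      // 'destination', batches of 200 documents). The transformation logic is
      // stripped out, so any growth of the mongo shell's resident memory while
      // this runs can only come from the batched bulkWrite calls.
      var src = db.getCollection('source'),
          dst = db.getCollection('destination'),
          ops = [],
          seen = 0;
      
      src.find({}).batchSize(200).noCursorTimeout().forEach(function (doc) {
          delete doc._id;    // let the destination assign its own _id
          ops.push({'insertOne': {'document': doc}});
          seen++;
      
          if (ops.length >= 200) {
              // Per the report above, commenting out this call removes the leak
              // even though the cursor is still iterated to the end.
              dst.bulkWrite(ops, {'writeConcern': {'w': 1}, 'ordered': false});
              ops = [];
          }
      });
      
      if (ops.length > 0) {
          dst.bulkWrite(ops, {'writeConcern': {'w': 1}, 'ordered': false});
      }
      print('Copied ' + seen + ' documents');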

      The collection is rather small (16,000 docs) but the documents have an average size of 21 KB (the source collection is about 540 MB; the destination collection built with Python is about 330 MB). The leak grows after each bulkWrite (roughly every 3 seconds), adding 15 to 25 MB of memory to the "mongo" shell process (not to the MongoDB server).
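
      For scale, here is a rough sketch (assuming the same 'source' collection and the 200-document batches from the script above) of how one might estimate the BSON payload of a single batch with the shell's Object.bsonsize() helper; with ~21 KB documents this comes out to around 4 MB per batch, far less than the 15 to 25 MB the shell process gains after every bulkWrite.

      // Rough payload estimate for one batch (assumption: 'source' collection,
      // 200-document batches as in the script above). Object.bsonsize() returns
      // the BSON size of a document in bytes.
      var batchBytes = 0;
      db.getCollection('source').find({}).limit(200).forEach(function (doc) {
          batchBytes += Object.bsonsize(doc);
      });
      print('Approximate batch payload: ' +
            (batchBytes / (1024 * 1024)).toFixed(1) + ' MB');
      // Roughly 200 * 21 KB ≈ 4 MB, yet the shell grows by 15-25 MB per batch.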

            Assignee: Dmitry Agranat (dmitry.agranat@mongodb.com)
            Reporter: Henri-Maxime Ducoulombier (hmducoulombier@marketing1by1.com)
            Votes: 1
            Watchers: 8
