Type: Bug
Resolution: Works as Designed
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.4.10, 3.6.2
Component/s: Shell
Labels: None
Operating System: ALL
I've been writing a script to browse a collection with a badly designed data structure and rewrite it into another collection, and I encounter a massive memory leak when running the script in the mongo shell.
Here is the sample script (I removed everything that is not required and renamed the variables):
```javascript
var srcCollection = db.getCollection('source'),
    dstCollection = db.getCollection('destination'),
    updates = [],
    batchSize = 200,
    counter = 0,
    limit = 0,
    flush = true,
    bulkOptions = {"writeConcern": {"w": 1}, "ordered": false},
    cursor = srcCollection.find({}).batchSize(batchSize).noCursorTimeout(),
    c = null, sl = 0, ol = 0, cl = 0,
    newDoc = null, n1, n2, n3;

if (limit > 0) cursor.limit(limit);

if (flush) {
    dstCollection.drop();
    printjson('Collection ' + dstCollection + ' has been dropped before processing...');
}

cursor.forEach(function (doc) {
    // Copy source document
    newDoc = Object.assign({}, doc);
    // Remove obsolete keys
    delete newDoc['mbz'];
    delete newDoc['_id'];
    newDoc['mbz'] = {'field1': [], 'field2': [], 'field3': []};
    if (doc.mbz) {
        doc.mbz.forEach(function (elt) {
            c = elt.c_id;
            sl = elt.sds.length;
            for (n1 = 0; n1 < sl; n1++) {
                ol = elt.sds[n1]['o'].length;
                cl = elt.sds[n1]['c'].length;
                newDoc['mbz']['field1'].push({
                    'c': c,
                    'd': elt.sds[n1].d,
                    'x': elt.sds[n1].s_id
                });
                for (n2 = 0; n2 < ol; n2++) {
                    newDoc['mbz']['field2'].push({
                        'c': c,
                        'd': elt.sds[n1]['o'][n2].d,
                        'x': elt.sds[n1].s_id
                    });
                }
                for (n3 = 0; n3 < cl; n3++) {
                    newDoc['mbz']['field3'].push({
                        'c': c,
                        'd': elt.sds[n1]['c'][n3].d,
                        'l': elt.sds[n1]['c'][n3].l_id,
                        'x': elt.sds[n1].s_id
                    });
                }
            }
        });
    }
    updates.push({'insertOne': {"document": newDoc}});
    counter++;
    if (updates.length >= batchSize) {
        // I tried bulkWrite, insertMany and initializeUnorderedBulkOp too
        dstCollection.bulkWrite(updates, bulkOptions);
        printjson('-- ' + counter + ' documents transferred so far...');
        updates = [];
    }
});

if (updates.length > 0) {
    dstCollection.bulkWrite(updates, bulkOptions);
}
printjson('----- Total: ' + counter + ' documents transferred');
```
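Stripped of the MongoDB-specific parts, the accumulate-and-flush batching the script relies on can be sketched in plain JavaScript. Everything here (`MockCollection`, `transfer`) is a hypothetical stand-in for illustration, not the reporter's environment:

```javascript
// Mock standing in for dstCollection: records documents and flush count.
function MockCollection() {
    this.docs = [];
    this.flushes = 0;
}
MockCollection.prototype.bulkWrite = function (ops) {
    var self = this;
    ops.forEach(function (op) {
        self.docs.push(op.insertOne.document);
    });
    this.flushes++;
};

// Same batching logic as the script: push insertOne ops, flush every
// batchSize documents, then flush any final partial batch.
function transfer(sourceDocs, dst, batchSize) {
    var updates = [], counter = 0;
    sourceDocs.forEach(function (doc) {
        updates.push({insertOne: {document: doc}});
        counter++;
        if (updates.length >= batchSize) {
            dst.bulkWrite(updates);
            updates = []; // a fresh array after each flush, as in the script
        }
    });
    if (updates.length > 0) {
        dst.bulkWrite(updates); // remaining partial batch
    }
    return counter;
}

// Example: 16000 documents in batches of 200 should produce 80 flushes.
var dst = new MockCollection();
var src = [];
for (var i = 0; i < 16000; i++) src.push({i: i});
var total = transfer(src, dst, 200);
```

With the mock in place, the batching itself retains nothing between flushes, which is consistent with the observation below that the leak appears only when the real bulkWrite calls are executed.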
This is rather brute-force, but it only needs to run once and was meant to be written quickly. The same logic also works perfectly (and faster) in Python with PyMongo 3.4 or 3.6.
Now, the script leaks only when the bulkWrite operations are actually performed. If I comment out the dstCollection.bulkWrite(updates, bulkOptions); lines, no write operation is done and there is no memory leak, even when the cursor is iterated to the end.
The collection is rather small (16,000 documents) but the documents have an average size of 21 KB (the source collection is 540 MB; the destination collection built with Python is 330 MB). The leak grows after each bulkWrite (about every 3 seconds), adding 15 to 25 MB of memory to the "mongo" shell process (not mongod).
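To quantify growth like this per batch, one way is to sample resident-set size after each flush. The sketch below uses Node.js (`process.memoryUsage()` is Node-only and does not exist in the mongo shell), with a simulated batch of 200 roughly 21 KB documents standing in for one bulkWrite; the numbers and names are illustrative only:

```javascript
// Report resident-set size in MB (Node.js API, not available in mongo shell).
function mbUsed() {
    return process.memoryUsage().rss / (1024 * 1024);
}

var samples = [];
var batches = 5;
for (var b = 0; b < batches; b++) {
    // Stand-in for building and writing one batch of 200 ~21 KB documents.
    var batch = [];
    for (var i = 0; i < 200; i++) {
        batch.push({payload: new Array(21 * 1024).join('x')});
    }
    samples.push(mbUsed()); // RSS snapshot after this batch
}
// A steadily increasing series across batches indicates retained memory;
// a roughly flat series means the per-batch allocations are being collected.
```

Logging such a series alongside the real script would make the 15-25 MB-per-bulkWrite growth easy to demonstrate in the ticket.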