[SERVER-8456] Mongod memory leak during MapReduce in 2.2.x Created: 06/Feb/13  Updated: 15/Nov/21  Resolved: 01/Apr/13

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: 2.2.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Dan Cooper Assignee: Tad Marshall
Resolution: Incomplete Votes: 0
Labels: memory-leak
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6.2


Attachments: JPEG File mongod_mem.jpg    
Issue Links:
Related
is related to SERVER-8442 Map-reduce memory leak (Closed)
is related to SERVER-8655 MapReduce leaks ClientCursors (Closed)
Operating System: Linux

 Description   

During map-reduce jobs and index drop/creates we are seeing memory consistently increase over time until the server finally runs out of memory and the kernel kills the process with: kernel: Out of memory: Kill process 13794 (mongod) score 908 or sacrifice child. We did not have these issues prior to 2.2.x.
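
For reference, a stripped-down sketch of the kind of job we run (collection names, map/reduce functions, and index keys below are placeholders, not our actual script):

  // Hypothetical mongo shell sketch of the workload described above.
  // Names and keys are placeholders; the real job runs every 15 minutes.
  var mapFn = function () { emit(this.category, this.value); };
  var reduceFn = function (key, values) { return Array.sum(values); };

  // Map-reduce into an output collection.
  db.events.mapReduce(mapFn, reduceFn, { out: { replace: "events_summary" } });

  // Drop and recreate indexes on the output collection after each run.
  db.events_summary.dropIndexes();
  db.events_summary.ensureIndex({ value: -1 });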



 Comments   
Comment by Tad Marshall [ 27/Mar/13 ]

So far, we have not been able to reproduce the memory leak internally; there may be details in the specific usage of MapReduce that need to be investigated to resolve this.

We'll switch to a private ticket to try to learn what specifics in this case may be at the root of the memory leak.

If anyone reading this has a reproducible case that they can share with us, please post the details here.

Comment by Dan Cooper [ 26/Mar/13 ]

Tad - Yes, the JS file is what I uploaded; from what my developers tell me, that's what seems to be causing the leak. I can ask if we can do a mongodump to reproduce the data.

I'm actually in your building for training today; I'm in the back of the training room if you want to discuss this in person.

Dan

Comment by Tad Marshall [ 26/Mar/13 ]

Hi Dan,

Sorry this has dragged on so long.

There are some comments early in this ticket about uploading files with scp, but no notes saying what was uploaded. I found a JavaScript file ... is that all we have to go on?

We can change this to a private ticket if you prefer, but our attempts to reproduce your problem with dummy data have not succeeded. I think we need real data to dupe and debug this.

Tad

P.S. Version 2.2.4-rc0 will be out soon, and it includes a bug fix that seems too small to account for the leaks you are seeing, but you could test it and let us know.

Comment by Dan Cooper [ 26/Mar/13 ]

Do you need anything from me for more data points? Our mongods fail daily now, and we had to write a Puppet script to restart them automatically. Clearly this does not inspire confidence in the product for the team.

Comment by auto [ 21/Mar/13 ]

Author: Ben Becker <ben.becker@10gen.com>
Date: 2013-02-12T20:32:24Z

Message: SERVER-8456: use CursorHolder to avoid auto_ptr::release
Branch: v2.2
https://github.com/mongodb/mongo/commit/2048a8099765766aa772592ecb0e5a6d54530287

Comment by Tad Marshall [ 20/Mar/13 ]

Switching from SpiderMonkey in version 2.2 to V8 in version 2.4 changed the leak situation in both good and bad ways. We've been able to fix many of the new leak sources in our V8 code, but the work is ongoing. At this point we don't have concrete advice on what you should change in your MapReduce code to prevent or eliminate the leaks, but if you wanted to try version 2.4.0 we would be very interested in what you learn.

We've fixed the cases that we've been able to reliably reproduce, but we don't think that we've fixed everything yet, so more data points would be helpful.

Comment by Dan Cooper [ 19/Mar/13 ]

Hey there, was this fixed in 2.4? I never got an answer to my previous question.

Comment by Dan Cooper [ 02/Mar/13 ]

Hi guys, could we find out what it is about our map/reduce jobs that is causing the leak, so that maybe we can fix it on our end?

Comment by Dan Cooper [ 21/Feb/13 ]

Yes, the only server process running on those boxes is mongod. We run mongos on all the client nodes that call the mongods.

Comment by Ben Becker [ 21/Feb/13 ]

Hi Dan,

Just a quick update. I've moved the slow ClientCursor leak issue into SERVER-8655. That ticket is a candidate for being backported to the v2.2 branch. In this ticket, we will continue to debug any remaining leaks.

I also wanted to ask: in the attached graph, is mongod the only process running? Do you run any mongos instances on the same node?
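
If it helps to confirm which process accounts for the growth in the graph, mongod reports its own memory counters via serverStatus; here is a minimal sketch, assuming a mongo shell connected to the affected mongod:

  // Print mongod's resident/virtual/mapped memory (in MB) once a minute.
  while (true) {
      var mem = db.serverStatus().mem;
      print(new Date() + "  resident=" + mem.resident + "MB" +
            "  virtual=" + mem.virtual + "MB  mapped=" + mem.mapped + "MB");
      sleep(60 * 1000);
  }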

Thanks,
Ben

Comment by auto [ 12/Feb/13 ]

Author: Ben Becker <ben.becker@10gen.com>
Date: 2013-02-12T20:32:24Z

Message: SERVER-8456: use CursorHolder to avoid auto_ptr::release
Branch: master
https://github.com/mongodb/mongo/commit/74e487e24e4120c55436977e0b5972ac38efbab7

Comment by Ben Becker [ 12/Feb/13 ]

Hi Dan,

We're still diagnosing the leak you reported. The patch that I intended to mark for backport does fix a (very) slow cursor leak, but I do not believe this will fix the issue you reported. Apologies for any confusion – we'll keep you posted as we make progress.

Best,
Ben

Comment by Dan Cooper [ 11/Feb/13 ]

I noticed you updated this to have a backport; do you know when that will be available for 2.2.2?

Comment by Dan Cooper [ 09/Feb/13 ]

1. 2.2.2, though we have seen this in 2.2.0
2. Yes
3. Yes
4. The indexes are recreated as part of the script that runs the M/R job. It recreates the indexes after each run of the job (the job runs every 15 minutes). It runs an ensureIndex in the foreground on the output collections from the M/R job (see the sketch after this list).
5. Can you supply the private SCP?
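
For reference, the recreation step from point 4 looks roughly like this (a sketch with placeholder collection and index names, not the actual script):

  // Run by the same script every 15 minutes, right after the M/R job finishes.
  // Collection and index names are placeholders.
  var outputCollections = ["events_summary", "events_summary_daily"];
  outputCollections.forEach(function (name) {
      var coll = db.getCollection(name);
      coll.dropIndexes();
      // Foreground index build on the output collection.
      coll.ensureIndex({ value: -1 }, { background: false });
  });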

Comment by Ben Becker [ 07/Feb/13 ]

Hi Dan,

Just a few questions that will help track this down:

  1. What version are you running?
  2. Are you running with sharding?
  3. Are you running with replication?
  4. Could you briefly explain how you are creating and dropping indexes?
  5. Can you supply a copy of the MapReduce scripts you're running, or a minimal script that reproduces the leak?

I would be happy to supply a private SCP server to upload the scripts if there are any privacy concerns. Otherwise please feel free to attach them to this ticket.

Thanks!
