[SERVER-4153] mongodump hung server Created: 26/Oct/11  Updated: 29/May/12  Resolved: 18/Nov/11

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 1.8.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Chris Ferry Assignee: Brandon Diamond
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 5.6 - EC2 m1.xlarge


Attachments: PNG File Screen Shot 2011-11-01 at 12.57.13 PM.png     PNG File Screen Shot 2011-11-01 at 12.57.47 PM.png     Text File mdb1-102611.txt    
Operating System: Linux
Participants:

 Description   

The mongod server was completely unresponsive, with nothing in the logs. A mongodump was in progress at the time.
The dump file was not growing, and iostat showed 0% utilization. I think this is memory-usage related.
After a kill -9 and restart, it came up cleanly.
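If the hang recurs, a minimal snapshot of the disk and memory state can be captured with standard tools; the interval and PID below are placeholders:

    iostat -x 5            # extended per-device stats, 5-second interval
    free -m                # system-wide memory usage in MB
    top -b -n 1 -p <pid>   # one batch-mode snapshot of the mongod process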



 Comments   
Comment by Brandon Diamond [ 10/Nov/11 ]

Not sure about any special args; the only goal there is to find out where the process is hanging.

Comment by Chris Ferry [ 09/Nov/11 ]

Do you have the gdb arguments you want me to use when attaching?
Looking for logs now

Comment by Brandon Diamond [ 01/Nov/11 ]

Thanks for the clarification, Chris.

MongoDB maps all data into virtual memory; as long as RSS doesn't grow larger than physical memory, you shouldn't encounter any issues. Do you happen to have the mongod log files available from the time the issue was observed? If the problem occurs again, it'd also be extremely helpful if you could attach GDB to the process and see where the process is waiting ("where").
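For reference, a minimal GDB session against a hung mongod might look like this (the PID is a placeholder):

    gdb -p <mongod pid>
    (gdb) thread apply all where   # backtrace every thread
    (gdb) detach
    (gdb) quit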

I also noticed that you're running 1.8.3; you should consider upgrading to the latest minor revision (1.8.4) for the latest patches and bugfixes.

Comment by Chris Ferry [ 01/Nov/11 ]

Server that had the issue.

Comment by Chris Ferry [ 01/Nov/11 ]

Primary MongoDB metrics for last day and last week.

Comment by Chris Ferry [ 01/Nov/11 ]

Sorry I was on vacation.
What I'm noticing is a high level of virtual memory (VSZ) usage.
For example, our prod primary (top output; columns are PID, USER, PR, NI, VIRT, RES, SHR, S, %CPU, %MEM, TIME+, COMMAND):
14511 mongod 15 0 58.6g 8.8g 8.1g S 1.0 58.5 368:29.57 mongod
Prod secondaries:
9673 mongod 15 0 57.5g 10g 10g S 0.0 69.2 4:44.79 mongod
11544 mongod 15 0 57.7g 12g 12g R 0.0 85.4 16:31.43 mongod

By unresponsive I mean all queries were timing out and the CLI would not connect. Finally I tried a kill, which had no effect until I used kill -9.

We haven't had any lockups since, but I'm wondering what we can do to assist in troubleshooting if we were to have another.
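For reference, one way to keep a running record of mongod's virtual vs. resident memory ahead of another lockup might be (PID and interval are placeholders):

    while true; do
      date                                # timestamp each sample
      ps -o pid,vsz,rss,comm -p <mongod pid>
      sleep 60
    done >> mongod-mem.log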

Comment by Brandon Diamond [ 01/Nov/11 ]

Haven't heard anything for a while. Has anything changed? Otherwise, we'll close out this ticket tonight.

Comment by Brandon Diamond [ 28/Oct/11 ]

One more thing: can you explain what you mean by "unresponsive"? Can you connect to the server with a separate MongoDB client?

Comment by Brandon Diamond [ 28/Oct/11 ]

Thanks for all the info, Chris.

What does your memory utilization look like over time? In other words, are you running the dump on a busy system with very little available memory? Or is the dump consuming most of the available memory on the system? This definitely looks like a low-memory-related issue.

Any chance you could hook GDB into the process and find out where the tool is stalling? I'm having trouble reproducing this on my end.
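For reference, whether the tool is stuck in a system call can be checked with strace (the PID is a placeholder):

    strace -f -p <mongodump pid>   # follow all threads; a hang typically shows one stuck or endlessly repeating syscall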
