[SERVER-1094] mongostat completely locks under high load Created: 06/May/10  Updated: 24/May/10  Resolved: 24/May/10

Status: Closed
Project: Core Server
Component/s: Admin
Affects Version/s: 1.4.2
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Ryan Nitz
Assignee: Eliot Horowitz (Inactive)
Resolution: Cannot Reproduce
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment: Linux FC8



 Description   

When MongoDB is under high load with a lot of connections, mongostat locks up after connecting. Accessing the shell still works.

This isn't the incremental lock I saw earlier; the command does not return any data (it just hangs).

Estimated load:

Connections: 2k
Updates per second: ~ 4k



 Comments   
Comment by Eliot Horowitz (Inactive) [ 24/May/10 ]

If you still want to preheat, can you make a separate case for that?

Comment by Ryan Nitz [ 12/May/10 ]

Ignore my comment about the mongo CPU spiking when adding connections. There was a bug in our app and we were loading a lot of data on app startup.

Comment by Ryan Nitz [ 07/May/10 ]

Ok... after more investigation, I know what happened.

I did a restore on a database (moved to another machine)

I ran a db.repairDatabase()

I restarted Mongo

At this point the server load average was ~ 7

I added the load above. When initializing a lot of rapid new connections, the Mongo CPU consumption spikes a bit.

The 60 CPUs started interacting with Mongo.

Mongo did not have most of the data in core, resulting in a lot of disk access.

The server load average jumped to ~ 12

mongostat started hanging on connect (the strange part is that the shell was still working well).

The good news is that I ran the same test with 80 CPUs (db was already all in core) and everything was perfect.

Feature request: The ability to preheat data on startup (e.g., load X bytes worth of data from collections x,y and z before accepting connections).
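
As a rough illustration of the preheat idea above, here is a minimal client-side sketch, not an existing MongoDB feature. It assumes each collection can be read as an iterable of documents (for example a driver cursor) and that touching documents is enough to pull them into memory. The names `preheat` and `size_of` are hypothetical.

```python
from typing import Callable, Dict, Iterable


def preheat(collections: Dict[str, Iterable[dict]],
            byte_budget: int,
            size_of: Callable[[dict], int]) -> Dict[str, int]:
    """Read documents collection by collection until the byte budget is spent.

    `collections` maps a collection name to an iterable of documents.
    `size_of` estimates the size of one document in bytes (an assumption;
    the real on-disk size depends on the storage format).
    Returns the number of bytes read from each collection.
    """
    bytes_read: Dict[str, int] = {}
    remaining = byte_budget
    for name, docs in collections.items():
        bytes_read[name] = 0
        if remaining <= 0:
            # Budget already spent: record the collection but do not read it.
            continue
        for doc in docs:
            n = size_of(doc)
            bytes_read[name] += n
            remaining -= n
            if remaining <= 0:
                break
    return bytes_read
```

A warm-up script along these lines would run before the application opens its connections; it does not stop the server from accepting other connections in the meantime.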

Comment by Ryan Nitz [ 06/May/10 ]

Well... I am just telling you what I saw on 1.4.2 on an xl EC2 instance (FC8).

FYI - After I reduced the load, the command was working again.

I am going to run some tests again later in the day. I'll attach the info after the tests.

Comment by Eliot Horowitz (Inactive) [ 06/May/10 ]

We've seen it used under much higher load without a problem.
Can you view the web console and/or db.currentOp()?
If so, can you send those?
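
For a large db.currentOp() result, a short summary of the longest-running operations may be easier to attach than the full output. The sketch below assumes the result has already been loaded as a dictionary with an `inprog` list whose entries carry `opid`, `op`, `ns`, `active`, and `secs_running` fields; those field names, the helper name `slow_ops`, and the 5-second threshold are assumptions and may differ by server version.

```python
def slow_ops(current_op: dict, min_secs: int = 5) -> list:
    """Return (opid, op, ns, secs_running) for active ops running >= min_secs.

    `current_op` is assumed to look like {"inprog": [ {...}, ... ]}.
    Results are sorted with the longest-running operation first.
    """
    ops = []
    for op in current_op.get("inprog", []):
        # Skip idle entries and entries without a running time.
        if op.get("active") and op.get("secs_running", 0) >= min_secs:
            ops.append((op.get("opid"), op.get("op"), op.get("ns"),
                        op["secs_running"]))
    return sorted(ops, key=lambda entry: entry[3], reverse=True)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the server in this ticket.
    sample = {"inprog": [
        {"opid": 1, "op": "update", "ns": "app.events",
         "active": True, "secs_running": 12},
        {"opid": 2, "op": "query", "ns": "app.users",
         "active": True, "secs_running": 1},
    ]}
    for entry in slow_ops(sample):
        print(entry)
```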

Generated at Thu Feb 08 02:56:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.