[SERVER-4710] Writers queues never go down and server eventually stalls Created: 18/Jan/12  Updated: 15/Aug/12  Resolved: 02/Mar/12

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: YannPierre CouzySchwartz Assignee: Eric Milkie
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows x64


Issue Links:
Depends
Operating System: Windows
Participants:

 Description   

We've posted on mongo-users what we believe is our core issue right now, namely the queued writers which never seem to go down once they've reached a threshold.

The thread is here: http://groups.google.com/group/mongodb-user/browse_thread/thread/195264a598d87393#

We've been able to reproduce the problem yesterday. After a --repair, the server seemed fine for a while, and when copying on the server the log file (the -v switch proved itself very verbose indeed, we had over 2GB of logs for a few hours of activity), the qw figure shot up drastically, as did the inbound connections (the clients were trying to compensate for the blocking queries), right before the server stopped serving queries altogether.

I've uploaded on jira part of the logs for this episode...

Later on, when we issued a repair on the database, the log was

Tue Jan 17 00:09:45 [initandlisten] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: Main.cachedItems top: { opid: 220, a
ctive: true, waitingForLock: false, secs_running: 0, op: "getmore", ns: "Main.cachedItems", query: {}, client: "0.0.0.0:0", desc: "initandli
sten", numYields: 0 }
Tue Jan 17 00:09:46 [initandlisten] warning: ClientCursor::yield can't unlock b/c of recursive lock ns: Main.cachedItems top: { opid: 221, a
ctive: true, waitingForLock: false, secs_running: 0, op: "getmore", ns: "Main.cachedItems", query: {}, client: "0.0.0.0:0", desc: "initandli
sten", numYields: 0 }

(and so on for the 223 stuck queued writers).

So it looks like some of our queries (we're implementing a poor-man distributed lock using atomic sets and checks) may be problematic when mongod is under duress (IO starvation, or other factors). We're moving the lock mechanism to a dedicated instance, and we're working more generally on our data access profile to minimize writes wrt reads, but we need to find out the root causes for this incident.



 Comments   
Comment by Yann Schwartz [ 02/Mar/12 ]

Sorry for not giving updates. We've migrated to 2.1.0 (2008+) and the locking problem has never reappeared. We later switched to 2.0.3 and our issue is fixed for this version as well. Thank you.

Comment by Eric Milkie [ 28/Feb/12 ]

How's your system running now with the Windows2008R2 build?

Comment by Eric Milkie [ 06/Feb/12 ]

I tried using a build of 2.0.3 that is very close to the one you are using and still can't reproduce. Since I believe the code is essentially the same, the cause of the issue may be either the hardware, or the format or content of your data.

What parameters are you using to start mongod? I did try raising the verbosity level on mine to see if the extra info would help, but unfortunately it logs every insert during the import, and thus slowed it down significantly.
Do you see anything abnormal in the mongod log, compared to what is shown for the import using 2.0?

Other than this mongorestore issue, is it correct to say that it is running fine with the new Windows2008+ build?

Comment by YannPierre CouzySchwartz [ 06/Feb/12 ]

Mongorestore with --host --db and -c switches (I've tried it with and without --drop)
The file to restore is on the same drive.

The binary we tested was compiled on Jan 30th

Running mongod we get:
Mon Feb 06 16:42:45 [initandlisten] db version v2.0.3-pre-, pdfile version 4.5
Mon Feb 06 16:42:45 [initandlisten] git version: ab1e828dd777945a0afe67c66c37ca2deb2c6ee1
Mon Feb 06 16:42:45 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_42

Comment by Eric Milkie [ 06/Feb/12 ]

I tried dumping a 300MB database with 2.0.2 and then restored it with mongorestore using a Nightly build of 2.1.1/Windows 2008. The restore completes in 1 min 15 secs. There may be a regression in the specific build you are using. Can you give the initial log messages when you start mongod? Within the messages is the "git version" and "db version" which will point me to the specific build you are using.
Are you using any command line parameters to mongorestore, other than the path to the dump?

Comment by YannPierre CouzySchwartz [ 06/Feb/12 ]

Restoring on a 2.0.2 then running the latest binary pointing to the data directory works fine though. The issue seems to be with the way mongorestore works in the latest build you provided.

Comment by YannPierre CouzySchwartz [ 06/Feb/12 ]

We did a few tests with the 2008R2 build, but when we tried to use mongorestore to restore our data, the restore process was very very slow (less than 10MB a minute, after 15 minutes it was still showing 0% progress)... We tested this on a 2GB and a 6GB dumps.

The restore on the same backup files is very fast with the current 2.0.2.

Comment by Eliot Horowitz (Inactive) [ 23/Jan/12 ]

Which build was that with in production?

CAn you switch to the 2008R2+ build?

Comment by YannPierre CouzySchwartz [ 23/Jan/12 ]

We haven't been able to reproduce the problem on our test setups. But we have in production. The relevant logs have been uploaded in the community private jira.

Comment by YannPierre CouzySchwartz [ 18/Jan/12 ]

Mmh. The nightly build requires 2008 R2 (TryAcquireSRWLockExclusive), I'll need to switch the OS before testing, more info when everything is set up (a few hours from now).

Comment by YannPierre CouzySchwartz [ 18/Jan/12 ]

Thank you Eliot. We'll try to reproduce on our current dev setup (2.0.2) then we'll try out your build.

Comment by Eliot Horowitz (Inactive) [ 18/Jan/12 ]

There is a newer windows build that might help.
Can you try this:
http://downloads.mongodb.org/win32/mongodb-win32-x86_64-2008plus-v2.0-latest.zip

Generated at Thu Feb 08 03:06:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.