[SERVER-5533] fsync and lock on a Secondary causes the shell to freeze/hang - worse with auth enabled
Created: 06/Apr/12  Updated: 23/Feb/15  Resolved: 11/Jun/12

Status: Closed
Project: Core Server
Component/s: Shell
Affects Version/s: 2.0.3, 2.0.4, 2.1.1
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Adam Comerford
Assignee: Andy Schwerin
Resolution: Duplicate
Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

mongod 2.0.4 - Mac OSX test env


Issue Links:
Depends
Duplicate
duplicates SERVER-1423 reads often aren't possible while in ... Closed
Related
Operating System: ALL
Participants:

 Description   

Reported here originally:

https://groups.google.com/forum/?fromgroups#!topic/mongodb-user/7o_bYPUJuSU

This has been reproduced both with and without authentication. The keyfile/auth variant is much more painful: there is no way to release the lock and recover from the bad state short of sending kill -9 to mongod. Without authentication, it is still possible to issue the unlock command.

Repro steps below.



 Comments   
Comment by Andy Schwerin [ 14/May/12 ]

QLock improvements did not resolve this bug. Looks like it needs more direct examination.

Comment by auto [ 09/May/12 ]

Author:

{u'login': u'andy10gen', u'name': u'Andy Schwerin', u'email': u'schwerin@10gen.com'}

Message: Fix remap private view memory leak and formalize w->X upgrade process.

Makes the upgrade R_to_W() block until it succeeds.

Replaces "runExclusively" with a new "X" state, which is equivalent to "W", but
can only be reached from "w". Client code in "w" calls "w_to_X()", which blocks
until all other threads in "w" call w_to_X() or call unlock_w(). At that point,
exactly one thread returns from w_to_X() with return value "true". This thread
is the "exclusive writer", and may behave like it's in "W" state. When that
thread calls "X_to_w()", it reverts to "w" state, and releases all the other
threads that called w_to_X() back into "w" state.

Because X_to_w() is effectively a barrier, we use generation counters to make
sure that fast racers don't cause deadlocks or race through the barrier. The
generation counters are generationX and generationXExit.

Use of w_to_X() is wrapped in the Lock::DBWrite::UpgradeToExclusive() guard
object, which has a "gotUpgrade()" method to check if a particular thread was
the exclusive worker for the upgrade period.

Decision making about which condition variables to notify is more fully
delegated to the notifyWeUnlocked() method of QLock, to eliminate some
inadvertent deadlocks that surfaced when R_to_W() was made to wait until
success.

May help SERVER-5533, because it fixes a greediness logic bug exercised by
fsync-and-lock.

This patch also introduces a directed test of the w->X functionality,
though it could use the addition of extra "noise" work.
Branch: master
https://github.com/mongodb/mongo/commit/15770de1f88984d5dc54650061d340fbaa1b0586
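The w->X upgrade in this commit is essentially a barrier. As a rough illustration only, here is a hypothetical single-threaded JavaScript model in which each "thread" is an async task. UpgradeBarrier and withUpgradeToExclusive are made-up names (the latter loosely mirrors the Lock::DBWrite::UpgradeToExclusive guard), and this is not the QLock implementation. Because JavaScript runs each task to completion, xToW() can release all waiters atomically, so the sketch omits the generationX/generationXExit counters that the commit needs for real threads.

```javascript
// Hypothetical single-process model of the "w" -> "X" upgrade described in the
// commit message. Method names loosely mirror w_to_X()/X_to_w()/unlock_w();
// this is NOT MongoDB's QLock code. Each "thread" is an async task.
class UpgradeBarrier {
  constructor() {
    this.inW = 0;           // number of tasks currently holding "w"
    this.blocked = [];      // resolve callbacks of tasks waiting in wToX()
    this.exclusive = false; // true while one task acts as the exclusive writer
  }

  lockW() {
    // "X" behaves like "W", so no new task may enter "w" while it is held;
    // this model rejects the call instead of queueing it.
    if (this.exclusive) throw new Error("cannot enter w while X is held");
    this.inW++;
  }

  unlockW() {
    this.inW--;
    this.maybeGrantX(); // leaving "w" may unblock a pending upgrade
  }

  // Resolves to true for exactly one caller (the exclusive writer) and to
  // false for every other caller once the writer calls xToW().
  wToX() {
    return new Promise((resolve) => {
      this.blocked.push(resolve);
      this.maybeGrantX();
    });
  }

  // Grant "X" only when every remaining "w" holder is waiting in wToX().
  maybeGrantX() {
    if (this.exclusive || this.blocked.length === 0) return;
    if (this.blocked.length < this.inW) return;
    this.exclusive = true;
    const winner = this.blocked.shift();
    winner(true);
  }

  // Revert the writer to "w" and release every waiter, also back into "w".
  xToW() {
    this.exclusive = false;
    const released = this.blocked;
    this.blocked = [];
    for (const resolve of released) resolve(false);
  }
}

// Rough analogue of the UpgradeToExclusive guard: run fn only if this task
// won the upgrade, and always revert to "w" afterwards.
async function withUpgradeToExclusive(lock, fn) {
  const gotUpgrade = await lock.wToX();
  if (gotUpgrade) {
    try {
      await fn();
    } finally {
      lock.xToW();
    }
  }
  return gotUpgrade;
}

async function demo() {
  const lock = new UpgradeBarrier();
  lock.lockW(); lock.lockW(); lock.lockW(); // three tasks hold "w"
  const log = [];
  const t1 = withUpgradeToExclusive(lock, () => log.push("exclusive work"));
  const t2 = withUpgradeToExclusive(lock, () => log.push("exclusive work"));
  lock.unlockW(); // the third task leaves "w" instead of upgrading
  const results = await Promise.all([t1, t2]);
  return { results, log };
}

demo().then(({ results, log }) => console.log(results, log.length));
// prints: [ true, false ] 1
```

In the demo, three tasks hold "w"; two request the upgrade and the third calls unlockW(). At that point the first requester becomes the exclusive writer, and when it finishes, the second is released back into "w" with false.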

Comment by Adam Comerford [ 06/Apr/12 ]

Updated affects version - reproduced on 2.1.1

Comment by Adam Comerford [ 06/Apr/12 ]

To reproduce:

Create a replica set, similar to this one:

 
{
	"_id" : "testing",
	"version" : 8,
	"members" : [
		{
			"_id" : 0,
			"host" : "adamc-mbp.local:30001"
		},
		{
			"_id" : 1,
			"host" : "adamc-mbp.local:30002",
			"priority" : 0,
			"hidden" : true
		},
		{
			"_id" : 2,
			"host" : "adamc-mbp.local:30003"
		}
	]
}
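To confirm which member to lock, the hidden member can be picked out of the configuration programmatically. A minimal sketch in plain JavaScript, assuming the config document is available as an object (for example, copied from rs.conf() in the shell); snapshotTargets is a hypothetical helper, not a shell built-in. It also enforces the replica set rule that a hidden member must have priority 0.

```javascript
// Hypothetical helper (not a mongo shell built-in): return the hosts of hidden
// members, the natural candidates for fsyncLock-based snapshots.
function snapshotTargets(config) {
  const targets = [];
  for (const member of config.members) {
    if (member.hidden !== true) continue;
    // Priority defaults to 1 when omitted; hidden members must have priority 0.
    const priority = member.priority === undefined ? 1 : member.priority;
    if (priority !== 0) {
      throw new Error(`member ${member._id} is hidden but has priority ${priority}`);
    }
    targets.push(member.host);
  }
  return targets;
}

// The replica set configuration from the repro steps.
const config = {
  _id: "testing",
  version: 8,
  members: [
    { _id: 0, host: "adamc-mbp.local:30001" },
    { _id: 1, host: "adamc-mbp.local:30002", priority: 0, hidden: true },
    { _id: 2, host: "adamc-mbp.local:30003" },
  ],
};

console.log(snapshotTargets(config)); // [ 'adamc-mbp.local:30002' ]
```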

Start mongod with the --keyFile option to enable auth. I have created the following user in the tests:

> use admin
> db.addUser("admin", "password")

Now, fsync and lock the hidden secondary, as if for a snapshot:

./mongo --port 30002
MongoDB shell version: 2.0.4
connecting to: 127.0.0.1:30002/test
> use admin
switched to db admin
> db.auth("admin","password")
1
SECONDARY> db.fsyncLock()
{
	"info" : "now locked against writes, use db.fsyncUnlock() to unlock",
	"seeAlso" : "http://www.mongodb.org/display/DOCS/fsync+Command",
	"ok" : 1
}

Disconnect and reconnect. Try auto-completing db. commands; this is usually enough to cause the hang. If not, run "show dbs". The shell simply stops responding, and Ctrl-C and similar have no effect.

Once it hangs, a standard shell can no longer connect. You can still get a shell as follows, but as soon as you try to authenticate (in order to unlock), it freezes again:

./mongo --port 30002 --shell --eval "prompt='> '"
MongoDB shell version: 2.0.4
connecting to: 127.0.0.1:30002/test
type "help" for help
> 
> use admin
switched to db admin
> db.auth("admin", "password")
<freeze>

Even sending Ctrl-C (SIGINT) to the mongod process does not break it out; it only schedules termination for after the current command ends:

^CFri Apr  6 16:08:54 got kill or ctrl c or hup signal 2 (Interrupt: 2), will terminate after current cmd ends

Finally, the hang can also be reproduced with authentication disabled. In that case, however, it is still possible to run the unlock command and restore normal operation.

Generated at Thu Feb 08 03:09:11 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.