[SERVER-9349] Race condition when fsyncLocking and unlocking can cause the shell to hang on all attempts to unlock Created: 15/Apr/13  Updated: 08/Jul/16  Resolved: 30/Apr/13

Status: Closed
Project: Core Server
Component/s: Concurrency
Affects Version/s: 2.2.4, 2.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andre de Frere Assignee: Unassigned
Resolution: Duplicate Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-6302 Race condition when multiple fsyncLoc... Closed
Related
Operating System: ALL
Participants:

 Description   

When locking and unlocking from the shell a large number of times, a state can be reached where the shell hangs on every db.fsyncUnlock(). This leaves the user unable to unlock.

For example, the following oneliner will eventually cause the shell to hang:

for(x=0;x<100;x++){print(x);printjson(db.fsyncLock());printjson(db.fsyncLock());printjson(db.fsyncUnlock());printjson(db.fsyncUnlock());}

This appears to happen when multiple fsyncLock calls are made in a row (rather than a corresponding fsyncUnlock for every fsyncLock). Anecdotally, it appears to happen when the lock count would be "zero" (all locks released), after locking and unlocking several times.
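The call order in the one-liner can be spelled out as the sketch below. It runs against a hypothetical in-memory stand-in for `db` (`makeFakeDb`, not part of the shell) that only counts locks, so it shows the bookkeeping one would expect from nested locks, not the server's actual behavior. The return fields (`ok`, `lockCount`, `errmsg`) are assumptions made for illustration.

```javascript
// Hypothetical stand-in for the shell's `db` object. It only keeps a lock
// counter; it does not talk to a server and cannot reproduce the hang.
function makeFakeDb() {
  var lockCount = 0;
  return {
    // Each lock call increments the (assumed) nested lock count.
    fsyncLock: function () {
      lockCount += 1;
      return { ok: 1, lockCount: lockCount };
    },
    // Each unlock call decrements it; unlocking at zero is reported as an error.
    fsyncUnlock: function () {
      if (lockCount === 0) {
        return { ok: 0, errmsg: "not locked" };
      }
      lockCount -= 1;
      return { ok: 1, lockCount: lockCount };
    },
    lockCount: function () {
      return lockCount;
    }
  };
}

// Same call order as the one-liner: lock, lock, unlock, unlock.
// Returns the lock count observed at the end of each iteration.
function runIterations(db, iterations) {
  var counts = [];
  for (var x = 0; x < iterations; x++) {
    db.fsyncLock();
    db.fsyncLock();
    db.fsyncUnlock();
    db.fsyncUnlock();
    counts.push(db.lockCount());
  }
  return counts;
}
```

Under this model, every iteration ends with the count back at zero, which is the point at which the hang was anecdotally observed on a real server.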

Once the shell has hung, no db.fsyncUnlock() will succeed: every call causes the shell to hang, and nothing untoward is printed in the logs (even at verbosity 5).



 Comments   
Comment by Eliot Horowitz (Inactive) [ 30/Apr/13 ]

See SERVER-6302, which is being worked on now.

Comment by Gabriel Jones [ 30/Apr/13 ]

I found this dubious workaround:
If the shell is hung, open a second shell and issue another unlock command. If that does not unhang the first shell, press [Ctrl][C] in the second shell, reopen it, and unlock again. Repeat until the first shell unhangs. This unhangs the first shell, but any later fsyncLock and subsequent fsyncUnlock commands will still hang. They can always be unhung by repeatedly issuing the fsyncUnlock command from a second shell.

Even though this unhangs the shell, it does not inspire confidence that the mongo server is actually in a valid state.

Comment by Gabriel Jones [ 29/Apr/13 ]

Probably related to the fix for https://jira.mongodb.org/browse/SERVER-2789

Comment by Gabriel Jones [ 29/Apr/13 ]

I just encountered this in 2.4.2

Pressing [Ctrl][C] results in

JavaScript execution failed: Error: field not found, expected type 2 at src/mongo/shell/query.js:L78

Looks like this fellow also ran into this problem:
http://grokbase.com/t/gg/mongodb-user/133n166get/how-long-is-db-fsyncunlock-supposed-to-take-or-what-to-do-when-it-it-does-not-complete
