[SERVER-18504] "Lock for createDatabase is taken" triggered by mongorestore restoring multiple collections in parallel
Created: 16/May/15 Updated: 19/Sep/15 Resolved: 19/May/15

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding, Tools |
| Affects Version/s: | None |
| Fix Version/s: | 3.1.4 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Michael O'Brien | Assignee: | Daniel Alabi |
| Resolution: | Done | Votes: | 0 |
| Labels: | UT |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Description |
max.hirschhorn@10gen.com found this in a patch build. It is a consequence of mongorestore's parallel restore creating multiple collections at once: each creation needs to take the distributed lock for database creation on the server side, but only one can hold it at a time.
| Comments |
| Comment by Daniel Alabi [ 19/May/15 ] |
A more permanent fix is for mongorestore to retry appropriately when database creation times out because of contention on the distributed lock. A TOOLS ticket will be created to track this issue.
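The client-side retry described in this comment could look roughly like the following Python sketch. It is not mongorestore's code: `LockTimeoutError`, `create_with_retry`, and the backoff parameters are hypothetical names chosen for illustration, and how a real client recognizes the "Lock for createDatabase is taken" failure is an assumption left to the caller. Jittered exponential backoff is one common choice here; it spreads out retries from parallel restore workers so they are less likely to collide again at the same instant.

```python
import random
import time


class LockTimeoutError(Exception):
    """Hypothetical error: the server could not acquire the distributed
    lock for createDatabase before its timeout expired."""


def create_with_retry(create, max_attempts=5, base_delay=0.1, max_delay=2.0,
                      sleep=time.sleep):
    """Call create(); on LockTimeoutError, back off and try again.

    Uses exponential backoff with full jitter: before attempt k+1 the
    caller sleeps a random time in [0, min(max_delay, base_delay * 2**(k-1))].
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return create()
        except LockTimeoutError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))


# Usage: a hypothetical create call that loses the lock race twice, then succeeds.
calls = {"n": 0}


def flaky_create():
    calls["n"] += 1
    if calls["n"] < 3:
        raise LockTimeoutError("Lock for createDatabase is taken")
    return "created"


print(create_with_retry(flaky_create, sleep=lambda s: None))  # prints "created"
```

Passing `sleep` in as a parameter keeps the backoff testable without real delays.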
| Comment by Githook User [ 19/May/15 ] |
Author: Daniel Alabi (alabid) <alabidan@gmail.com>
Message:
| Comment by Andy Schwerin [ 18/May/15 ] |
While the tools should be prepared for this to fail, kaloian.manassiev pointed out that we could retry for longer than 1 second to acquire the needed distributed lock. In practice, that would probably eliminate the problem. Assigning to alabid to increase the timeout.
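Turning up the timeout amounts to polling for the lock until a longer deadline passes. The following is a minimal Python sketch of that idea, assuming a hypothetical non-blocking `try_acquire()` callable and hypothetical parameter names; the server's actual distributed lock implementation is not shown in this ticket. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time


def acquire_with_deadline(try_acquire, timeout=5.0, poll_interval=0.05,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll try_acquire() until it returns True or `timeout` seconds pass.

    Returns True if the lock was acquired, False if the deadline expired.
    `try_acquire` is a hypothetical non-blocking attempt to take the lock.
    """
    deadline = clock() + timeout
    while True:
        if try_acquire():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))
```

As Kaloian's comment points out, a longer deadline only narrows the window in which a contended createDatabase call fails; with enough concurrent creators, acquisition can still time out, which is why the client-side retry is still needed.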
| Comment by Kaloian Manassiev [ 18/May/15 ] |
Regarding increasing the distributed lock timeout: we could increase it a little, but it is always possible that many threads creating databases or collections at the same time will cause enough contention that lock acquisition times out. So it would be best if the tools expected this and retried appropriately.
| Comment by Max Hirschhorn [ 16/May/15 ] |
Yes, it is a sharding-only behavior. I chatted about it with kaloian.manassiev: it happens when the second client times out while acquiring the distributed lock to create the database.
| Comment by Andy Schwerin [ 16/May/15 ] |
This is a sharding-only behavior, right? Just clarifying. |