[SERVER-33639] Concurrent writes against non-existent database can fail due to distlock acquisition timeout at `createDatabase` time Created: 02/Mar/18 Updated: 29/Oct/23 Resolved: 23/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.6.3, 3.7.2 |
| Fix Version/s: | 3.6.6, 4.0.0-rc1, 4.1.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Janna Golden |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.0 |
| Sprint: | Sharding 2018-05-21, Sharding 2018-06-04 |
| Participants: |
| Linked BF Score: | 0 |
| Description |
|
Starting with MongoDB 3.6.0, the creation of sharded databases was made explicit from the point of view of MongoS, and the creation logic was moved to the config server. Since the default distributed lock acquisition timeout is still 20 seconds, this causes timeouts when a large number of threads suddenly try to write against a database that does not exist. What happens is a convoying effect on the -movePrimary distributed lock, whose acquisition times out and fails writes even though the database has already been created. I am able to reproduce this problem 100% of the time using the load phase of the YCSB benchmark with 40 threads. To mitigate the convoying effect, before taking the distributed lock we should take some form of lock manager X lock, as is done for the other metadata commands, and then check whether the database already exists. |
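The proposed ordering (local exclusive lock, then existence check, then distributed lock) can be sketched in a few lines. This is an illustrative Python model only, not the server's code: `FakeDistLock`, `FakeConfigServer`, `create_database`, and `run_concurrent_creates` are hypothetical names, a single in-process mutex stands in for the lock manager X lock, and a sleep stands in for the cost of creating the database.

```python
import threading
import time


class FakeDistLock:
    """Hypothetical stand-in for a distributed lock with an acquisition timeout."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0  # number of times the lock was actually taken

    def acquire(self, timeout_secs):
        if not self._lock.acquire(timeout=timeout_secs):
            raise TimeoutError("distributed lock acquisition timed out")
        self.acquisitions += 1

    def release(self):
        self._lock.release()


class FakeConfigServer:
    """Hypothetical model of the config server's database-creation path."""

    def __init__(self, dist_lock_timeout_secs=20.0, create_cost_secs=0.01):
        self.databases = set()
        self.dist_lock = FakeDistLock()
        # Assumption: one process-local mutex models the lock manager X lock.
        self._local_x_lock = threading.Lock()
        self._timeout = dist_lock_timeout_secs
        self._create_cost = create_cost_secs

    def create_database(self, name):
        """Return True if this call created the database, False if it existed."""
        with self._local_x_lock:
            # Cheap local check first: callers that lost the race return here
            # and never queue on the distributed lock.
            if name in self.databases:
                return False
            self.dist_lock.acquire(self._timeout)
            try:
                time.sleep(self._create_cost)  # simulated creation work
                self.databases.add(name)
                return True
            finally:
                self.dist_lock.release()


def run_concurrent_creates(server, name, num_threads):
    """Call create_database(name) from num_threads threads; return all results."""
    results = []

    def worker():
        results.append(server.create_database(name))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this model, 40 concurrent callers for the same database name produce a single distributed lock acquisition; every other caller returns at the existence check, so none of them can hit the acquisition timeout.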
| Comments |
| Comment by Githook User [ 25/May/18 ] |
|
Author: {'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}Message: (cherry picked from commit c1cc37fa0958963427000ec1ac2368efe2ea8177) |
| Comment by Githook User [ 25/May/18 ] |
|
Author: {'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}Message: (cherry picked from commit c1cc37fa0958963427000ec1ac2368efe2ea8177) |
| Comment by Githook User [ 25/May/18 ] |
|
Author: {'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}Message: |
| Comment by Githook User [ 23/May/18 ] |
|
Author: {'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}Message: |
| Comment by Githook User [ 23/May/18 ] |
|
Author: {'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}Message: (cherry picked from commit 8ba9e8eb48d948c082ff5cb85dc059322f5ea5cb) |
| Comment by Janna Golden [ 23/May/18 ] |
|
A commit for a different ticket was pushed with this ticket number; I removed that commit comment from this ticket. |
| Comment by Githook User [ 23/May/18 ] |
|
Author: {'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}Message: |