[SERVER-33639] Concurrent writes against non-existent database can fail due to distlock acquisition timeout at `createDatabase` time Created: 02/Mar/18  Updated: 29/Oct/23  Resolved: 23/May/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.6.3, 3.7.2
Fix Version/s: 3.6.6, 4.0.0-rc1, 4.1.1

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Janna Golden
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-33934 Creating an unsharded collection can ... Closed
is related to SERVER-35226 now that createCollection and createD... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.0
Sprint: Sharding 2018-05-21, Sharding 2018-06-04
Participants:
Linked BF Score: 0

 Description   

Starting with MongoDB 3.6.0, the creation of sharded databases was made explicit from the point of view of MongoS and the creation logic was moved to the config server. Since the default distributed lock acquisition timeout is still 20 seconds, this causes timeouts when a large number of threads suddenly try to write to a database that does not exist.

What happens is a convoying effect on the -movePrimary distributed lock: acquisitions time out and writes fail even though the database has already been created. I can reproduce this problem 100% of the time using the load phase of the YCSB benchmark with 40 threads.
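To make the convoying arithmetic concrete, here is a rough model rather than a description of the actual distlock implementation. It assumes waiters are served strictly in FIFO order and that every holder keeps the lock for the same duration. The `hold_secs` parameter and the function name are hypothetical; only the 20-second timeout and the 40-thread count come from this ticket.

```python
# Default distributed lock acquisition timeout cited in the ticket.
DISTLOCK_TIMEOUT_SECS = 20.0


def waiters_that_time_out(num_threads, hold_secs, timeout_secs=DISTLOCK_TIMEOUT_SECS):
    """Return the 0-based queue positions whose wait exceeds the timeout.

    Assumption (not from the ticket): strict FIFO hand-off, so the waiter at
    position `pos` waits for `pos` earlier holders, each holding the lock for
    `hold_secs` seconds.
    """
    return [pos for pos in range(num_threads) if pos * hold_secs > timeout_secs]


if __name__ == "__main__":
    # hold_secs=1.0 is an illustrative value, not a measurement.
    timed_out = waiters_that_time_out(num_threads=40, hold_secs=1.0)
    print(f"{len(timed_out)} of 40 waiters exceed the timeout")
```

Under this model, the threads at the back of the queue fail purely because of queueing delay, even though the first holder has already created the database.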

To avoid this effect, we should first take some form of lock manager X lock, as the other metadata commands do, and then check whether the database exists before taking the distributed lock. This mitigates the convoying effect.
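The proposed ordering (local exclusive lock, existence check, then distributed lock) can be sketched as follows. This is a toy illustration, not the server code: `ConfigServerSketch`, `create_database`, and the `"ycsb"` database name are hypothetical, and plain `threading.Lock` objects stand in for the lock manager X lock and the distributed lock.

```python
import threading


class ConfigServerSketch:
    """Toy stand-in for the config server's database catalog (hypothetical)."""

    def __init__(self):
        self._databases = set()
        self._local_lock = threading.Lock()  # stands in for the lock manager X lock
        self._dist_lock = threading.Lock()   # stands in for the distributed lock
        self.dist_lock_acquisitions = 0

    def create_database(self, name):
        """Return True if this call created the database, False if it already existed."""
        # Serialize callers locally first, so they queue here instead of on the distlock.
        with self._local_lock:
            # Check for existence before touching the distributed lock.
            if name in self._databases:
                return False
            with self._dist_lock:
                self.dist_lock_acquisitions += 1
                self._databases.add(name)
                return True


def run_concurrent_writers(num_threads=40, db_name="ycsb"):
    """Start num_threads writers that all try to create the same database."""
    server = ConfigServerSketch()
    results = []
    results_lock = threading.Lock()

    def writer():
        created = server.create_database(db_name)
        with results_lock:
            results.append(created)

    threads = [threading.Thread(target=writer) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return server, results


if __name__ == "__main__":
    server, results = run_concurrent_writers()
    print(server.dist_lock_acquisitions, results.count(True))
```

In this sketch only the first caller ever reaches the distributed lock; every later caller sees the database under the local lock and returns immediately, which is the mitigation the paragraph above describes.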



 Comments   
Comment by Githook User [ 25/May/18 ]

Author:

{'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}

Message: SERVER-33639 Protect against distlock acquisition timeout at createDatabase time

(cherry picked from commit c1cc37fa0958963427000ec1ac2368efe2ea8177)
Branch: v3.6
https://github.com/mongodb/mongo/commit/1a58ef4331ead1add54710e5c9a1f5b117706c2f

Comment by Githook User [ 25/May/18 ]

Author:

{'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}

Message: SERVER-33639 Protect against distlock acquisition timeout at createDatabase time

(cherry picked from commit c1cc37fa0958963427000ec1ac2368efe2ea8177)
Branch: v4.0
https://github.com/mongodb/mongo/commit/1d296f2e5bf33274e52be0b0bbb823e07ad6b8dd

Comment by Githook User [ 25/May/18 ]

Author:

{'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}

Message: SERVER-33639 Protect against distlock acquisition timeout at createDatabase time
Branch: master
https://github.com/mongodb/mongo/commit/c1cc37fa0958963427000ec1ac2368efe2ea8177

Comment by Githook User [ 23/May/18 ]

Author:

{'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}

Message: SERVER-33639 Protect against distlock acquisition timeout at createDatabase time
Branch: v3.6
https://github.com/mongodb/mongo/commit/c8497a2c85e65680b439603caba6874b54082355

Comment by Githook User [ 23/May/18 ]

Author:

{'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}

Message: SERVER-33639 Protect against distlock acquisition timeout at createDatabase time

(cherry picked from commit 8ba9e8eb48d948c082ff5cb85dc059322f5ea5cb)
Branch: v4.0
https://github.com/mongodb/mongo/commit/fea3de5fb3fff05c2051e6c8ac8d04d5359b922b

Comment by Janna Golden [ 23/May/18 ]

A different ticket was committed with this ticket number; I removed that commit comment from this ticket.

Comment by Githook User [ 23/May/18 ]

Author:

{'username': 'jannaerin', 'name': 'jannaerin', 'email': 'golden.janna@gmail.com'}

Message: SERVER-33639 Protect against distlock acquisition timeout at createDatabase time
Branch: master
https://github.com/mongodb/mongo/commit/8ba9e8eb48d948c082ff5cb85dc059322f5ea5cb
