[SERVER-39167] Concurrent insert into nonexistent database often fails with "database not found" on 4.1.7 Created: 24/Jan/19 Updated: 06/Dec/22 Resolved: 13/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 4.1.7 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Shane Harvey | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Assigned Teams: | Sharding |
| Operating System: | ALL |
| Sprint: | Sharding 2019-02-25 |
| Participants: | |
| Description |
|
Found by a Python test in
Concurrently inserting into a collection when the database does not exist often fails on mongos 4.1.7 with "unable to initialize targeter for write op for collection auth_test_46.test :: caused by :: Database auth_test_46 not found :: caused by :: database auth_test_46 not found".
Here's a simple Python repro: repro-1725.py
The failing command and response is:
Edit: This is reproducible with and without auth enabled. |
| Comments |
| Comment by Kaloian Manassiev [ 13/May/19 ] |
|
This is the same problem as movePrimary and drop and re-create database and is due to CRUD operations not doing database versioning, which we consciously decided not to do since it won't be necessary when we start tracking unsharded collections in the catalog. Closing it as duplicate of |
| Comment by Shane Harvey [ 25/Feb/19 ] |
|
Looking at this again, I don't think this behavior just started in 4.1.7. I can reproduce the error (using the attached repro) on 4.1.6 and on all versions of 4.0 but not on 3.6.10. |
| Comment by Kaloian Manassiev [ 25/Feb/19 ] |
|
janna.golden, Shane mentions that this bug started showing up after 4.1.7, but we have had the issues in the "Unify the Sharding Caching Behaviour" project for a while before that. Something must have changed in 4.2. shane.harvey, can you confirm that these tests are not new and that they work fine with older versions? |
| Comment by Janna Golden [ 25/Feb/19 ] |
|
When creating a database, we first refresh the catalog cache and check for an existing db entry. If we do not find one, we create the entry and then refresh the catalog cache again. If there are concurrent createDatabase commands for the same db, we can run into a scenario where the first thread does the first refresh, doesn't find an entry, and then inserts an entry for the db into the catalog cache. A second thread can start its first refresh before the first thread is done inserting the entry into the catalog cache. Then the first thread will do its second refresh after it finishes inserting the entry, and join the second thread's first refresh. Since this refresh started before the first thread finished inserting into the catalog cache, we will not find an entry in the catalog and will fail with db not found. I will add this ticket to the "Unify the Sharding Caching Behaviour" project. |
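
The interleaving described in the comment above can be illustrated with a small, single-process toy model. This is only a sketch under stated assumptions, not the server's implementation: `ToyCatalogCache`, `InFlightRefresh`, and `simulate` are hypothetical names, the "entries" set stands in for wherever the db entry is stored, and a refresh is modelled as a snapshot taken when the refresh starts, which any later caller joins instead of starting a new one.

```python
class InFlightRefresh:
    """A refresh whose result is fixed by the entries present when it started."""

    def __init__(self, snapshot):
        self.snapshot = snapshot


class ToyCatalogCache:
    """Hypothetical stand-in for a cache that lets callers join an in-flight refresh."""

    def __init__(self):
        self.entries = set()    # db entries visible to a refresh that starts now
        self.in_flight = None   # the refresh currently running, if any

    def start_refresh(self):
        # Callers join an already running refresh instead of starting a new one.
        if self.in_flight is None:
            self.in_flight = InFlightRefresh(frozenset(self.entries))
        return self.in_flight

    def finish_refresh(self, refresh):
        # The result reflects the state at start time, not at finish time.
        if self.in_flight is refresh:
            self.in_flight = None
        return refresh.snapshot


def simulate(second_thread_refreshes_early, db="auth_test_46"):
    """Replay thread 1's createDatabase steps; return (joined, found)."""
    cache = ToyCatalogCache()

    # Thread 1: first refresh completes and finds no entry for the db.
    first = cache.start_refresh()
    assert db not in cache.finish_refresh(first)

    # Thread 2 (optional): starts its first refresh before thread 1's insert lands.
    other = cache.start_refresh() if second_thread_refreshes_early else None

    # Thread 1: the insert of the db entry completes.
    cache.entries.add(db)

    # Thread 1: second refresh, which joins thread 2's refresh if one is running.
    second = cache.start_refresh()
    joined = second is other
    found = db in cache.finish_refresh(second)
    return joined, found
```

In this model, `simulate(True)` has thread 1 join a refresh that started before its own insert, so the entry is not found, which corresponds to the "database not found" failure. `simulate(False)` starts a fresh refresh after the insert and does find the entry.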