[SERVER-37384] Find by UUID must not return NamespaceNotFound without taking any database lock
Created: 28/Sep/18  Updated: 29/Oct/23  Resolved: 13/Nov/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 4.0.2, 4.1.3
Fix Version/s: 4.1.6

Type: Bug
Priority: Major - P3
Reporter: Tess Avitabile (Inactive)
Assignee: Eric Milkie
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-34615 find by UUID can return NamespaceNotF... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Storage NYC 2018-11-05, Storage NYC 2018-11-19
Participants:
Linked BF Score: 0

 Description   

A find by UUID can return NamespaceNotFound without taking any database lock. This allows a reader to see intermediate state from an operation that has the database locked in X mode and is performing catalog operations that may be rolled back. This is problematic for the rollbackViaRefetch algorithm, where a NamespaceNotFound error is treated as a promise that the collection will be dropped in a future oplog entry. This can lead to inconsistent data between replica set members.
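The interleaving described above can be sketched with a toy model. This is not MongoDB code: `ToyCatalog`, `find_by_uuid_lockless`, and `drop_then_roll_back` are hypothetical names, and a plain `threading.Lock` stands in for the database X lock. The sketch only illustrates how a reader that skips the lock can observe a drop that is later rolled back.

```python
import threading


class NamespaceNotFound(Exception):
    """Raised when a UUID does not resolve to a namespace."""


class ToyCatalog:
    """Hypothetical stand-in for a UUID catalog guarded by a database lock."""

    def __init__(self, entries):
        self.db_lock = threading.Lock()  # stand-in for the database X lock
        self.uuid_to_ns = dict(entries)

    def find_by_uuid_lockless(self, uuid):
        # Reads the catalog without taking db_lock, as in the reported bug.
        try:
            return self.uuid_to_ns[uuid]
        except KeyError:
            raise NamespaceNotFound(uuid) from None


def drop_then_roll_back(catalog, uuid, observe):
    """Writer: drop the collection under the lock, then roll the drop back."""
    with catalog.db_lock:
        saved = catalog.uuid_to_ns.pop(uuid)  # uncommitted catalog change
        observe()                             # a reader runs at this moment
        catalog.uuid_to_ns[uuid] = saved      # rollback restores the entry


def demo():
    catalog = ToyCatalog({"uuid-1": "test.coll"})
    seen = []

    def reader():
        try:
            seen.append(catalog.find_by_uuid_lockless("uuid-1"))
        except NamespaceNotFound:
            seen.append("NamespaceNotFound")

    drop_then_roll_back(catalog, "uuid-1", reader)  # read during the drop
    reader()                                        # read after the rollback
    return seen
```

In this model the first read reports `NamespaceNotFound` even though the collection never stops existing in any committed state, which is the kind of phantom error that a consumer such as rollbackViaRefetch would misinterpret.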



 Comments   
Comment by Githook User [ 13/Nov/18 ]

Author: Eric Milkie <milkie@10gen.com> (username: milkie)

Message: SERVER-37384 lock db prior to examining UUID catalog, to avoid phantom NamespaceNotFound errors
Branch: master
https://github.com/mongodb/mongo/commit/f69f5a743962b6350d5830db0d03aaa4f815acf7

Comment by Tess Avitabile (Inactive) [ 02/Nov/18 ]

Yes, I believe that would solve the problem in rollbackViaRefetch. We require that a find not be able to see any uncommitted catalog changes, so you should only return NamespaceNotFound if you have a database lock.

Comment by Kaloian Manassiev [ 02/Nov/18 ]

Actually, when SERVER-32367 was implemented, the lockless resolution had to happen before the lock was acquired, because the semantics of collection UUIDs were that they are global, so we wouldn't know exactly which database to lock.

However I believe since then, we have gone back on the UUID semantics and now they need to be qualified with a database name, so we could lock the database first and then do the UUID -> NSS resolution in AutoGetCollection. geert.bosch?

As far as auth is concerned, we cannot fix that because, as Eric says, taking locks in the auth system opens up opportunities for DoS attacks by unauthorized users.

But in the case of rollbackViaRefetch, you should already be running at internal authorization and not go through the regular auth code, so that shouldn't matter. tess.avitabile, if we switched the order of locking/UUID resolution, would that solve the repl problem?
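A minimal sketch of the proposed ordering, assuming UUIDs are qualified by a database name so the database to lock is known up front. This is a toy model, not the server's implementation: `ToyCatalog` and its `auto_get_collection` method are hypothetical and only borrow the name of the AutoGetCollection helper mentioned above; a `threading.Lock` per database stands in for the database lock.

```python
import threading
from contextlib import contextmanager


class NamespaceNotFound(Exception):
    """Raised when (db name, UUID) does not resolve to a collection."""


class ToyCatalog:
    """Hypothetical catalog keyed by (database name, UUID)."""

    def __init__(self):
        self._db_locks = {}     # db name -> lock (stand-in for the database lock)
        self._collections = {}  # (db name, uuid) -> collection name

    def db_lock(self, db_name):
        # Create the per-database lock on first use.
        return self._db_locks.setdefault(db_name, threading.Lock())

    def create(self, db_name, uuid, coll_name):
        with self.db_lock(db_name):
            self._collections[(db_name, uuid)] = coll_name

    @contextmanager
    def auto_get_collection(self, db_name, uuid):
        # Step 1: lock the database. Step 2: resolve UUID -> namespace.
        # NamespaceNotFound is therefore only raised while the lock is held.
        with self.db_lock(db_name):
            coll_name = self._collections.get((db_name, uuid))
            if coll_name is None:
                raise NamespaceNotFound(f"{db_name}:{uuid}")
            yield f"{db_name}.{coll_name}"
```

Usage, under the same assumptions:

```python
catalog = ToyCatalog()
catalog.create("test", "uuid-1", "coll")
with catalog.auto_get_collection("test", "uuid-1") as ns:
    print(ns)  # the namespace was resolved while the database lock is held
```

Because the lookup happens inside the lock, a writer holding the same database lock would have to commit or roll back before the reader can observe the catalog at all.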

Comment by Eric Milkie [ 01/Nov/18 ]

tess.avitabile The code you linked in the description, which resolves UUIDs without locking, can return NamespaceNotFound, but I believe there is an auth check prior to this point that does the same thing. In Kal's code review here: https://mongodbcr.appspot.com/187170002 there was some discussion of why we want to avoid locking to do auth.
The first fix I thought of was to lock the database and redo the UUID resolution, but that would mean that every find on a nonexistent collection would lock the database at auth time. Based on the discussion in the code review, I'm guessing this isn't a cost we're willing to pay. I'm open to other suggestions on how to fix this issue. kaloian.manassiev

Comment by Tess Avitabile (Inactive) [ 28/Sep/18 ]

That is a similar issue, but its fix did not address the problem that you can see catalog changes that occur in a WriteUnitOfWork (WUOW) that can still get rolled back.

Comment by William Schultz (Inactive) [ 28/Sep/18 ]

tess.avitabile Possibly related to SERVER-34615?
