[SERVER-41214] Query yield recovery does not check for collection rename under the correct locks Created: 17/May/19  Updated: 29/Oct/23  Resolved: 26/Jun/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 4.2.0-rc3, 4.3.1

Type: Bug Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Query 2019-06-17, Query 2019-07-01
Participants:
Linked BF Score: 59

 Description   

Queries periodically yield and recover their intent locks using Locker::LockSnapshot. After restoring the lock snapshot, PlanExecutor::restoreState() is called in order to make several validity checks. As part of this yield recovery process, we make a check for collection rename and throw an exception if the Collection object's namespace string has changed during yield.

It is possible that this check is made without holding the correct lock on the Collection object. The resource IDs which we lock are collection names, not collection UUIDs. Therefore, if a collection is renamed, the identity of the lock that must be held to protect the Collection object also changes. Consider the case of a collection being renamed from A to B during yield. When the lock snapshot is restored, the query execution subsystem will continue to hold the lock on A, not B. This means that the call to Collection::ns() may not be correctly synchronized with other threads executing drops or renames on that collection.

Credit to geert.bosch for helping me discover this issue and think through a fix. The easiest fix would be to use CollectionCatalog::lookupNSSByUUID() instead of CollectionCatalog::lookupCollectionByUUID() in order to safely identify whether a rename has occurred. If the namespace string has not changed, then you know that the correct locks were restored from the lock snapshot and query execution can proceed. Otherwise, the query will be killed.

Additional work will be required if we ever want to allow queries to survive renames. In particular, we may want to change query yielding so that it doesn't use the Locker::LockSnapshot interface, but rather uses the db_raii.h helpers directly on yield recovery.



 Comments   
Comment by Githook User [ 26/Jun/19 ]

Author:

{'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@10gen.com'}

Message: SERVER-41214 Fix data race in restoring PlanStages which require a collection.

(cherry picked from commit 7717e2f424e3bdf1e70bb21f25f39f5a2081bf26)
Branch: v4.2
https://github.com/mongodb/mongo/commit/06ccd790b6216e0eec525a40e470949213c3df4e

Comment by Githook User [ 26/Jun/19 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@10gen.com', 'username': 'dstorch'}

Message: SERVER-41214 Fix data race in restoring PlanStages which require a collection.
Branch: master
https://github.com/mongodb/mongo/commit/7717e2f424e3bdf1e70bb21f25f39f5a2081bf26

Comment by Githook User [ 26/Jun/19 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@10gen.com', 'username': 'dstorch'}

Message: SERVER-41214 Modernize COLLSCAN dbtest.
Branch: master
https://github.com/mongodb/mongo/commit/87b06cc52bd69b40a201e36bdec1e3e912c42277

Comment by Geert Bosch [ 25/Jun/19 ]

BTW, this would be great:  "change query yielding so that it doesn't use the Locker::LockSnapshot interface, but rather uses the db_raii.h helpers directly on yield recovery".

Generated at Thu Feb 08 04:57:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.