[SERVER-35085] repair can cause spurious NamespaceNotFound errors with concurrent initial sync operations Created: 18/May/18  Updated: 29/Oct/23  Resolved: 14/Jul/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.0.1, 4.1.1

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Storage NYC 2018-07-16
Participants:
Linked BF Score: 68

 Description   

Replication ignores certain NamespaceNotFound errors for idempotency reasons, but this bug makes those assumptions unsafe.



 Comments   
Comment by Githook User [ 20/Jul/18 ]

Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>

Message: SERVER-35085 Hide the visibility of UUIDCatalog changes during repairDatabase cmd

Added RAII object around UUIDCatalog::onCloseCatalog and UUIDCatalog::onOpenCatalog

(cherry picked from commit 9184a03574c398b087b929fda8ed428f0c64d28c)
Branch: v4.0
https://github.com/mongodb/mongo/commit/ccfb33a4cdfb86a53982a48fdbfa826477b6c035

Comment by Githook User [ 14/Jul/18 ]

Author: Dianna Hohensee (DiannaHohensee) <dianna.hohensee@10gen.com>

Message: SERVER-35085 Hide the visibility of UUIDCatalog changes during repairDatabase cmd

Added RAII object around UUIDCatalog::onCloseCatalog and UUIDCatalog::onOpenCatalog
Branch: master
https://github.com/mongodb/mongo/commit/9184a03574c398b087b929fda8ed428f0c64d28c

Comment by Dianna Hohensee (Inactive) [ 26/Jun/18 ]

The repairDatabase command can run concurrently with initial sync or other replication activities. Replication can use the UUID catalog to look up a collection that is currently being repaired and is therefore temporarily absent from the catalog. repairDatabase does take the global X lock, but accesses to the UUID catalog do not require locks; we only lock namespaces.

A proposed solution is to have repair delay the visibility of its UUID catalog changes until it is done. This can be accomplished with the UUIDCatalog's onCloseCatalog and onOpenCatalog functionality: the catalog keeps a view of its state as of the onCloseCatalog call, so repl operations continue to see the old view until onOpenCatalog is called to establish the repaired view. repairDatabase already holds the global X lock, and the UUIDCatalog is outside of locks, so repl operations will see all the collections and then block waiting for repairDatabase to finish.

This will probably need some refactoring, plus an RAII object that closes the catalog and ensures it is reopened.
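The idea can be sketched roughly as below. This is a minimal, single-threaded illustration, not the server's code: ToyUUIDCatalog, ScopedCatalogClose, lookup, registerCollection, removeCollection and lookupStableDuringRepair are all hypothetical names, and only onCloseCatalog and onOpenCatalog are taken from the discussion above. The toy catalog snapshots its mapping on close and serves lookups from that snapshot until it is reopened; the guard's destructor ensures the reopen happens even on an early return or exception.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for the UUID catalog (assumption: UUIDs and
// namespaces are plain strings, and there is no concurrency control).
class ToyUUIDCatalog {
public:
    void registerCollection(const std::string& uuid, const std::string& ns) {
        _live[uuid] = ns;
    }
    void removeCollection(const std::string& uuid) {
        _live.erase(uuid);
    }

    // Freeze the current mapping; lookups are served from it until reopen.
    void onCloseCatalog() {
        _shadow = _live;
    }
    // Drop the frozen view so lookups see the live (repaired) mapping.
    void onOpenCatalog() {
        _shadow.reset();
    }

    std::optional<std::string> lookup(const std::string& uuid) const {
        const auto& view = _shadow ? *_shadow : _live;
        auto it = view.find(uuid);
        if (it == view.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::map<std::string, std::string> _live;
    std::optional<std::map<std::string, std::string>> _shadow;
};

// Hypothetical RAII guard: closes the catalog on construction and
// guarantees it is reopened when the scope exits.
class ScopedCatalogClose {
public:
    explicit ScopedCatalogClose(ToyUUIDCatalog& catalog) : _catalog(catalog) {
        _catalog.onCloseCatalog();
    }
    ~ScopedCatalogClose() {
        _catalog.onOpenCatalog();
    }
    ScopedCatalogClose(const ScopedCatalogClose&) = delete;
    ScopedCatalogClose& operator=(const ScopedCatalogClose&) = delete;

private:
    ToyUUIDCatalog& _catalog;
};

// Simulated repair: the entry is removed and re-registered inside the
// guarded scope, while lookups keep resolving through the frozen view.
bool lookupStableDuringRepair() {
    ToyUUIDCatalog catalog;
    catalog.registerCollection("uuid-1", "test.coll");

    bool visibleMidRepair = false;
    {
        ScopedCatalogClose guard(catalog);
        catalog.removeCollection("uuid-1");  // repair temporarily drops the entry
        visibleMidRepair = catalog.lookup("uuid-1").has_value();
        catalog.registerCollection("uuid-1", "test.coll");
    }
    return visibleMidRepair && catalog.lookup("uuid-1").has_value();
}
```

Without the guard, the same remove-then-lookup sequence would return no entry, which is the window in which replication could hit a spurious NamespaceNotFound.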

Generated at Thu Feb 08 04:38:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.