[SERVER-35085] repair can cause spurious NamespaceNotFound errors with concurrent initial sync operations Created: 18/May/18 Updated: 29/Oct/23 Resolved: 14/Jul/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.1, 4.1.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: | v4.0 | ||||||||
| Sprint: | Storage NYC 2018-07-16 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 68 | ||||||||
| Description |
|
Replication ignores certain NamespaceNotFound errors for idempotency reasons, but this bug makes those assumptions unsafe. |
| Comments |
| Comment by Githook User [ 20/Jul/18 ] |
|
Author: Dianna Hohensee (DiannaHohensee, dianna.hohensee@10gen.com)
Message: Added RAII object around UUIDCatalog::onCloseCatalog and UUIDCatalog::onOpenCatalog (cherry picked from commit 9184a03574c398b087b929fda8ed428f0c64d28c) |
| Comment by Githook User [ 14/Jul/18 ] |
|
Author: Dianna Hohensee (DiannaHohensee, dianna.hohensee@10gen.com)
Message: Added RAII object around UUIDCatalog::onCloseCatalog and UUIDCatalog::onOpenCatalog |
| Comment by Dianna Hohensee (Inactive) [ 26/Jun/18 ] |
|
The repairDatabase command can run concurrently with initial sync or other replication activity. Replication uses the UUID catalog to look up collections, and a collection that is currently being repaired can be temporarily absent from that catalog, which is what produces the spurious NamespaceNotFound error. repairDatabase does take the global X (exclusive) lock, but accesses to the UUID catalog do not require locks; we only lock namespaces.

A proposed solution is to have repair delay visibility of its UUID changes until it is done, using the UUIDCatalog's onCloseCatalog and onOpenCatalog functionality. onCloseCatalog preserves a view of the catalog as of the time it is called, so repl operations continue to see the old view until onOpenCatalog is called to establish the repaired view. Because the UUIDCatalog lookup happens outside of locks while repairDatabase holds the global X lock, repl operations will see all the collections and then block waiting for repairDatabase to finish. This will probably need some refactoring and a RAII object that closes the catalog and ensures it is reopened. |
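
The close/reopen idea in the comment above can be illustrated with a minimal, single-threaded sketch. Everything here is a hypothetical stand-in written for this ticket: `ToyUUIDCatalog`, `CatalogCloseGuard`, and `lookupDuringRepair` are assumed names, not the server's real classes, and the toy `onCloseCatalog`/`onOpenCatalog` do not match the real signatures. Synchronization and the global X lock are deliberately left out.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a UUID -> namespace catalog (not the server's class).
class ToyUUIDCatalog {
public:
    void registerCollection(const std::string& uuid, const std::string& ns) {
        _live[uuid] = ns;
    }
    void removeCollection(const std::string& uuid) {
        _live.erase(uuid);
    }

    // Freeze a copy of the current view; lookups use it until the catalog reopens.
    void onCloseCatalog() {
        _shadow = _live;
        _closed = true;
    }
    // Drop the frozen view so lookups see the live (repaired) catalog again.
    void onOpenCatalog() {
        _shadow.clear();
        _closed = false;
    }

    // Returns the namespace for uuid, or an empty string if it is not visible.
    std::string lookupNamespace(const std::string& uuid) const {
        const std::map<std::string, std::string>& view = _closed ? _shadow : _live;
        auto it = view.find(uuid);
        return it == view.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> _live;
    std::map<std::string, std::string> _shadow;
    bool _closed = false;
};

// Hypothetical RAII guard: closes the catalog on construction and guarantees
// the reopen on every exit path, including early returns and exceptions.
class CatalogCloseGuard {
public:
    explicit CatalogCloseGuard(ToyUUIDCatalog& catalog) : _catalog(catalog) {
        _catalog.onCloseCatalog();
    }
    ~CatalogCloseGuard() {
        _catalog.onOpenCatalog();
    }
    CatalogCloseGuard(const CatalogCloseGuard&) = delete;
    CatalogCloseGuard& operator=(const CatalogCloseGuard&) = delete;

private:
    ToyUUIDCatalog& _catalog;
};

// Simulates a repair that removes and re-registers one collection, and returns
// what a lookup issued in the middle of that repair would see.
std::string lookupDuringRepair(ToyUUIDCatalog& catalog, const std::string& uuid, bool useGuard) {
    std::unique_ptr<CatalogCloseGuard> guard;
    if (useGuard) {
        guard.reset(new CatalogCloseGuard(catalog));
    }
    const std::string ns = catalog.lookupNamespace(uuid);
    catalog.removeCollection(uuid);                    // collection absent from live view
    std::string seen = catalog.lookupNamespace(uuid);  // mid-repair lookup
    catalog.registerCollection(uuid, ns);              // repair finished
    return seen;
}
```

Without the guard, the mid-repair lookup in this toy model comes back empty, which corresponds to the spurious NamespaceNotFound case; with the guard, the lookup still resolves through the frozen view, and the destructor restores the live view once the repair scope ends.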