[SERVER-34211] A failed restartCatalog command can clear the cached repl oplog pointer without reestablishing it Created: 30/Mar/18 Updated: 29/Oct/23 Resolved: 03/May/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication, Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kyle Suarez | Assignee: | Kyle Suarez |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
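Imagine this sequence of events: a restartCatalog command clears the cached oplog collection pointer, then fails (for example, inside catalog::closeCatalog()) before it reaches catalog::openCatalog(), which is where the pointer would be re-established. The server is left without a cached oplog pointer.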
One solution would be to add a ScopeGuard to restartCatalog that calls repl::acquireOplogCollectionForLogging() if the call to catalog::closeCatalog() fails for any reason. |
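The ScopeGuard idea can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the server's implementation: `Collection`, `gCachedOplog`, the `ScopeGuard` class, and the free functions below are hypothetical stand-ins that borrow the names used in this ticket, and the assumption that the pointer is cleared inside `closeCatalog` is made only for the sake of the example.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the oplog collection and the cached pointer to it.
struct Collection {};
static Collection gOplog;
static Collection* gCachedOplog = &gOplog;

// Stand-in for repl::acquireOplogCollectionForLogging(): re-caches the pointer.
void acquireOplogCollectionForLogging() {
    gCachedOplog = &gOplog;
}

// Stand-in for catalog::closeCatalog(): assumed to clear the cache, then optionally fail.
void closeCatalog(bool fail) {
    gCachedOplog = nullptr;
    if (fail) {
        throw std::runtime_error("closeCatalog failed");
    }
}

// Stand-in for catalog::openCatalog(): re-establishes the pointer on the success path.
void openCatalog() {
    acquireOplogCollectionForLogging();
    assert(gCachedOplog != nullptr);
}

// Minimal scope guard: runs the callback on destruction unless dismissed.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> f) : _f(std::move(f)) {}
    ~ScopeGuard() {
        if (_active) {
            _f();
        }
    }
    void dismiss() {
        _active = false;
    }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> _f;
    bool _active = true;
};

// Toy restartCatalog: returns true on success, false if closing the catalog failed.
bool restartCatalog(bool failClose) {
    try {
        // If closeCatalog throws, the guard restores the cached pointer during unwinding.
        ScopeGuard restoreOplog([] { acquireOplogCollectionForLogging(); });
        closeCatalog(failClose);
        restoreOplog.dismiss();  // Success path: openCatalog takes over.
        openCatalog();
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

In this sketch, `restartCatalog(true)` reports failure but still leaves `gCachedOplog` pointing at the oplog, because the guard fires while the exception propagates.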
| Comments |
| Comment by Githook User [ 03/May/18 ] |
|
Author: Kyle Suarez (ksuarz) <kyle.suarez@mongodb.com> Message: |
| Comment by Kyle Suarez [ 02/Apr/18 ] |
|
Thanks Andy, that sounds like a solid approach. |
| Comment by Andy Schwerin [ 02/Apr/18 ] |
|
I think you should suppress lock acquisition interruption using the recently introduced guard type. As for deadlock, it may not be an issue if you hold the global lock in MODE_X. Just keep an eye on it. |
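As a rough illustration of suppressing interruption around a lock acquisition, here is a toy model. The guard type is not named in this thread, so `UninterruptibleGuard`, `OperationContext`, `acquireLock`, and `tryAcquire` below are hypothetical stand-ins, not the server's API.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical operation context: a kill flag plus a count of active suppressing guards.
struct OperationContext {
    bool killed = false;
    int uninterruptibleDepth = 0;
};

// Hypothetical RAII guard: while alive, lock acquisition ignores the kill flag.
class UninterruptibleGuard {
public:
    explicit UninterruptibleGuard(OperationContext* opCtx) : _opCtx(opCtx) {
        ++_opCtx->uninterruptibleDepth;
    }
    ~UninterruptibleGuard() {
        assert(_opCtx->uninterruptibleDepth > 0);
        --_opCtx->uninterruptibleDepth;
    }
    UninterruptibleGuard(const UninterruptibleGuard&) = delete;
    UninterruptibleGuard& operator=(const UninterruptibleGuard&) = delete;

private:
    OperationContext* _opCtx;
};

// Toy lock acquisition: throws if the operation was killed and no guard is active.
void acquireLock(OperationContext* opCtx) {
    if (opCtx->killed && opCtx->uninterruptibleDepth == 0) {
        throw std::runtime_error("lock acquisition interrupted");
    }
}

// Returns true if the lock was acquired, false if acquisition was interrupted.
bool tryAcquire(OperationContext* opCtx, bool suppressInterruption) {
    try {
        if (suppressInterruption) {
            UninterruptibleGuard guard(opCtx);
            acquireLock(opCtx);
        } else {
            acquireLock(opCtx);
        }
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

The guard only addresses the "can throw" half of the risk; it does nothing about deadlock, which is why the comment above suggests keeping an eye on it separately.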
| Comment by Kyle Suarez [ 02/Apr/18 ] |
|
While the command is executing, the global lock is held in exclusive mode. Does that still leave open the possibility for lock acquisition to throw? To prevent deadlock, we could do the locking ourselves (rather than calling repl::acquireOplogCollectionForLogging(), which uses one of the AutoGet* helpers), given that we know we're exclusively locked. |
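The alternative of skipping the lock helpers can be sketched as a lookup that checks the precondition instead of acquiring anything. This is a toy under stated assumptions: `ToyCatalog`, `LockMode`, and `recacheOplogUnderGlobalX` are hypothetical names, and only the oplog namespace `local.oplog.rs` is taken from MongoDB itself.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

struct Collection {};

enum class LockMode { kNone, kShared, kExclusive };

// Hypothetical catalog: the global lock state, known collections, and the cached oplog pointer.
struct ToyCatalog {
    LockMode globalLock = LockMode::kNone;
    std::map<std::string, Collection> collections;
    Collection* cachedOplog = nullptr;
};

// Re-caches the oplog pointer without acquiring any locks.
// The caller must already hold the global lock exclusively; otherwise this throws.
void recacheOplogUnderGlobalX(ToyCatalog& catalog) {
    if (catalog.globalLock != LockMode::kExclusive) {
        throw std::logic_error("caller must hold the global lock in exclusive mode");
    }
    auto it = catalog.collections.find("local.oplog.rs");
    catalog.cachedOplog = (it == catalog.collections.end()) ? nullptr : &it->second;
}
```

Because no lock is requested inside the function, there is nothing to block on or be interrupted in; the trade-off is that correctness depends entirely on the caller honoring the precondition.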
| Comment by Andy Schwerin [ 01/Apr/18 ] |
|
The risk with the proposed solution is that acquiring locks can throw or deadlock. |
| Comment by Kyle Suarez [ 30/Mar/18 ] |
|
Note that a successful restartCatalog command will re-establish the cached oplog collection pointer in catalog::openCatalog(). |