[SERVER-49527] recoverFromOplogAsStandalone does not relax index constraints Created: 15/Jul/20 Updated: 29/Oct/23 Resolved: 27/Jul/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.0.20, 4.2.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Judah Schvimer |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v4.4, v4.2 |
| Sprint: | Repl 2020-07-27, Repl 2020-08-10 |
| Participants: | |
| Description |
|
Normal replication recovery relaxes index constraints because shouldRelaxIndexConstraints returns "true" during STARTUP. On a standalone, however, shouldRelaxIndexConstraints returns "false", so recoverFromOplogAsStandalone does not relax them. |
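The behavior described above can be illustrated with a toy model. This is only a sketch and not the server's C++ code: the `Node` class, the `MemberState` enum, and both function names are hypothetical, and the "after" variant assumes the fix is an explicit recoverFromOplogAsStandalone (rFOAS) check inside shouldRelaxIndexConstraints, as the comments on this ticket suggest.

```python
from dataclasses import dataclass
from enum import Enum, auto


class MemberState(Enum):
    """Hypothetical, reduced set of replica set member states."""
    STARTUP = auto()
    PRIMARY = auto()


@dataclass
class Node:
    """Hypothetical stand-in for the state the server consults."""
    is_replica_set_member: bool
    member_state: MemberState = MemberState.STARTUP
    recover_from_oplog_as_standalone: bool = False  # the rFOAS startup option


def should_relax_before_fix(node: Node) -> bool:
    # Per the description: "true" during STARTUP on a replica set member,
    # "false" on a standalone, regardless of rFOAS.
    if not node.is_replica_set_member:
        return False
    return node.member_state is MemberState.STARTUP


def should_relax_after_fix(node: Node) -> bool:
    # Assumed shape of the fix: rFOAS replays the oplog like replication
    # recovery does, so it relaxes index constraints as well.
    if node.recover_from_oplog_as_standalone:
        return True
    return should_relax_before_fix(node)
```

In this model, a node built with `Node(is_replica_set_member=False, recover_from_oplog_as_standalone=True)` is refused relaxed constraints by the first function and granted them by the second, while plain standalones and replica set members behave the same in both.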
| Comments |
| Comment by Githook User [ 24/Aug/20 ] |
|
Author: Judah Schvimer <judah@mongodb.com> (judahschvimer) Message: (cherry picked from commit e0cde43310e5dab3fcf6e93bb115259e70a165e8) |
| Comment by Githook User [ 20/Aug/20 ] |
|
Author: Judah Schvimer <judah@mongodb.com> (judahschvimer) Message: |
| Comment by Githook User [ 27/Jul/20 ] |
|
Author: Judah Schvimer <judah@mongodb.com> (judahschvimer) Message: (cherry picked from commit e0cde43310e5dab3fcf6e93bb115259e70a165e8) |
| Comment by Daniel Gottlieb (Inactive) [ 27/Jul/20 ] |
|
That plan sounds good to me. I'm in favor of being lenient when the only problematic code paths (for external user queries) are only possible during startup recovery (before the server starts listening to a port). |
| Comment by Judah Schvimer [ 22/Jul/20 ] |
|
Upon trying to change canAcceptWritesFor, I realized that I would also have to change checkCanServeReadsFor and commandCanRunHere at least. In testing, I learned that the server will not accept reads or writes during oplog replay because it's during initAndListen, so none of the user command concerns are actual problems. To reduce risk and since I don't see any actual bugs, I'm going to stick with the current patch and forward-port it as far as I can. daniel.gottlieb, does that sound alright to you? |
| Comment by Judah Schvimer [ 17/Jul/20 ] |
|
At daniel.gottlieb's suggestion, I audited the places where we use canAcceptWritesFor* to see whether there are other places where we should be checking for recoverFromOplogAsStandalone. Here are my findings on what canAcceptWritesFor should return when running with recoverFromOplogAsStandalone in each code location where we call canAcceptWritesFor on v4.0. A lot of these places also check writesAreReplicated, which surprisingly returns “true” on standalones:
The above indicates to me that the rFOAS check should move down into canAcceptWritesFor rather than being in shouldRelaxIndexConstraints. Additionally, above bolded are the places that are concerning where rFOAS won’t necessarily do the correct thing. Being in “read-only” mode protects us in general in most of these cases, though we’re not in read-only mode until after the oplog recovery plays, and I don’t think we do anything to prevent reads or writes during oplog recovery. All of the above said, I think it is unlikely any of the above concerns will cause backup-restore problems. They appear to be confined to user operations, rather than internal ones, and backup-restore does not do any user operations while in rFOAS. checkCanServeReadsFor is the one I'd be most concerned about, but that also appears to only be used in user operations. |
| Comment by Githook User [ 15/Jul/20 ] |
|
Author: Judah Schvimer <judah@mongodb.com> (judahschvimer) Message: |
| Comment by Judah Schvimer [ 15/Jul/20 ] |
|
I reproduced this in a much simpler test. I tried running it against master, but I hit |