[SERVER-49527] recoverFromOplogAsStandalone does not relax index constraints Created: 15/Jul/20  Updated: 29/Oct/23  Resolved: 27/Jul/20

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Replication
Affects Version/s: None
Fix Version/s: 4.0.20, 4.2.9

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
is related to SERVER-49529 setTableLogging error at startup whil... Closed
is related to SERVER-49530 fix final index build phase for oplog... Closed
is related to SERVER-49924 Forward-port SERVER-49527 to master b... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2
Sprint: Repl 2020-07-27, Repl 2020-08-10
Participants:

 Description   

Normal replication recovery relaxes index constraints because shouldRelaxIndexConstraints returns "true" during STARTUP. On a standalone, shouldRelaxIndexConstraints returns "false", so recoverFromOplogAsStandalone does not relax them.
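The behavior described above can be illustrated with a small model. This is a sketch, not the server's C++ code: the Node class, the MemberState enum, and all function names are hypothetical. It assumes that shouldRelaxIndexConstraints is the negation of canAcceptWritesFor and that the fix adds a recoverFromOplogAsStandalone check inside shouldRelaxIndexConstraints.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class MemberState(Enum):
    STARTUP = auto()
    PRIMARY = auto()
    SECONDARY = auto()


@dataclass
class Node:
    # None models a standalone, which has no replica set member state.
    member_state: Optional[MemberState] = None
    # Models the recoverFromOplogAsStandalone startup parameter.
    recover_from_oplog_as_standalone: bool = False


def can_accept_writes(node: Node) -> bool:
    # Simplifying assumption: only standalones and primaries accept writes.
    return node.member_state is None or node.member_state is MemberState.PRIMARY


def should_relax_before_fix(node: Node) -> bool:
    # Assumed pre-fix behavior: relax exactly when writes are not accepted.
    return not can_accept_writes(node)


def should_relax_after_fix(node: Node) -> bool:
    # Assumed post-fix behavior: also relax while replaying the oplog
    # as a standalone.
    return node.recover_from_oplog_as_standalone or not can_accept_writes(node)


startup_member = Node(member_state=MemberState.STARTUP)
rfoas_standalone = Node(recover_from_oplog_as_standalone=True)

print(should_relax_before_fix(startup_member))    # True
print(should_relax_before_fix(rfoas_standalone))  # False (the reported bug)
print(should_relax_after_fix(rfoas_standalone))   # True
```

In this model a plain standalone (no recovery parameter set) still does not relax index constraints after the fix, which matches the intent of limiting the change to oplog recovery.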



 Comments   
Comment by Githook User [ 24/Aug/20 ]

Author:

{'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}

Message: SERVER-49924 SERVER-49527 recoverFromOplogAsStandalone does not relax index constraints

(cherry picked from commit e0cde43310e5dab3fcf6e93bb115259e70a165e8)
(cherry picked from commit b855c38dc4eaba412df8c195c6e00a569cf47d2a)
Branch: master
https://github.com/mongodb/mongo/commit/ca39dac5d6a8316787141d13bcd05261042f6540

Comment by Githook User [ 20/Aug/20 ]

Author:

{'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}

Message: SERVER-49527-44
Branch: v4.4
https://github.com/mongodb/mongo/commit/10de031cb01b10c0b87addaf15dd23a094ad286c

Comment by Githook User [ 27/Jul/20 ]

Author:

{'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}

Message: SERVER-49527 recoverFromOplogAsStandalone does not relax index constraints

(cherry picked from commit e0cde43310e5dab3fcf6e93bb115259e70a165e8)
Branch: v4.2
https://github.com/mongodb/mongo/commit/b855c38dc4eaba412df8c195c6e00a569cf47d2a

Comment by Daniel Gottlieb (Inactive) [ 27/Jul/20 ]

That plan sounds good to me. I'm in favor of being lenient when the problematic code paths (for external user queries) are only reachable during startup recovery (before the server starts listening on a port).

Comment by Judah Schvimer [ 22/Jul/20 ]

Upon trying to change canAcceptWritesFor, I realized that I would also have to change checkCanServeReadsFor and commandCanRunHere at least. In testing, I learned that the server will not accept reads or writes during oplog replay because it's during initAndListen, so none of the user command concerns are actual problems. To reduce risk and since I don't see any actual bugs, I'm going to stick with the current patch and forward-port it as far as I can. daniel.gottlieb, does that sound alright to you?

Comment by Judah Schvimer [ 17/Jul/20 ]

At daniel.gottlieb's suggestion, I audited places where we use canAcceptWritesFor* to see if there are other places we should be checking for recoverFromOplogAsStandalone. Here are my findings of what canAcceptWritesFor should return when running with recoverFromOplogAsStandalone in each code location where we call canAcceptWritesFor on v4.0. A lot of these places also check writesAreReplicated which surprisingly returns “true” on standalones:

  1. cloner: should return false
  2. commandCanRunHere: should return false
  3. db_raii: checks if we should read at lastApplied, doesn’t really matter for rFOAS (recoverFromOplogAsStandalone) since it’s readOnly and data won't change. We shouldn't allow reads (or writes) though until after we recover from the oplog.
  4. linearizable reads: replica set only
  5. service entry point: should return false, but only really used for replica sets
  6. session reaper: should return false
  7. sessions_collection_rs::runIfStandaloneOrPrimary: should return false
  8. system_index::generateSystemIndexForExistingCollection: should return false (though would be allowed because of standalone check)
  9. TTL: should return false
  10. empty capped: should return false
  11. convertToCapped: should return false
  12. collMod: should return false
  13. create: should return false
  14. DatabaseImpl::createCollection: should return false (though would be allowed because of standalone check)
  15. dropCollection: should return false
  16. dropDatabase: should return false
  17. dropIndexes: should return false
  18. index creation: should return false
  19. renameCollection: should return false
  20. cloneCollectionAsCapped: should return false
  21. createIndex: should return false
  22. dbcheck: replica sets only
  23. findAndModify: should return false
  24. mr: should return false
  25. oplog note: should return false
  26. delete: should return false
  27. exec/update: should return false
  28. free monitoring: should return false
  29. ops/update: should return false
  30. write_ops_exec: should return false
  31. shouldWaitForOplogVisibility: replica sets only
  32. getExecutorDelete: should return false
  33. getExecutorUpdate: should return false
  34. applyOps: should return false
  35. doTxn: should return false
  36. noop writer: should return false
  37. oplog:shouldBuildInForeground: replica sets only
  38. _logOpsInner: replica sets only
  39. checkCanServeReadsFor: We should be able to accept reads in rFOAS. If canAcceptWritesFor returned false, our ability to read would depend on the slaveOk parameter, which is populated by read preference. I’m not sure how this is used on standalones, but I think we shouldn’t rely on it.
  40. shouldRelaxIndexConstraints: should return false
  41. migration destination manager: should return false
  42. migration source manager: should return false
  43. move primary: should return false
  44. set shard version: should return false

The above indicates to me that the rFOAS check should move down into canAcceptWritesFor rather than being in shouldRelaxIndexConstraints. Additionally, above bolded are the places that are concerning where rFOAS won’t necessarily do the correct thing. Being in “read-only” mode protects us in general in most of these cases, though we’re not in read-only mode until after the oplog recovery plays, and I don’t think we do anything to prevent reads or writes during oplog recovery.
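The alternative placement discussed in this comment, with the rFOAS check inside canAcceptWritesFor, can be sketched as follows. This is a hypothetical model only: the Node class and function names are invented for illustration, and the read check is reduced to "writes accepted, or slaveOk set", which is a simplification of item 39 rather than the server's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Node:
    is_standalone: bool
    is_primary: bool = False
    # Models the recoverFromOplogAsStandalone startup parameter.
    recover_from_oplog_as_standalone: bool = False


def can_accept_writes_alt(node: Node) -> bool:
    # Hypothetical alternative: the rFOAS check lives here, so every
    # caller of this predicate sees "false" during oplog recovery.
    if node.recover_from_oplog_as_standalone:
        return False
    return node.is_standalone or node.is_primary


def should_relax_index_constraints_alt(node: Node) -> bool:
    # No rFOAS-specific check is needed here under this placement.
    return not can_accept_writes_alt(node)


def can_serve_reads_alt(node: Node, slave_ok: bool) -> bool:
    # Simplified model of item 39: once writes are refused, the ability
    # to read falls back to the slaveOk parameter.
    return can_accept_writes_alt(node) or slave_ok


rfoas = Node(is_standalone=True, recover_from_oplog_as_standalone=True)

print(should_relax_index_constraints_alt(rfoas))    # True
print(can_serve_reads_alt(rfoas, slave_ok=False))   # False
print(can_serve_reads_alt(rfoas, slave_ok=True))    # True
```

The last two lines show the side effect raised for checkCanServeReadsFor: under this placement, reads in rFOAS would hinge on slaveOk unless that check were changed as well.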

All of the above said, I think it is unlikely any of the above concerns will cause backup-restore problems. They appear to be confined to user operations, rather than internal ones, and backup-restore does not do any user operations while in rFOAS. checkCanServeReadsFor is the one I'd be most concerned about, but that also appears to only be used in user operations.

Comment by Githook User [ 15/Jul/20 ]

Author:

{'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}

Message: SERVER-49527 recoverFromOplogAsStandalone does not relax index constraints
Branch: v4.0
https://github.com/mongodb/mongo/commit/e0cde43310e5dab3fcf6e93bb115259e70a165e8

Comment by Judah Schvimer [ 15/Jul/20 ]

I reproduced this in a much simpler test. I tried running it against master, but I hit SERVER-49530. In the meantime, I'll make a fix on 4.0 and then consider forward-porting it.
