[SERVER-42312] Validate during rollback can cause count mismatch
Created: 22/Jul/19  Updated: 29/Oct/23  Resolved: 24/Sep/19

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: 4.2.1, 4.3.1, 4.0.14

Type: Bug Priority: Major - P3
Reporter: Judah Schvimer Assignee: Pavithra Vetriselvan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-43843 Skip validate during the rollback fuz... Closed
related to SERVER-43972 initial_sync_capped_index.js should c... Closed
related to SERVER-52976 [4.2] collection_validation.cpp isn't... Closed
is related to SERVER-34976 clear the "needing size adjustment" s... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v4.2, v4.0
Sprint: Repl 2019-08-26, Repl 2019-09-09, Repl 2019-09-23, Repl 2019-10-07
Linked BF Score: 16

 Description   

When validate is run, it marks the collection as always needing size adjustment. If this happens during a rollback, rollback assumes that oplog application will correct the collection's record store size. Oplog application won't do that, though: if we have rolled back any inserts or deletes that changed the size, nothing accounts for them, and the size (fast count) is left wrong.

If validate is called after we call RTT (recover to timestamp), the collection should be marked for size adjustment; if it is called before RTT, it shouldn't be. Because we clear the "mark for size adjustment" state before each rollback, this can only matter for validates that run during a rollback. I think we should not allow validate while we're rolling back, and validate should stop marking collections for size adjustment, since doing so is unnecessary.
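To make the failure mode concrete, here is a toy Python model of the interaction. It is a simplified sketch under stated assumptions, not MongoDB's implementation: `Collection`, `validate`, `rollback`, `run`, and `needs_size_adjustment` are hypothetical names; the stored data is assumed to be rewound by RTT while the fast count is not; and the pre-fix validate is assumed to mark the collection. The validate runs before the RTT step, which the description says should not result in a mark.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical toy collection: the documents actually stored, plus a
    separately maintained fast count (assumed NOT to be rewound by RTT)."""
    docs: set = field(default_factory=set)
    fast_count: int = 0
    # Hypothetical stand-in for the "always needs size adjustment" marking.
    needs_size_adjustment: bool = False

    def insert(self, doc_id: int) -> None:
        self.docs.add(doc_id)
        self.fast_count += 1


def validate(coll: Collection, marks_for_size_adjustment: bool) -> bool:
    """Toy validate: optionally marks the collection (the pre-fix behavior),
    then reports whether the fast count matches the real document count."""
    if marks_for_size_adjustment:
        coll.needs_size_adjustment = True
    return coll.fast_count == len(coll.docs)


def rollback(coll: Collection, rolled_back: list, validate_during_rollback: bool,
             marks_for_size_adjustment: bool) -> None:
    """Toy rollback using recover to timestamp (RTT)."""
    coll.needs_size_adjustment = False      # state is cleared before each rollback
    if validate_during_rollback:            # a validate runs before the RTT step
        validate(coll, marks_for_size_adjustment)
    for doc_id in rolled_back:              # RTT rewinds the stored data...
        coll.docs.discard(doc_id)
    # ...and rollback corrects the fast count only for collections it does not
    # expect oplog application to fix. This model applies no oplog entries, and
    # nothing re-accounts for the rolled-back inserts, so a marked collection
    # keeps its stale count.
    if not coll.needs_size_adjustment:
        coll.fast_count -= len(rolled_back)


def run(validate_during_rollback: bool, marks_for_size_adjustment: bool) -> tuple:
    """Returns (fast_count, actual_count) after rolling back two of four inserts."""
    coll = Collection()
    for doc_id in range(4):                 # docs 0-1 are stable; 2-3 get rolled back
        coll.insert(doc_id)
    rollback(coll, [2, 3], validate_during_rollback, marks_for_size_adjustment)
    return coll.fast_count, len(coll.docs)


print(run(validate_during_rollback=False, marks_for_size_adjustment=True))  # (2, 2)
print(run(validate_during_rollback=True, marks_for_size_adjustment=True))   # (4, 2): mismatch
print(run(validate_during_rollback=True, marks_for_size_adjustment=False))  # (2, 2)
```

With no validate, rollback corrects the fast count. A validate that marks the collection leaves the count at 4 while only 2 documents remain. A validate that does not mark (one half of the proposed fix) keeps the count correct.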



 Comments   
Comment by Githook User [ 11/Oct/21 ]

Author: Benety Goh <benety@mongodb.com> (username: benety)

Message: SERVER-52976 SERVER-42312 disallow validate cmd during rollback and recovering states

(cherry picked from commit 1a3936b3ee365de5dde80e440c01fa6e868a1a54)
(cherry picked from commit 82e34b0d86c6625b5a67d1909e1dcc434f78be82)
Branch: v4.2
https://github.com/mongodb/mongo/commit/3ac74aad2a2b70ad5b1ccb97b07a92c52a51f2f3

Comment by Githook User [ 18/Oct/19 ]

Author: Pavithra Vetriselvan <pavithra.vetriselvan@mongodb.com> (username: pvselvan)

Message: SERVER-42312 disallow validate cmd during rollback and recovering states

(cherry picked from commit 1a3936b3ee365de5dde80e440c01fa6e868a1a54)
Branch: v4.0
https://github.com/mongodb/mongo/commit/82e34b0d86c6625b5a67d1909e1dcc434f78be82

Comment by Githook User [ 18/Oct/19 ]

Author: Pavithra Vetriselvan <pavithra.vetriselvan@mongodb.com> (username: pvselvan)

Message: SERVER-42312 disallow validate cmd during rollback and recovering states

(cherry picked from commit 1a3936b3ee365de5dde80e440c01fa6e868a1a54)
Branch: v4.2
https://github.com/mongodb/mongo/commit/186079301dc9de56313f5a8e84e6088fec289ded

Comment by Githook User [ 24/Sep/19 ]

Author: Pavithra Vetriselvan <pavithra.vetriselvan@mongodb.com> (username: pvselvan)

Message: SERVER-42312 disallow validate cmd during rollback and recovering states
Branch: master
https://github.com/mongodb/mongo/commit/1a3936b3ee365de5dde80e440c01fa6e868a1a54

Comment by Judah Schvimer [ 23/Jul/19 ]

With RTT, the answer is a bit complicated. If you use "recoverFromOplogAsStandalone", validate will see either the old branch of history or the new branch of history, never an "in between" state. Without that flag, validate will see an arbitrary point in time in the past that was consistent, relative to the oplog, at that point in time (not reflecting any prepared transactions). Both of these should appear consistent, and I would expect the fast count to remain correct, since the "marking for size adjustment" quirk only occurs while a rollback is in progress.

We should test that this standalone workaround works.

Comment by Bruce Lucas (Inactive) [ 23/Jul/19 ]

judah.schvimer, yes, that was my question; it is only indirectly related to this ticket.

Comment by Judah Schvimer [ 22/Jul/19 ]

bruce.lucas, do you mean: can validate return false indications of inconsistency if you shut down a node that is in the "rollback" or "recovering" state, restart it as a standalone, and then run validate?

Comment by Bruce Lucas (Inactive) [ 22/Jul/19 ]

I don't see any problem with that, provided it is in fact possible to run validate after restarting a node that was in the recovering state as a standalone. That raises one question, though: can validate return any false indications of inconsistency in that state?

cc: kelsey.schubert, daniel.hatcher.

Comment by Judah Schvimer [ 22/Jul/19 ]

I looked into this with max.hirschhorn. The easiest and most canonical fix would be for the validate command to set maintenanceOk to false (the default is true). This would also mean that nodes in "Recovering" could not run validate unless they were restarted as standalones. Since you can't do reads in the "Recovering" state, it makes sense that you also can't run validate, which is inherently a read. alyson.cabral and bruce.lucas, does this seem reasonable, and do you think it's fine for validate to start returning errors in the "Recovering" and "Rollback" states?
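The proposed gate can be sketched as a toy Python dispatcher. This is a hypothetical model, not the server's command framework: `Command`, `ValidateCommand`, `maintenance_ok`, `dispatch`, `MAINTENANCE_STATES`, and the `NotMasterOrSecondary` exception class are illustrative stand-ins for the maintenanceOk flag and the error discussed in this thread. It shows a command that opts out of maintenance mode being rejected in the RECOVERING and ROLLBACK states while still running normally on a secondary.

```python
from enum import Enum


class MemberState(Enum):
    PRIMARY = "PRIMARY"
    SECONDARY = "SECONDARY"
    RECOVERING = "RECOVERING"
    ROLLBACK = "ROLLBACK"


# Hypothetical: the states in which this toy dispatcher consults maintenance_ok.
MAINTENANCE_STATES = {MemberState.RECOVERING, MemberState.ROLLBACK}


class NotMasterOrSecondary(Exception):
    """Hypothetical stand-in for the server's NotMasterOrSecondary error."""


class Command:
    def __init__(self, name: str) -> None:
        self.name = name

    def maintenance_ok(self) -> bool:
        # Default described in the ticket: commands may run in maintenance states.
        return True

    def run(self) -> str:
        return f"{self.name}: ok"


class ValidateCommand(Command):
    def __init__(self) -> None:
        super().__init__("validate")

    def maintenance_ok(self) -> bool:
        # Proposed fix: validate is inherently a read, so it opts out.
        return False


def dispatch(cmd: Command, state: MemberState) -> str:
    """Toy dispatcher: rejects maintenance_ok() == False commands in maintenance states."""
    if state in MAINTENANCE_STATES and not cmd.maintenance_ok():
        raise NotMasterOrSecondary(f"{cmd.name} is not allowed in state {state.value}")
    return cmd.run()


print(dispatch(ValidateCommand(), MemberState.SECONDARY))           # validate: ok
print(dispatch(Command("someOtherCommand"), MemberState.ROLLBACK))  # someOtherCommand: ok
try:
    dispatch(ValidateCommand(), MemberState.ROLLBACK)
except NotMasterOrSecondary as err:
    print(err)  # validate is not allowed in state ROLLBACK
```

Because the check keys off the node's state rather than off rollback specifically, the same opt-out also blocks validate in "Recovering", which is the side effect the comment asks about.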

Comment by Judah Schvimer [ 22/Jul/19 ]

geert.bosch and max.hirschhorn, what do you think about making validate return a NotMasterOrSecondary error while in rollback?

And Geert and benety.goh (I think you touched this code for two-phase drops in storage), do you agree that marking collections for size adjustment in validate is wrong?

Generated at Thu Feb 08 05:00:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.