[SERVER-48083] Support rolling back a collMod oplog entry that modifies an index when using rollback via refetch Created: 11/May/20  Updated: 22/Jun/20  Resolved: 22/Jun/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task
Priority: Major - P3
Reporter: Mihai Andrei
Assignee: Mihai Andrei
Resolution: Won't Fix
Votes: 0
Labels: qexec-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by DOCS-13722 Investigate changes in SERVER-48083: ... Closed
Sprint: Query 2020-06-29
Participants:

 Description   

Currently, attempting to roll back a collMod oplog entry that targets an ‘index’ field will trigger an fassert during rollback via refetch. This is true for collMod commands that modify an index’s hidden field as well as those that modify an index’s TTL option. Because changes to an index’s TTL option are not eligible for rollback via refetch, changes to an index’s hidden option are not currently eligible either.



 Comments   
Comment by Mihai Andrei [ 22/Jun/20 ]

Filed DOCS-13722 to track work to update documentation for 'enableMajorityReadConcern'.

Comment by Mihai Andrei [ 16/Jun/20 ]

That sounds reasonable; I think the place to note this would be within the documentation for enableMajorityReadConcern. If it's alright with everyone, I'll close this ticket as won't fix and file a docs ticket to update the linked section in the docs.

Comment by Daniel Gottlieb (Inactive) [ 16/Jun/20 ]

If we choose to go ahead with advertising hidden indexes in our 4.4 docs, we should add a note to our docs that replication rollback fails to undo the collMod operation and that the affected nodes can only recover with a resync.

Comment by Mihai Andrei [ 16/Jun/20 ]

daniel.gottlieb Yes, hidden indexes are documented for 4.4, per DOCSP-8660. As to whether there is a workaround for this that doesn’t involve resyncing, I’m not sure.

Comment by Daniel Gottlieb (Inactive) [ 16/Jun/20 ]

Are hidden indexes documented for 4.4?

If a user hits the fassert while trying to roll back the operation, is there a workaround for getting the node back into the replica set without requiring a resync?

Comment by Mihai Andrei [ 16/Jun/20 ]

william.schultz With regard to your interpretation of how hidden indexes are used, I think your intuition is correct: a user might configure an index's hidden field during some time window, after which they keep it hidden or disable it indefinitely.

Unless anyone feels strongly about doing this work, or it turns out that eMRC=false won't be removed in 4.6, I'm going to close this ticket as 'Won't fix'.

Comment by William Schultz (Inactive) [ 11/Jun/20 ]

mihai.andrei With regard to implementation, it turns out we actually added initial support for making collMod oplog entries "reversible" in SERVER-28205 (e.g. see the "collectionOptions_old" field). We never utilized the extra information in rollbackViaRefetch, but it would probably make it easier to add support for rolling back collMod oplog entries that change collection options. Interestingly, it looks like we also added the value of the "old" hidden field to the collMod oplog entry as a part of SERVER-9306, which might further ease the implementation.
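To illustrate how a recorded "old" value could be used, here is a minimal Python sketch. It is not server code. The entry shape and the key names `coll`, `index`, `hidden`, and `hidden_old` are hypothetical simplifications chosen for this example; they are not the real oplog format. The function only builds the collMod command document that would restore the previous hidden value.

```python
from typing import Optional


def inverse_hidden_collmod(entry: dict) -> Optional[dict]:
    """Build a collMod command that restores the previous hidden value.

    `entry` is a hypothetical, simplified record of a collMod that changed
    an index's hidden option. All key names here are assumptions made for
    illustration only.
    """
    old = entry.get("hidden_old")
    new = entry.get("hidden")
    # Without a recorded old value there is nothing to restore from, and if
    # the value did not actually change there is nothing to undo.
    if old is None or old == new:
        return None
    return {
        "collMod": entry["coll"],
        "index": {"name": entry["index"], "hidden": old},
    }


example = {"coll": "orders", "index": "a_1", "hidden": True, "hidden_old": False}
undo_command = inverse_hidden_collmod(example)
```

Using the recorded old value, rather than negating the current one, also covers the case where a collMod set the hidden option to the value it already had.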

To address your questions about priority/value of implementing this feature: it seems likely that enableMajorityReadConcern:false will be going away in 4.6 (PM-1769), but I don't think it has been firmly decided yet. milkie might have a better understanding of that decision. So, we would then only be worried about supporting rollback of these hidden index changes on 4.4 clusters that use enableMajorityReadConcern:false. All things considered, I would predict that rollback of a hidden index change operation on an EMRC=false node is relatively rare, but determining whether we need/want to support that might be more of a question for some product people, if they, for example, have a better sense of how many users will also be using the hidden indexes feature. With my current understanding, I think it is reasonable to not add support for it in rollbackViaRefetch. On the other hand, if the implementation is relatively simple (and doesn't require refetching any extra documents from the sync source, for example), then it might be worth doing.

To also address milkie's point:

"presumably because adding/removing the hidden field is going to be more common than changing an index's TTL time"

If I understand correctly, a hidden index might be built but not enabled yet, so that users can optionally "turn it on" to see how it affects performance. This doesn't seem like a very frequent operation, i.e. you might turn it on and test the index performance for a bit, and then possibly make a decision to turn it off or leave it on permanently. I don't really see why users would be continually turning it off and on. I suppose if they have many indexes and/or many collections they might be testing out various combinations of hidden/non-hidden indexes, which could contribute to a higher rate of hidden index modifications over a certain period of time. My intuition, however, would be that these changes would be relatively localized in time, i.e. there might be some experimentation window where users are learning about the perf characteristics of certain indexes, after which they end up leaving the hidden settings fixed. Of course, this intuition could be misaligned with how users will actually use the feature. mihai.andrei you might be able to make a better judgement on it than me.

Comment by Mihai Andrei [ 10/Jun/20 ]

If I understand correctly, rollback via refetch only gets used when enableMajorityReadConcern is set to false, forceRollbackViaRefetch is set to true, or the ephemeralForTest storage engine is used. Given this, the decision to add support for rolling back collMod entries which modify an index’s ‘hidden’ field (and not the TTL option, since this isn't supported as of SERVER-30999) depends on how likely users are to run into this issue. In particular:

  • How often are clusters deployed with majority read concern explicitly disabled?
  • Will enableMajorityReadConcern=false be removed in 4.6?
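As a rough illustration of the three conditions listed at the top of this comment, the following Python sketch models when a node would use rollback via refetch. It reflects only the understanding stated in this comment and is not taken from the server code; the `NodeConfig` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    """Hypothetical model of the settings named in this comment."""
    enable_majority_read_concern: bool = True
    force_rollback_via_refetch: bool = False
    storage_engine: str = "wiredTiger"


def uses_rollback_via_refetch(cfg: NodeConfig) -> bool:
    """Return True if any one of the three conditions holds."""
    return (
        not cfg.enable_majority_read_concern
        or cfg.force_rollback_via_refetch
        or cfg.storage_engine == "ephemeralForTest"
    )


default_node = NodeConfig()
emrc_false_node = NodeConfig(enable_majority_read_concern=False)
```

Under this model, only nodes such as `emrc_false_node` could reach the fassert described in this ticket.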

For what it’s worth, I don’t think this fix is terribly complicated, since, if I understand rollback via refetch correctly, undoing a collMod on an index's hidden field consists of inverting its current value (as well as doing any necessary work to remove redundant operations, such as flipping the hidden value of an index that is going to be dropped). That said, I would like to get a sense of how important it is to support this before committing to implementing it. Intuitively, I would think that if we don't support rolling back a collMod for TTL options, we wouldn't do so for any index options (including hidden), but milkie makes a good point that modifying an index's TTL option would probably happen less frequently than modifying its hidden option.
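The inversion idea in the paragraph above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the rollbackViaRefetch implementation: the entry records and the op names `collModHidden` and `createIndex` are hypothetical, and every `collModHidden` entry is assumed to have actually toggled the value. Indexes created within the rolled-back range would be dropped by the rollback, so flipping their hidden value is skipped as redundant.

```python
from typing import Dict, List


def plan_hidden_rollback(entries: List[dict],
                         current_hidden: Dict[str, bool]) -> Dict[str, bool]:
    """Map index name -> hidden value to restore after rollback.

    `entries` are hypothetical simplified records of the operations being
    rolled back; `current_hidden` is each index's present hidden value.
    """
    toggles: Dict[str, int] = {}
    to_drop = set()
    for e in entries:
        if e["op"] == "createIndex":
            # Rolling back an index creation drops the index entirely.
            to_drop.add(e["index"])
        elif e["op"] == "collModHidden":
            toggles[e["index"]] = toggles.get(e["index"], 0) + 1

    plan: Dict[str, bool] = {}
    for name, count in toggles.items():
        # Skip indexes that will be dropped, and those toggled an even
        # number of times (the toggles cancel out).
        if name in to_drop or count % 2 == 0:
            continue
        plan[name] = not current_hidden[name]
    return plan


ops = [
    {"op": "collModHidden", "index": "a_1"},
    {"op": "collModHidden", "index": "b_1"},
    {"op": "collModHidden", "index": "b_1"},
    {"op": "createIndex", "index": "c_1"},
    {"op": "collModHidden", "index": "c_1"},
]
plan = plan_hidden_rollback(ops, {"a_1": True, "b_1": False, "c_1": True})
```

Here only `a_1` needs to be flipped: `b_1` was toggled twice, and `c_1` would be dropped anyway.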

CC william.schultz

Comment by Eric Milkie [ 12/May/20 ]

The problem that Mihai is describing is related to the code that was added as part of the Hidden Indexes project. We never implemented rollback for index TTL values, so the new code added by Hidden Indexes didn't get that support either. This ticket is to consider adding that support, presumably because adding/removing the hidden field is going to be more common than changing an index's TTL time, and thus the danger of hitting this fassert is greater.

Comment by Craig Homa [ 12/May/20 ]

It isn't clear to Query that this falls under their responsibilities, and we are not sure why it is currently not feasible to roll back a collMod oplog entry under these circumstances. Could you shed some light on what work would be needed to permit this?

CC milkie and judah.schvimer
