[SERVER-40596] Test that change streams and retryable writes fail gracefully when reading oplog entries from >= 2 versions ago Created: 11/Apr/19 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Query Execution
|
| Participants: |
| Description |
|
See some discussion below. We should add some testing around a scenario where a user quickly upgrades two major versions but some of our machinery such as change streams or Retryable writes still needs to examine older oplog entries from 2 versions ago. Original Description.sharded_txn_downgrade_cluster.js expects retryable writes read oplog entries from higher FCV after downgrade. When I tried to remove the "h" field of oplog format in FCV 4.2, the test failed with the following error.
The parsing of oplog entry in TransactionHistoryIterator is done using IDL parser, which failed on the missing "h" field. This implies we cannot introduce any oplog format change that fails the old IDL parsing. It's less of a problem if we want to remove a field, because we could keep the field as optional for a release and remove it in the following release, assuming users don't upgrade / downgrade across multiple releases very soon. However, if we add new fields to the oplog entries used by retryable writes, this will be a problem. It would be better to reserve the possibility to add new fields to oplog format. There are 3 options. Change stream may have the same issue unless it parses the oplog entries manually. CC charlie.swanson. |
| Comments |
| Comment by Siyuan Zhou [ 15/May/19 ] |
|
To make it clear, by "understands oplog entries from the version before it", I assume you mean before the lower FCV. Prepared transactions doesn't have to rely on it in 4.4 and I imagine it won't. I agree with Judah, since change stream and retryable writes are allowed to fail, different from transactions. I was thinking of a test to check if the change stream reads a format it doesn't understand (from 2 versions ago, let's say), the server will return errors correctly rather than fassert. |
| Comment by Judah Schvimer [ 15/May/19 ] |
I would go farther as to say I'd love if the code never tried to understand oplog entries from 2 versions ago or a newer version, but I would settle for it behaving gracefully if it must. I would like to test that change streams and retryable writes fail gracefully when reading oplog entries from 2 versions ago or newer versions that they do not understand. |
| Comment by Charlie Swanson [ 14/May/19 ] |
|
To clarify a bit more, it is a requirement that the current version understands oplog entries from the version before it. This is required for change streams and for prepared transactions and for retryable writes, since we want all of these to work seamlessly across an upgrade and they all read "back in time" to some extent. It's my personal opinion (and I think Judah agrees) that we don't have to try very hard to make sure we understand either oplog entries from 2 versions ago or oplog entries from the newer version. It may be worth asking someone from if a quick double or triple-upgade is something we should spend time thinking about, but I don't think it is. So siyuan.zhou am I correct that this is not an urgent concern since you did not end up removing the "h" field from the oplog entry parser? If that's the case, and knowing the above, I don't see any actionable request from this ticket? judah.schvimer when you say "Testing is definitely required", what scenario are you imagining? An oplog entry from 2 versions ago being parsed? |
| Comment by Judah Schvimer [ 03/May/19 ] |
|
I think the error message now is confusing to an external user, so maybe we want to try to change it if possible, but that doesn't sound trivial. Testing is definitely required. |
| Comment by Siyuan Zhou [ 03/May/19 ] |
By requirement, do you mean it has to succeed or it can fail? If it's allowed to fail, then OplogEntry throwing an uassert exception as it does today is desired. We would just need to test that. That's my understanding of judah.schvimer's saying "fail gracefully". |
| Comment by Charlie Swanson [ 03/May/19 ] |
|
Thanks for summarizing judah.schvimer. So after a lot of discussion here, I think we've clarified that reading old oplog entries is unfortunately a requirement for the foreseeable future, correct? Given that, what is the purpose of this ticket? Should we just close it as "Won't fix"? Should we morph it into a ticket to write some testing around it? siyuan.zhou do you follow our reasoning/arguments above and agree? |
| Comment by Judah Schvimer [ 03/May/19 ] |
|
After talking with charlie.swanson, this seems somewhat unavoidable for change streams and retryable writes (or retrying transactions). We'll have to make sure those fail gracefully, but in 4.2 those should be the only things that read arbitrarily far backwards in the oplog. In 4.4, committing a prepared transaction will be able to read behind the replication commit point, so we'll have to be careful to test and handle that case when we try to change the oplog (See |
| Comment by Judah Schvimer [ 01/May/19 ] |
I don't love the assumption around speed of upgrade/downgrade here and that we crash if users break that assumption. My understanding is that with this behavior we can read oplog entries across multiple FCV changes, which means that we can read back to arbitrarily old formats of oplog entries. This ties our hands in being able to change the oplog format. If we only ever parse oplog entries in the current and (current-1) FCV, then we can safely change the oplog format across 2 releases using the standard FCV gating process. |
| Comment by Charlie Swanson [ 01/May/19 ] |
|
Sorry I think I'm missing something. Why are we talking about multiple versions here? The description seems to only indicate 4.0 and 4.2? |
| Comment by Judah Schvimer [ 29/Apr/19 ] |
I agree, but it should only be necessary across one version. We should be able to make oplog entry format changes across two versions safely. |
| Comment by Charlie Swanson [ 29/Apr/19 ] |
|
I don't fully understand how frequent or racy this behavior is. Is this something that will happen quite often if using the combination of retryable writes and change streams and downgrading? or something that could only happen if there's a lag in your change stream and so it resumes using a relatively old resume token? If it's the latter, this is already a concern we've dealt with. For example, if you do a multi-document transaction in 4.0 and then downgrade to 3.6, the change stream will not understand it. I think the same applies for prepared transactions from 4.2 to 4.0. I couldn't find anything in our documentation, but it seems like a good practice would be to somehow ensure your change streams are caught up before doing the downgrade, or expect your change streams to need to be restarted. If this seems common, I think we should probably pursue a fix in some 4.0 version to allow this possibility and take an appropriate action - seems like it's just allowing for 'h' to be absent here?
To Judah's point:
This would certainly be an easier world to think about, but I think we at least need to allow for newer versions to parse and understand old oplog entries (the 'upgrade' half of this). I think we'd already have to do that though for the scenario when we're in the new binary version but the old FCV? |
| Comment by Craig Homa [ 22/Apr/19 ] |
|
Hey charlie.swanson, can you please comment on the priority of this? |
| Comment by Judah Schvimer [ 15/Apr/19 ] |
I don't think this is an assumption we want to make, though if the harm is crashing, maybe we're ok with that behavior. I think it would be preferable to make sure we never read oplog entries across an fCV upgrade/downgrade like we've been assuming. |