[SERVER-56170] Investigate why some oplog entries generated during tenant migration for timeseries bucket collections require stricter than normal idempotency guarantees Created: 19/Apr/21 Updated: 17/May/21 Resolved: 17/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Dan Larkin-York | Assignee: | Dan Larkin-York |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Sprint: | Execution Team 2021-05-31 | ||||||||
| Participants: | |||||||||
| Description |
|
During the course of In the case of updates that happen as a result of timeseries inserts through the normal BucketCatalog machinery, we know that the resulting oplog entry which is applied on the primary should satisfy these conditions. Additionally, we know that the corresponding entry, when applied on a secondary in steady state, should also qualify. What we found is that tenant migrations throw a wrench in the works here. In particular, it looks like we need to disable the optimization on the primary, even when the write goes through the bucket catalog, if the write comes from a tenant migration replaying the oplog. After talking it through a bit, lingzhi.deng and dan.larkin-york came to the conclusion that the secondary should in theory be able to apply any entries generated from the primary blindly with the optimization, without checking whether they resulted from a tenant migration. However, this didn't appear to be the case: some entries still resulted in field duplication, and thus required the check for a tenant migration source. It remains unclear why we sometimes generate entries that require the strict idempotency guarantees that are not normally required for writes coming through the BucketCatalog. It may be that something is going wrong at the BucketCatalog layer, or that tenant migrations are doing something unexpected, or any number of other things. The goal of this ticket is simply to understand what's going on here. |
| Comments |
| Comment by Geert Bosch [ 17/May/21 ] | ||||
|
So, to confirm my understanding, tenant migration depends on oplog application being idempotent, even on the primary in normal operation? If so, it seems reasonable to include tenant migration among the conditions to not apply our optimization. | ||||
| Comment by Dan Larkin-York [ 17/May/21 ] | ||||
|
After digging through the code and discussing the expected semantics of v:2 doc_diff application, the current behavior seems to be expected. I'll summarize below. During a tenant migration, the recipient primary performs an initial sync procedure with the donor primary. First it gets a dump of the collection, then it catches up on any changes since the dump by replaying a portion of the donor's oplog. The tricky bit here is that the portion of the oplog that it replays may contain some operations that were already reflected in the dump. That is to be expected, but it interacts in a funny way with the v:2 doc_diff format. The v:2 doc_diff format has a crucial property for idempotency: when you reapply any suffix of the diff chain in order, you'll end up at the same result. That is, if you have two diffs x and y, you can end up getting a chain like:
Now, since this oplog replay is happening on the recipient primary, any update that isn't a no-op will generate a new oplog entry. So in this chain, each application would result in a new oplog entry, even though we end up back at the same state (C) as we were at a previous step. And subsequently, each of these oplog entries will be applied on the recipient secondary. Where this matters for the case of the optimization introduced in Importantly, we need to take note if any future projects introduce a similar mechanism to tenant migration, where a primary replays a portion of an oplog that overlaps with operations it has already applied, and add exceptions to the optimization for these as well. In doc diff v:3, we should be able to introduce a new type of insert operation (insert2 or something) which does not perform reinsertion in case the field already exists. That would render such replay operations for timeseries collections no-ops, and would not generate new oplog entries. | ||||
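The replay behavior described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not the server's doc_diff code: a document is modeled as an ordered list of (field, value) pairs, a diff is a single field insert, and the names `insert_strict`, `insert_blind`, `insert_if_absent`, and `replay` are hypothetical. `insert_strict` stands in for an insert that reinserts an existing field, `insert_blind` for the optimized path that skips the existence check, and `insert_if_absent` for the proposed "insert2"-style operation. The sketch also assumes the secondary starts from the same state as the dump.

```python
from typing import Callable, List, Tuple

# Toy document: ordered (field, value) pairs, so duplicate fields are representable.
Doc = List[Tuple[str, int]]
Diff = Tuple[str, int]
Insert = Callable[[Doc, str, int], Doc]


def insert_strict(doc: Doc, field: str, value: int) -> Doc:
    """Reinsert: drop any existing occurrence of the field, then append it."""
    kept = [(k, v) for k, v in doc if k != field]
    return kept + [(field, value)]


def insert_blind(doc: Doc, field: str, value: int) -> Doc:
    """Optimized path (assumed): append without checking for an existing field."""
    return doc + [(field, value)]


def insert_if_absent(doc: Doc, field: str, value: int) -> Doc:
    """Hypothetical 'insert2': a no-op when the field already exists."""
    if any(k == field for k, _ in doc):
        return doc
    return doc + [(field, value)]


def replay(doc: Doc, diffs: List[Diff], insert: Insert) -> Tuple[Doc, List[Diff]]:
    """Apply diffs in order; record an oplog entry for every non-no-op update."""
    oplog: List[Diff] = []
    for field, value in diffs:
        updated = insert(doc, field, value)
        if updated != doc:
            oplog.append((field, value))
        doc = updated
    return doc, oplog


# The dump already reflects diffs x and y, and the replayed oplog overlaps with it.
dump: Doc = [("a", 0), ("x", 1), ("y", 2)]
overlap: List[Diff] = [("x", 1), ("y", 2)]

# Recipient primary replays with the strict insert: same final state, new oplog entries.
primary_doc, primary_oplog = replay(dump, overlap, insert_strict)

# Recipient secondary applies those entries blindly: fields get duplicated.
secondary_doc, _ = replay(dump, primary_oplog, insert_blind)

# With an insert-if-absent operation the replay is a pure no-op.
noop_doc, noop_oplog = replay(dump, overlap, insert_if_absent)

print("primary:", primary_doc, "oplog entries:", len(primary_oplog))
print("secondary (blind):", secondary_doc)
print("insert-if-absent:", noop_doc, "oplog entries:", len(noop_oplog))
```

In this model the strict replay moves x to the end and then y to the end, which returns the document to its starting state while still emitting two oplog entries. Applying those two entries without the existence check is what produces the duplicated fields.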
| Comment by Dan Larkin-York [ 19/Apr/21 ] | ||||
|
Assigning to storage execution. Replication can assist if needed. |