[SERVER-27122] Restart initial sync for known index idempotency errors; fail for unknown ones. Created: 18/Nov/16 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Index Maintenance, Replication |
| Affects Version/s: | 3.4.0-rc4 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Robert Guo (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | former-quick-wins, former-robust-initial-sync, idempotency, initialSync |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Replication |
| Participants: |
| Case: | (copied to CRM) |
| Linked BF Score: | 0 |
| Description |
|
For each known index idempotency problem, return a unique error to initial sync so that initial sync can recognize it and restart without failing tests. This lets initial sync detect unknown idempotency problems and fail tests rather than silently hiding them.

ORIGINAL DESCRIPTION:
Applying the following operations on a collection before and after dropping it will cause initial sync to restart. They were found by going through the various checks in IndexCatalog::_doesSpecConflictWithExisting:

1. Creating indexes with different specs but the same name. |
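
The proposal in the description (a unique error per known idempotency problem, restart on those, fail on anything else) could look roughly like the following. This is a minimal Python sketch for illustration only, not server code: the error name `IndexNameConflictDifferentSpec`, the `ApplyError` type, and the `run_initial_sync` helper are all hypothetical and do not correspond to actual MongoDB identifiers.

```python
from enum import Enum, auto


class SyncAction(Enum):
    RESTART = auto()  # known idempotency problem: retry initial sync
    FAIL = auto()     # unrecognized problem: surface it so tests fail


# Hypothetical identifiers, one per known index idempotency problem.
KNOWN_INDEX_IDEMPOTENCY_ERRORS = frozenset({"IndexNameConflictDifferentSpec"})


class ApplyError(Exception):
    """Hypothetical error raised while applying an operation during initial sync."""

    def __init__(self, code, message=""):
        super().__init__(message or code)
        self.code = code


def classify_apply_error(err):
    """Decide whether an error should restart initial sync or fail it."""
    if err.code in KNOWN_INDEX_IDEMPOTENCY_ERRORS:
        return SyncAction.RESTART
    return SyncAction.FAIL


def run_initial_sync(attempt_fn, max_attempts=3):
    """Run attempt_fn until it succeeds; return the attempt number that succeeded.

    Known idempotency errors trigger another attempt; any other ApplyError
    propagates immediately instead of being hidden by a restart.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            attempt_fn()
            return attempt
        except ApplyError as err:
            if classify_apply_error(err) is SyncAction.FAIL:
                raise
    raise RuntimeError("initial sync exhausted its restart attempts")
```

The design point is that the set of restartable errors is explicit, so a new, unrecognized idempotency problem fails loudly rather than being absorbed by a retry.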
| Comments |
| Comment by Judah Schvimer [ 06/Nov/19 ] |
|
This should unblock SERVER-33946, and then the Replication Team can retriage that ticket to catch unknown idempotency problems. |
| Comment by Eric Milkie [ 06/Nov/19 ] |
|
After discussion, we decided to change the work for this ticket: rather than fixing the idempotency problems so that they work seamlessly, we can make each idempotency error uniquely identifiable. That way, initial sync can filter out the known problems and restart without failing the test, and any unrecognized errors can fail the test. |
| Comment by Judah Schvimer [ 06/Nov/19 ] |
|
Without SERVER-33946 it is difficult to ensure we've handled all idempotency problems in initial sync. Can you please clarify which ones would be too difficult to fix? Maybe we can ensure the other ones don't happen in our tests instead of fixing them to still enable SERVER-33946. |
| Comment by Eric Milkie [ 06/Nov/19 ] |
|
The amount of work required to do this doesn't seem like a good tradeoff for the perceived benefit, so I am recommending we close this as Won't Fix. |
| Comment by Siyuan Zhou [ 10/Oct/19 ] |
|
I feel that fixing these problematic idempotency issues needs some semantic change to index builds. Assigning to Execution team. Happy to discuss proposals though. |
| Comment by Judah Schvimer [ 05/Nov/18 ] |
|
Yes, this is a result of applying operations and index creations/deletions out of order during initial sync due to the clone and then oplog-replay process. It's possible, even with UUIDs, for documents and their indexes to temporarily be incompatible, or for multiple indexes to be incompatible temporarily, such that the incompatibilities would be resolved by the end of initial sync if initial sync were allowed to finish. |
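
To make the clone-then-replay ordering concrete, here is a toy Python model of the scenario from the description (two indexes with the same name but different specs, created before and after a collection drop). It is a sketch under simplifying assumptions, not how the server represents catalogs or oplog entries: the tuple-based operations, the `strict` flag, and the `IndexConflict` exception are all made up for illustration.

```python
class IndexConflict(Exception):
    """Toy stand-in for a same-name, different-spec index conflict."""


def apply_op(catalog, op, strict=True):
    """Apply one toy oplog entry to a catalog mapping index name -> spec."""
    kind = op[0]
    if kind == "createIndex":
        _, name, spec = op
        existing = catalog.get(name)
        if existing is not None and existing != spec:
            if strict:
                raise IndexConflict(f"index {name!r} exists with a different spec")
            return  # lenient mode: skip the conflicting create and keep going
        catalog[name] = spec
    elif kind == "dropCollection":
        catalog.clear()


def replay(catalog, ops, strict):
    """Replay ops on a copy of catalog and return the resulting catalog."""
    result = dict(catalog)
    for op in ops:
        apply_op(result, op, strict)
    return result


# History on the sync source: create index, drop collection, recreate the
# index under the same name with a different spec.
oplog = [
    ("createIndex", "a_1", {"a": 1}),
    ("dropCollection",),
    ("createIndex", "a_1", {"a": -1}),
]

source_final = replay({}, oplog, strict=True)

# Assume the clone phase captured the source's final state, while oplog
# replay begins from a point before the first createIndex.
cloned = dict(source_final)
```

In this model, replaying `oplog` on `cloned` with `strict=True` raises `IndexConflict` on the first entry, while `strict=False` reaches the same catalog as `source_final`, which mirrors the point that the incompatibility is only temporary.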
| Comment by Asya Kamsky [ 03/Nov/18 ] |
|
Is this still an issue after the UUID work for collections?